Artem Shamsuarov
15+ years building production-grade computer vision systems. Specialized in real-time 3D reconstruction, multi-camera pipelines, SLAM, and GPU-accelerated processing. Published researcher with 7+ patents.
Work Experience
- Developed production-grade C++ algorithms for industrial hardware products
- Implemented real-time image processing pipelines for multi-sensor systems
- Achieved 10–100x performance improvements through GPU (CUDA) optimization
- Designed automated quality control and inspection systems
- Maintained CI/CD pipelines (Jenkins) and mentored junior engineers
- Built real-time algorithms for embedded hardware with onboard processing
- Implemented geometry processing and spatial computing algorithms for production systems
- Achieved 70% latency reduction through CUDA/OpenCL/Vulkan optimization
- Designed software architecture for distributed computing clusters
- Built multi-platform apps (iOS/Swift/ObjC, Windows/C++) for industrial products
- Developed CV algorithms and apps deployed to millions of mobile users
- Built complete 3D reconstruction pipeline: SfM, SLAM, depth estimation, depth fusion
- Implemented near-duplicate image retrieval and face detection systems
- Published at IEEE CVPR Workshop; secured 3 patents
- Developed optimization algorithms for computational lithography at next-gen semiconductor nodes
- Created C++/Qt/Python tools for Optical Proximity Correction and Resolution Enhancement
- Collaborated with Cadence, Mentor Graphics, Synopsys; published at SPIE; obtained 2 US patents
Technical Skills
Personal Explorations
Independent explorations built from scratch on personal time and hardware, unrelated to any employer's products or proprietary technology. They span ML inference, developer tools, quantitative finance, audio, and mobile apps.
From-scratch LLM inference engine in C++17/CUDA. Custom tiled matmul, fused attention, RoPE, and SwiGLU kernels. GGUF model loading with Q4/Q5/Q8 quantized inference. Runs Llama 3.2 1B on a 6 GB GPU.
GPU-accelerated local semantic code index for Claude Code. CUDA-powered ONNX embeddings, HNSW vector search, tree-sitter AST chunking across 9 languages, and an MCP server for sub-3 ms context retrieval — fully on-device.
Low-latency limit order book and matching engine with lock-free SPSC ring buffers, zero-allocation hot path, slab allocator, and market microstructure analytics (spread, microprice, order flow imbalance, Kyle's Lambda).
GPU-accelerated portfolio optimization with Monte Carlo scenario generation (cuRAND + Cholesky), Mean-CVaR via custom ADMM solver, PCA factor model (15.6x speedup), and rolling-window backtesting with transaction costs.
Desktop guitar practice app — load any Guitar Pro tab (GP3–GP8), plug in via USB, and get real-time pitch detection with hit/miss feedback and scrolling tab playback. A lightweight, free Yousician alternative.
Desktop app for recording vocals over music tracks. Automatic vocal separation via Demucs, cross-correlation alignment of recording to backing track, LUFS normalization (ITU-R BS.1770-4), and one-click export of the final mix.
Zero-tap document scanner for Android. Auto-detects document boundaries, captures on stability, corrects perspective with sub-pixel corner refinement. Classical image processing pipeline with multi-strategy preprocessing — no ML, no cloud.
Windows UWP app for triggering audio and visual effects during live theater performances. Two-window setup: an operator control panel with a color-coded effect list, and a theater display output to a projector. Dual media players with smooth 3-second crossfade transitions and YAML-based show definitions. Built for and used in actual live productions.
Publications & Patents
Academic Background
Let's Connect
Passionate about computer vision, real-time systems, and GPU-accelerated computing. Based in Luxembourg.