About
Model Card: bing-tan-v2.6
- Architecture
- Human (coffee-powered transformer variant)
- Training Data
- BSc & MSc Business Analytics @ VU Amsterdam (Computational Intelligence track), 2+ years industry ML
- Intended Use
- Production ML systems, inference optimization, scale
- Fine-tuned On
- PyTorch, Kubernetes, late-night debugging sessions
- Hyperparameters
- curiosity=0.95, persistence=0.92, coffee_intake=high
- Known Limitations
- Cannot resist optimizing things that are "fine"
- License
- Open to interesting problems
Most ML engineers come from computer science. I came from business analytics. My MSc at VU Amsterdam had me building deep learning models in the Computational Intelligence track, but my undergrad taught me something most engineers skip: why the model matters to the business. I think that's why I obsess over production — a model that doesn't ship is just a notebook.
I believe the best model is the one that's actually in production. Monitoring matters more than accuracy. A model you can't observe is a liability. And I'd rather ship something simple on Monday than something perfect never.
I get unreasonably excited about shaving milliseconds off inference latency, building systems that fail gracefully instead of silently, and the moment when a training loss curve finally bends.
Primary Capabilities
Inference & Serving
Low-latency GPU inference, dynamic batching, model optimization, autoscaling
Training & Pipelines
Distributed training, feature engineering, data pipelines, experiment tracking
MLOps & Reliability
CI/CD for ML, monitoring, A/B testing, gradual rollouts, incident response
How I Think
The best model is the one that's actually in production.
Monitoring > accuracy. A model you can't observe is a liability.
Ship simple on Monday. Ship perfect never.
If your feature store has drift, your model has drift. You just don't know it yet.
The gap between research and production isn't technical — it's operational.
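One way to put a number on the feature-store drift claim above is a population stability index (PSI) between a feature's training-time and serving-time distributions. This is a minimal pure-Python sketch under my own assumptions, not code from any of the systems described here; the thresholds are the usual rule of thumb.

```python
import math

def population_stability_index(expected, actual, bins=10):
    """PSI between a training-time (expected) and a serving-time (actual)
    sample of one feature. Rule of thumb: < 0.1 stable, 0.1-0.25 moderate
    drift, > 0.25 significant drift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def proportions(values):
        counts = [0] * bins
        for v in values:
            i = int((v - lo) / width)
            counts[min(max(i, 0), bins - 1)] += 1  # clamp serving outliers
        total = len(values) + bins  # Laplace smoothing: no empty-bin log(0)
        return [(c + 1) / total for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

train = [x / 10 for x in range(1000)]        # training distribution
same = list(train)                           # serving looks identical
shifted = [x / 10 + 50 for x in range(1000)] # serving shifted upward

print(population_stability_index(train, same))     # 0.0: no drift
print(population_stability_index(train, shifted))  # well above 0.25
```

A real monitor would compute this from streaming histograms per feature and alert on the threshold, but the math is exactly this small.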
Tech Stack
ML Frameworks
Infrastructure
MLOps & Data
Languages
Case Studies
How We Cut Inference Latency by 42% (Without Losing Sleep)
Our inference platform was handling 50M predictions a day on CPU. Latency was "acceptable" — until a traffic spike turned p99 from 200ms to 2 seconds. The on-call page woke me up at 2am.
By morning I had a prototype: dynamic batching with ONNX Runtime on GPU, predictive autoscaling triggered by Kafka traffic signals, and a fallback to cached predictions when the model was cold. Three weeks later we were in production.
p99 dropped 42%. Throughput tripled. And the annual cloud bill got $100k lighter.
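The dynamic batching piece is the easiest part to sketch. This is an illustrative toy, not the production code: no ONNX Runtime or GPU, just the core scheduling logic of collecting requests and flushing when the batch fills or the oldest request has waited too long.

```python
import threading
import time
from queue import Empty, Queue

class DynamicBatcher:
    """Flush a batch when it is full or when the oldest queued request
    has waited max_wait_ms, whichever comes first."""

    def __init__(self, run_batch, max_batch_size=8, max_wait_ms=5):
        self.run_batch = run_batch            # list of inputs -> list of outputs
        self.max_batch_size = max_batch_size
        self.max_wait = max_wait_ms / 1000.0
        self.queue = Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def submit(self, item):
        """Enqueue one request; the caller waits on the returned event."""
        holder = {"done": threading.Event(), "input": item, "output": None}
        self.queue.put(holder)
        return holder

    def _loop(self):
        while True:
            batch = [self.queue.get()]        # block until the first request
            deadline = time.monotonic() + self.max_wait
            while len(batch) < self.max_batch_size:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self.queue.get(timeout=remaining))
                except Empty:
                    break
            outputs = self.run_batch([h["input"] for h in batch])
            for h, out in zip(batch, outputs):
                h["output"] = out
                h["done"].set()

# A stand-in "model": one batched call that doubles every input.
batcher = DynamicBatcher(lambda xs: [x * 2 for x in xs],
                         max_batch_size=4, max_wait_ms=50)
holders = [batcher.submit(i) for i in range(4)]
for h in holders:
    h["done"].wait(timeout=2)
results = [h["output"] for h in holders]
print(results)  # [0, 2, 4, 6]
```

The whole latency win comes from that inner loop: a batched GPU call amortizes its fixed overhead over many requests, at the cost of bounding how long the first request in a batch may wait.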
Fixing Train-Serve Skew Before It Fixed Our OKRs
Feature values differed between training and serving. The model didn't care. Our users did.
Challenge
Feature computation was a bottleneck, with inconsistent values between training and serving causing model drift.
Solution
Built a unified feature store with streaming ingestion, point-in-time correctness, and sub-10ms serving latency.
Tech Stack
Apache Kafka, Redis, Spark Streaming, Feast, Python, Go.
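Point-in-time correctness is the subtle part: when training data is rebuilt, each label may only see the feature value that existed at its own timestamp, never a later one, or you leak the future into training. A pure-Python sketch of that join (Feast and the streaming pieces do this at scale; the names here are illustrative):

```python
import bisect

def point_in_time_join(feature_log, label_events):
    """For each label event, pick the most recent feature value observed
    at or before the event timestamp -- never a future value, which is
    what a naive join on entity key alone would leak.

    feature_log:  list of (ts, value) pairs, sorted by ts
    label_events: list of event timestamps
    """
    feature_ts = [ts for ts, _ in feature_log]
    joined = []
    for event_ts in label_events:
        idx = bisect.bisect_right(feature_ts, event_ts) - 1
        joined.append(feature_log[idx][1] if idx >= 0 else None)
    return joined

# Feature recomputed at t=1, 5, 9; labels observed at t=4 and t=10.
log = [(1, "v1"), (5, "v2"), (9, "v3")]
print(point_in_time_join(log, [4, 10]))  # ['v1', 'v3']
```

The label at t=4 gets "v1", not the fresher "v2" that arrived at t=5: serving at t=4 could only have seen "v1", so training must see the same thing.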
Training Progress
Featured Projects
LLM Serving Toolkit
High-performance inference server with continuous batching, KV cache management, and OpenAI-compatible API.
50M+ req/day
ML Pipeline Framework
Declarative ML pipeline orchestration with automatic versioning, caching, and distributed execution.
200+ daily runs
Feature Store
Real-time feature serving with point-in-time correctness and streaming ingestion for ML models.
<10ms p99
Model Monitor
Drift detection, performance monitoring, and alerting system for production ML models.
40+ models tracked
Writing
Technical deep-dives on ML systems, infrastructure, and lessons from production.
ML Playground
Every ML system needs an API. Here are some endpoints I'd never put in production.
Backprop Runner
3D endless runner where you dodge gradient collapse
Neural Field Particles
8,000 particles advected through a continuous vector field
Prediction Endpoints
/predict/shorts
Will you wear shorts today? Based on weather + thermal comfort.
/predict/coffee-need
Do you need coffee right now?
/predict/LLM-sanity
Is your LLM about to hallucinate?
/meme-generator
Generate a random ML meme for your stress level.
/predict/work-from-home-outfit
Perfect remote work outfit based on your day.
/predict/motivation
How motivated will you be today?
/predict/pizza-topping
Optimal pizza topping combo based on your mood and time.
/predict/pet-reaction
Predict your pet's reaction to your new haircut.
/predict/marathon-performance
Predict your marathon finish time (fun version).
/predict/perf-mode-chaos
How chaotic will Perf Mode get today?