loss=0.001 · status=converged

I reduce loss functions
for a living.

Production ML systems, low-latency inference, and the quiet satisfaction of watching p99 drop. Currently deployed at scale.

2+ Years in ML
50M+ Daily Predictions
42% Latency Reduction
99.97% Uptime (and counting)

About

Model Card: bing-tan-v2.6

Architecture
Human (coffee-powered transformer variant)
Training Data
BSc & MSc Business Analytics @ VU Amsterdam (Computational Intelligence track), 2+ years industry ML
Intended Use
Production ML systems, inference optimization, scale
Fine-tuned On
PyTorch, Kubernetes, late-night debugging sessions
Hyperparameters
curiosity=0.95, persistence=0.92, coffee_intake=high
Known Limitations
Cannot resist optimizing things that are "fine"
License
Open to interesting problems

Most ML engineers come from computer science. I came from business analytics. My MSc at VU Amsterdam had me building deep learning models in the Computational Intelligence track, but my undergrad taught me something most engineers skip: why the model matters to the business. I think that's why I obsess over production — a model that doesn't ship is just a notebook.

I believe the best model is the one that's actually in production. Monitoring matters more than accuracy. A model you can't observe is a liability. And I'd rather ship something simple on Monday than something perfect never.

I get unreasonably excited about shaving milliseconds off inference latency, building systems that fail gracefully instead of silently, and the moment when a training loss curve finally bends.

Primary Capabilities

Inference & Serving

Low-latency GPU inference, dynamic batching, model optimization, autoscaling

Training & Pipelines

Distributed training, feature engineering, data pipelines, experiment tracking

MLOps & Reliability

CI/CD for ML, monitoring, A/B testing, gradual rollouts, incident response

How I Think

The best model is the one that's actually in production.
Monitoring > accuracy. A model you can't observe is a liability.
Ship simple on Monday. Ship perfect never.
If your feature store has drift, your model has drift. You just don't know it yet.
The gap between research and production isn't technical — it's operational.
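One way to make the feature-drift line above actionable: a population stability index (PSI) check between a training baseline and live traffic. A minimal sketch with NumPy and synthetic data; the cutoffs asserted here are illustrative (common rules of thumb flag PSI above 0.1 and investigate above 0.25):

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a baseline and a live feature sample."""
    # Bin edges come from the baseline distribution.
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor empty bins to avoid log(0).
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(42)
baseline = rng.normal(0.0, 1.0, 10_000)
drifted = rng.normal(0.5, 1.0, 10_000)  # a mean shift the model never saw

assert psi(baseline, baseline[:5_000]) < 0.05  # same distribution: stable
assert psi(baseline, drifted) > 0.1            # shifted: worth paging someone
```

The point is that the check is cheap enough to run on every serving batch, which is exactly why "you just don't know it yet" is a choice.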

Tech Stack

ML Frameworks

PyTorch · TensorFlow · HuggingFace · scikit-learn

Infrastructure

Kubernetes · Docker · AWS · GCP · Databricks

MLOps & Data

MLflow · Airflow · Spark · dbt

Languages

Python · Rust · C++ · Kotlin · SQL

Case Studies

How We Cut Inference Latency by 42% (Without Losing Sleep)

Our inference platform was handling 50M predictions a day on CPU. Latency was "acceptable" — until a traffic spike turned p99 from 200ms to 2 seconds. The on-call page woke me up at 2am.

By morning I had a prototype: dynamic batching with ONNX Runtime on GPU, predictive autoscaling triggered by Kafka traffic signals, and a fallback to cached predictions when the model was cold. Three weeks later we were in production.
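The core of that prototype is easier to show than to tell. Here's a toy dynamic batcher, pure Python with threads standing in for the GPU queue; the production version fronted ONNX Runtime, and `model_fn` here is a placeholder for the batched forward pass:

```python
import queue
import threading
import time

class DynamicBatcher:
    """Collect requests until the batch is full or a deadline passes,
    then run one batched forward pass for all of them."""

    def __init__(self, model_fn, max_batch=32, max_wait_ms=5):
        self.model_fn = model_fn
        self.max_batch = max_batch
        self.max_wait = max_wait_ms / 1000
        self.q = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def predict(self, x):
        slot = {"x": x, "done": threading.Event()}
        self.q.put(slot)
        slot["done"].wait()          # block the caller until the batch runs
        return slot["y"]

    def _loop(self):
        while True:
            batch = [self.q.get()]   # block for the first request
            deadline = time.monotonic() + self.max_wait
            while len(batch) < self.max_batch:
                timeout = deadline - time.monotonic()
                if timeout <= 0:
                    break
                try:
                    batch.append(self.q.get(timeout=timeout))
                except queue.Empty:
                    break
            ys = self.model_fn([s["x"] for s in batch])  # one batched call
            for slot, y in zip(batch, ys):
                slot["y"] = y
                slot["done"].set()

batcher = DynamicBatcher(lambda xs: [x * 2 for x in xs])
assert batcher.predict(21) == 42
```

The deadline is the whole trick: you trade a bounded few milliseconds of queueing for a much larger batch, which is where the throughput multiple comes from.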

p99 dropped 42%. Throughput tripled. And the annual cloud bill got $100k lighter.

42%
p99 Latency Reduction
3.1×
Throughput Increase
$100k
Annual Cost Savings

Fixing Train-Serve Skew Before It Fixed Our OKRs

Feature values were different in training and serving. The model didn't care. Our users did.

Challenge

Feature computation was a bottleneck, and inconsistent values between training and serving were silently degrading model performance in production.

Solution

Built a unified feature store with streaming ingestion, point-in-time correctness, and sub-10ms serving latency.

Tech Stack

Apache Kafka, Redis, Spark Streaming, Feast, Python, Go.
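Point-in-time correctness is the part people get wrong, so here it is in miniature: join each training label against the latest feature value known at that label's timestamp, never a future one. A pandas sketch with made-up data (the production system did this with Feast and streaming backfills, not a DataFrame):

```python
import pandas as pd

# Feature values as they became available over time (illustrative data).
features = pd.DataFrame({
    "user_id": [1, 1, 2],
    "event_time": pd.to_datetime(["2024-01-01", "2024-01-05", "2024-01-02"]),
    "avg_spend": [10.0, 12.5, 7.0],
}).sort_values("event_time")

# Training labels, stamped with the time the prediction was made.
labels = pd.DataFrame({
    "user_id": [1, 2],
    "event_time": pd.to_datetime(["2024-01-03", "2024-01-10"]),
    "label": [0, 1],
}).sort_values("event_time")

# merge_asof (default direction="backward") picks the latest feature value
# known at or before each label's timestamp, so nothing leaks from the future.
train = pd.merge_asof(labels, features, on="event_time", by="user_id")

assert train.loc[train.user_id == 1, "avg_spend"].item() == 10.0  # not the later 12.5
```

Training on the naive join (latest value per user) would have used 12.5 for user 1, a value that didn't exist when the label was generated. That's train-serve skew in one line.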

<10ms
Feature Serving Latency
15%
Model Accuracy Improvement
100+
Features in Production

Training Progress

[Chart: training loss over epochs, with milestone annotations: MSc @ VU Amsterdam, first model in prod, 42% latency cut, 50M daily predictions]

Featured Projects

LLM Serving Toolkit

High-performance inference server with continuous batching, KV cache management, and OpenAI-compatible API.

50M+ req/day
PyTorch · Rust · CUDA
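Continuous batching is what makes a req/day number like that possible: finished sequences leave the batch and waiting ones join at every decode step, so the GPU never idles waiting on the slowest request. A toy scheduler to show the shape of it (`step_fn` stands in for the batched decode; KV cache management is omitted):

```python
class Request:
    def __init__(self, rid, max_new_tokens):
        self.rid = rid
        self.budget = max_new_tokens
        self.tokens = []

def continuous_batch(pending, step_fn, max_batch=4):
    """Run decode steps, admitting new requests into freed slots each step."""
    active, finished = [], []
    while pending or active:
        # Admit waiting requests into any free batch slots.
        while pending and len(active) < max_batch:
            active.append(pending.pop(0))
        new_tokens = step_fn(active)          # one batched decode step
        still_active = []
        for req, tok in zip(active, new_tokens):
            req.tokens.append(tok)
            req.budget -= 1
            (finished if req.budget == 0 else still_active).append(req)
        active = still_active
    return finished

# Five requests with different lengths; a static batcher would pad
# everything out to the longest sequence before admitting the fifth.
reqs = [Request(i, max_new_tokens=n) for i, n in enumerate([1, 3, 2, 2, 1])]
done = continuous_batch(reqs, step_fn=lambda batch: [0] * len(batch))
assert sorted(len(r.tokens) for r in done) == [1, 1, 2, 2, 3]
```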

ML Pipeline Framework

Declarative ML pipeline orchestration with automatic versioning, caching, and distributed execution.

200+ daily runs
Python · Ray · K8s

Feature Store

Real-time feature serving with point-in-time correctness and streaming ingestion for ML models.

<10ms p99
Go · Redis · Kafka

Model Monitor

Drift detection, performance monitoring, and alerting system for production ML models.

40+ models tracked
Python · Prometheus · Grafana

View all projects on GitHub →

Writing

Technical deep-dives on ML systems, infrastructure, and lessons from production.

ML Playground

Every ML system needs an API. Here are some endpoints I'd never put in production.

Prediction Endpoints

/predict/shorts

Will you wear shorts today? Based on weather + thermal comfort.
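For flavor, here's roughly what such an endpoint looks like, stdlib only and with an entirely made-up comfort rule (the route table, wind penalty, and 18°C threshold are all illustrative):

```python
def predict_shorts(temp_c: float, wind_kmh: float = 0.0) -> dict:
    # Made-up thermal comfort model: warm enough, minus a wind penalty.
    feels_like = temp_c - 0.3 * wind_kmh
    return {"shorts": feels_like >= 18.0, "feels_like_c": round(feels_like, 1)}

# Toy router mapping playground paths to handlers.
ROUTES = {"/predict/shorts": predict_shorts}

def handle(path: str, **params):
    return ROUTES[path](**params)

assert handle("/predict/shorts", temp_c=24.0, wind_kmh=10.0)["shorts"] is True
assert handle("/predict/shorts", temp_c=15.0)["shorts"] is False
```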


/predict/coffee-need

Do you need coffee right now?


/predict/LLM-sanity

Is your LLM about to hallucinate?


/meme-generator

Generate a random ML meme for your stress level.


/predict/work-from-home-outfit

Perfect remote work outfit based on your day.


/predict/motivation

How motivated will you be today?


/predict/pizza-topping

Optimal pizza topping combo based on your mood and time.


/predict/pet-reaction

Predict your pet's reaction to your new haircut.


/predict/marathon-performance

Predict your marathon finish time (fun version).


/predict/perf-mode-chaos

How chaotic will Perf Mode get today?


Contact

My inference endpoint is always warm. Reach out about ML systems, interesting problems, or your hot take on batch normalization.