About
I design, build, and operate production machine learning systems. My background spans data engineering, model training, inference platforms, and reliability at scale.
Problem: High-latency GPU inference under bursty load.
Solution: Dynamic batching, KV caching, autoscaling.
Impact: ↓ 42% p99 latency, ↑ 3.1× throughput.
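Of those three levers, dynamic batching is the easiest to illustrate in isolation: requests that arrive within a short window are grouped into one batch, so the model runs fewer, larger forward passes under bursty load. The sketch below is a minimal, illustrative version (names like `max_batch_size` and `max_wait_s` are my own, not from a specific framework), not the production implementation.

```python
import queue
import threading
import time

class DynamicBatcher:
    """Minimal dynamic-batching sketch: group concurrent requests
    into one batched model call, bounded by batch size and wait time."""

    def __init__(self, infer_fn, max_batch_size=8, max_wait_s=0.01):
        self.infer_fn = infer_fn          # batched model call: list -> list
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self._q = queue.Queue()

    def submit(self, item):
        # Each request carries an Event so a caller can block on its result.
        slot = {"input": item, "done": threading.Event(), "output": None}
        self._q.put(slot)
        return slot

    def _collect_batch(self):
        # Block for the first request, then drain more until the batch
        # is full or the wait budget is spent.
        batch = [self._q.get()]
        deadline = time.monotonic() + self.max_wait_s
        while len(batch) < self.max_batch_size:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(self._q.get(timeout=remaining))
            except queue.Empty:
                break
        return batch

    def run_once(self):
        # One iteration of the serving loop: collect, infer, fan out results.
        batch = self._collect_batch()
        outputs = self.infer_fn([s["input"] for s in batch])
        for slot, out in zip(batch, outputs):
            slot["output"] = out
            slot["done"].set()

if __name__ == "__main__":
    # Toy "model" that doubles each input; four requests share one pass.
    batcher = DynamicBatcher(lambda xs: [x * 2 for x in xs], max_batch_size=4)
    slots = [batcher.submit(i) for i in range(4)]
    batcher.run_once()
    print([s["output"] for s in slots])  # → [0, 2, 4, 6]
```

The same queue-and-deadline shape is what production batchers build on; the real systems add padding-aware batch selection and run the loop on a dedicated thread.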
Try these lighthearted ML demo endpoints for free:
Will you wear shorts today? Based on weather + thermal comfort.
Do you need coffee right now?
Generate a random ML meme for your stress level.
Perfect remote work outfit based on your day.
How motivated will you be today?
Optimal pizza topping combo based on your mood and time.
Is your LLM about to hallucinate?
Predict your pet’s reaction to your new haircut.
Predict your marathon finish time (fun version).
How chaotic will Perf Mode get today?