About
I design, build, and operate production machine learning systems. My background spans data engineering, model training, inference platforms, and reliability at scale.
Problem: High-latency GPU inference under bursty load.
Solution: Dynamic batching, KV caching, autoscaling.
Impact: ↓ 42% p99 latency, ↑ 3.1× throughput.
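Of those three levers, dynamic batching is the easiest to illustrate in isolation: requests that arrive within a short window are grouped into one batch, so the model runs fewer, larger forward passes under bursty load. The sketch below is a minimal, illustrative version (names like `max_batch_size` and `max_wait_s` are my own, not from a specific framework), not the production implementation.

```python
import queue
import threading
import time

class DynamicBatcher:
    """Minimal dynamic-batching sketch: group concurrent requests
    into one batched model call, bounded by batch size and wait time."""

    def __init__(self, infer_fn, max_batch_size=8, max_wait_s=0.01):
        self.infer_fn = infer_fn          # batched model call: list -> list
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self._q = queue.Queue()

    def submit(self, item):
        # Each request carries an Event so a caller can block on its result.
        slot = {"input": item, "done": threading.Event(), "output": None}
        self._q.put(slot)
        return slot

    def _collect_batch(self):
        # Block for the first request, then drain more until the batch
        # is full or the wait budget is spent.
        batch = [self._q.get()]
        deadline = time.monotonic() + self.max_wait_s
        while len(batch) < self.max_batch_size:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(self._q.get(timeout=remaining))
            except queue.Empty:
                break
        return batch

    def run_once(self):
        # One iteration of the serving loop: collect, infer, fan out results.
        batch = self._collect_batch()
        outputs = self.infer_fn([s["input"] for s in batch])
        for slot, out in zip(batch, outputs):
            slot["output"] = out
            slot["done"].set()

if __name__ == "__main__":
    # Toy "model" that doubles each input; four requests share one pass.
    batcher = DynamicBatcher(lambda xs: [x * 2 for x in xs], max_batch_size=4)
    slots = [batcher.submit(i) for i in range(4)]
    batcher.run_once()
    print([s["output"] for s in slots])  # → [0, 2, 4, 6]
```

The same queue-and-deadline shape is what production batchers build on; the real systems add padding-aware batch selection and run the loop on a dedicated thread.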
Try these lighthearted ML demo endpoints for free:
Will you wear shorts today? Based on weather + thermal comfort.
Do you need coffee right now?
Generate a random ML meme for your stress level.
Perfect remote work outfit based on your day.
How motivated will you be today?
Optimal pizza topping combo based on your mood and time.
Is your LLM about to hallucinate?
Predict your pet’s reaction to your new haircut.
Predict your marathon finish time (fun version).
How chaotic will Perf Mode get today?