brief / art_boE5zUodFkA

role

baseten / Product Manager - Dedicated Inference

model

anthropic/claude-sonnet-4.6

created

2026-05-29T18:35

Company snapshot

Baseten is an AI inference platform company that enables engineering teams to deploy, serve, and manage ML models in production with a focus on performance, reliability, and developer experience. The company counts Cursor, Notion, Abridge, Clay, Gamma, and Writer among its customers — all frontier AI product companies with demanding inference requirements. Baseten recently closed a $300M Series E (investors include BOND, IVP, Spark Capital, Greylock, and Conviction), signaling aggressive growth and platform expansion. The company's engineering reputation centers on low-latency, high-throughput inference infrastructure and a strong developer-first philosophy. Based on the JD and public signals, Baseten is actively expanding its core product surface — APIs, SDKs, async inference, multi-component workflow chains, and model training pipelines — suggesting a platform maturation phase following the Series E.

Team stack

Based on the JD and public signals: a Python-first SDK surface (likely FastAPI or similar for internal services), REST and likely gRPC APIs for model serving endpoints, Kubernetes-based infrastructure for model deployment (likely, given inference-at-scale requirements), React or similar for the dashboard/UI layer, and GPU-accelerated compute (likely NVIDIA A100/H100 class). The JD references 'Chains for multi-component workflows' and 'Asynchronous inference' as active initiatives, suggesting a task-queue or event-driven backend (possibly Celery, Ray, or a proprietary async runtime). Model training integration implies familiarity with frameworks like PyTorch and HuggingFace Transformers, and possibly vLLM or TensorRT-LLM for optimized serving. Observability tooling (metrics, logging, tracing) is referenced in the JD; the specific stack is uncertain but is likely Prometheus/Grafana or a third-party APM. Anything not publicly confirmed is marked 'likely' or 'based on the JD'.

Likely questions (10)

area	question	why
system_design	Walk us through how you would design the product surface for asynchronous inference on Baseten — what does the API contract look like, how do users poll or receive results, and what observability hooks would you expose?	Async inference is explicitly listed as an active initiative in the JD. This tests whether the candidate can translate ML infrastructure complexity into a clean, developer-friendly API design — a core requirement of the role.
system_design	Baseten is building 'Chains' for multi-component workflows. How would you define the product requirements for a chaining primitive — what abstractions matter to developers, and how do you avoid over-engineering it?	Chains is the first example initiative listed in the JD. The question probes the candidate's ability to scope a complex orchestration feature for developer usability without unnecessary complexity.
domain	How do you think about the tradeoffs between a model-serving SDK that is highly opinionated versus one that is highly flexible? What signals from developer behavior would tell you which direction to push?	The JD emphasizes owning APIs, SDKs, and developer workflows. This question tests domain depth in developer tooling product philosophy, directly relevant to Baseten's core product surface.
domain	Baseten's customers include companies like Cursor and Abridge — teams with very different inference latency and throughput profiles. How would you structure a product roadmap that serves both without fragmenting the platform?	The JD requires aligning product initiatives with company strategy and driving adoption across a diverse customer base. This tests the candidate's ability to segment and prioritize across heterogeneous technical users.
behavioral	Tell me about a time you drove a developer platform initiative from 0 to 1 — how did you define success, what did you learn from early adopters, and what would you do differently?	The JD requires proven ability to launch new products and improve onboarding. The candidate's ICE Self-Service platform at Intuit and Streamio/Fintellect founding experience are directly relevant anchors.
behavioral	Describe a situation where you had to align engineering, design, and GTM teams around a product decision that involved significant technical tradeoffs. How did you build consensus?	Cross-functional alignment is an explicit requirement in the JD. This tests the candidate's ability to operate as a connector across technical and non-technical stakeholders.
coding	You're reviewing a PR for a new SDK method that wraps an async inference endpoint. The method signature is clean but the error handling is opaque to the end developer. How do you give feedback, and what does a good error surface look like in a developer SDK?	The JD requires an engineering background and deep empathy for developers. This question tests whether the candidate can engage at the code level on developer experience quality.
domain	How would you instrument a model deployment platform to give developers meaningful observability — what metrics matter most, and how do you surface them without overwhelming the user?	The JD explicitly calls out 'observability and management experiences' as a responsibility. This tests the candidate's ability to translate infrastructure telemetry into actionable developer-facing product features.
culture	Baseten serves companies at the frontier of AI — customers who are often building things that don't have established patterns yet. How do you do product discovery when neither you nor your customers have a clear map of what 'good' looks like?	The JD emphasizes collaborating with customers to define solutions for complex AI deployment. This tests the candidate's comfort with ambiguity and frontier product development, which is core to Baseten's positioning.
behavioral	Walk me through how you define and track product success metrics for a developer platform. Give a specific example of a metric you owned, how you instrumented it, and how it changed your roadmap decisions.	The JD explicitly requires 'proven ability to define product success metrics and iterate based on data and feedback.' The candidate's Intuit work (675M engagements, 275% YoY growth, 6K→50K TPS) provides a strong anchor.
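
For the async inference and SDK error-surface questions above, a small sketch can help anchor the answer. The code below is a hypothetical illustration, not Baseten's actual API: `FakeAsyncBackend`, `wait_for_result`, the `JobStatus` values, and the error codes are all invented names. It assumes a submit-then-poll contract in which every call returns a request ID, and errors carry a stable code, the request ID, and a retry hint so the developer knows what happened and what to do next.

```python
import time
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict, Optional


class JobStatus(str, Enum):
    QUEUED = "queued"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


class InferenceError(Exception):
    """Base SDK error with a stable code, the request id, and a retry hint."""

    def __init__(self, code: str, message: str, request_id: str, retryable: bool):
        super().__init__(f"[{code}] {message} (request_id={request_id})")
        self.code = code
        self.request_id = request_id
        self.retryable = retryable


class JobFailed(InferenceError):
    """The model ran and raised; retrying the same input is unlikely to help."""


class JobTimeout(InferenceError):
    """The client stopped polling; the job may still finish, so retry is safe."""


@dataclass
class Job:
    request_id: str
    status: JobStatus = JobStatus.QUEUED
    output: Any = None
    error: Optional[str] = None


class FakeAsyncBackend:
    """In-memory stand-in for a hypothetical async inference service."""

    def __init__(self, model: Callable[[Any], Any]):
        self._model = model
        self._jobs: Dict[str, Job] = {}
        self._inputs: Dict[str, Any] = {}

    def submit(self, payload: Any) -> str:
        # Returns immediately with an id the caller can poll on.
        request_id = uuid.uuid4().hex
        self._jobs[request_id] = Job(request_id)
        self._inputs[request_id] = payload
        return request_id

    def get(self, request_id: str) -> Job:
        job = self._jobs[request_id]
        if job.status is JobStatus.QUEUED:
            # Simulate completion by running the model lazily on first poll.
            try:
                job.output = self._model(self._inputs.pop(request_id))
                job.status = JobStatus.SUCCEEDED
            except Exception as exc:
                job.error = str(exc)
                job.status = JobStatus.FAILED
        return job


def wait_for_result(backend: FakeAsyncBackend, request_id: str,
                    timeout_s: float = 5.0, poll_interval_s: float = 0.01) -> Any:
    """Poll until the job reaches a terminal state or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        job = backend.get(request_id)
        if job.status is JobStatus.SUCCEEDED:
            return job.output
        if job.status is JobStatus.FAILED:
            raise JobFailed("model_error", job.error or "unknown error",
                            request_id, retryable=False)
        time.sleep(poll_interval_s)
    raise JobTimeout("poll_timeout", f"no result after {timeout_s}s",
                     request_id, retryable=True)


if __name__ == "__main__":
    backend = FakeAsyncBackend(lambda text: text.upper())
    rid = backend.submit("hello")
    print(wait_for_result(backend, rid))
```

The point to draw out in the interview is the contrast with an opaque error surface: a bare exception or raw HTTP status forces the developer to guess, while a typed error with a code, a request ID, and a `retryable` flag supports both programmatic handling and support escalation. A webhook or streaming delivery path would be a natural alternative to polling, which this sketch does not cover.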

Talking points

At Intuit, I owned the ICE developer platform end-to-end: I built the Self-Service DevPortal, GitOps config layer, and ICE Playground, cutting developer onboarding to production from 2–3 weeks to under 24 hours. That platform scaled to 675M+ engagements in FY23 and to 50K TPS via an RSocket migration. I know what it takes to make infrastructure invisible to developers and measurable to the business.
I've built and shipped developer SDKs in the real world — extended Java and Python SDK Starter Kits with scaffolding, Gradle/Maven build configs, testing frameworks, and CI/CD integration at Intuit, and built the OpenClaw multi-agent orchestration framework (gateway protocol, subagent delegation, session management) at Streamio. I can engage credibly with engineering on API design, SDK ergonomics, and integration surface tradeoffs.
I built an RL post-training workbench from scratch — covering GRPO, DPO, PPO, and 9 other algorithms across TRL, VeRL, OpenRLHF, and NeMo RL, with live SSE metric streaming, GPU Docker passthrough, and head-to-head framework benchmarking. I also built aeval, a local-first model evaluation platform with FastAPI, TimescaleDB, Redis, and Ollama. I understand ML inference and training workflows at the product and implementation level, not just conceptually.
I've done 0-to-1 product launches on both the enterprise platform side (ICE Self-Service at Intuit, Scheduler Service at Splunk delivered in ~4 months and demoed at .conf19) and the startup side (Streamio and Fintellect, including customer discovery, App Store launch, and iterative refinement). I know how to move fast, define success metrics early, and adjust based on real user signal.
My engineering foundation is deep and current — UC Berkeley Computational Engineering Science, NeurIPS-published ML researcher, hand-coded BPTT in C++ in 2004, and actively building production AI systems today (FastAPI, PyTorch, React/TypeScript, Docker, Redis, multimodal LLM pipelines). I can sit in a technical design review, read a PR, and give meaningful feedback — which matters when your customers are the engineers building frontier AI products.