role

databricks / Sr. Product Manager, Databricks AI

model

anthropic/claude-sonnet-4.6

created

2026-05-19T23:40

Company snapshot

Databricks is the data and AI company behind the Lakehouse architecture, Apache Spark, Delta Lake, and MLflow, all of which originated with its founders and engineers. More than 10,000 organizations, including over 50% of the Fortune 500, use the Databricks Data Intelligence Platform to unify data, analytics, and AI. Databricks acquired MosaicML in 2023 to bolster its foundation-model training capabilities, and over the last 12–24 months it has expanded aggressively into generative AI and agent tooling (MLflow 2.x with LLM tracing, Model Serving, AI Gateway, and Mosaic AI). The company was reported (2024–2025) to have filed confidentially for an IPO; specific timing and details are not confirmed. Databricks has a strong engineering-first reputation and is known for open-source leadership and deep research partnerships.

Team stack

Based on the JD and public Databricks signals: a Python-first data/ML stack (PySpark, MLflow, Delta Lake); model serving via Databricks Model Serving (possibly MLflow + Ray Serve under the hood; unconfirmed); vector search and RAG tooling built on Delta Lake and Unity Catalog; agent orchestration likely via LangChain/LlamaIndex integrations and Databricks AI Gateway; SQL analytics via Databricks SQL and the Photon engine. Internal tooling likely includes Terraform/GitOps for infrastructure, and the platform is surfaced through a notebook-centric UX (Databricks Notebooks) plus REST APIs and SDKs. Generative AI workloads are a primary growth area per the JD. Specific internal tooling beyond public signals is uncertain.

Likely questions (10)

area	question	why
system_design	How would you design an enterprise-grade agent orchestration platform on top of Databricks — covering routing, memory, tool-calling, and observability — for a Fortune 500 customer?	The JD explicitly calls out 'orchestrate complex workflows' and 'develop agents and models' as core team missions; this tests whether the candidate can translate that vision into a concrete architecture.
system_design	Walk us through how you'd design a multi-framework RL post-training evaluation harness that works across TRL, VeRL, and OpenRLHF — what are the key abstractions and where do you standardize vs. stay framework-agnostic?	The JD asks for deep AI/ML technical depth; the candidate's RL Workbench is directly relevant and this question probes whether they can articulate the design decisions behind it.
domain	Databricks customers want to fine-tune and evaluate LLMs on their proprietary data without it leaving their environment. How would you define the product requirements for a secure, on-platform RLHF/DPO post-training workflow?	The JD emphasizes 'trusted tools' and enterprise AI; RLHF/DPO post-training is a hot enterprise need and directly maps to the candidate's RL Workbench work.
domain	How do you think about model evaluation as a product surface — what metrics, UX patterns, and CI/CD hooks matter most for enterprise ML teams, and how would you prioritize them on a roadmap?	The JD calls for turning 'breakthroughs into practical tools'; the candidate built aeval and AutoEval, making this a natural probe of product thinking around eval.
behavioral	Tell me about a time you drove a 0-to-1 platform product from concept to launch in a fast-moving space. What did you get wrong early, and how did you course-correct?	The JD explicitly asks for 'track record of bringing products from vision to launch in fast-moving, competitive spaces'; this is a direct behavioral signal check.
behavioral	Describe a situation where you had to align senior engineers and research leaders around a product direction they were skeptical of. How did you build conviction and move forward?	The JD calls out 'partner with world-class engineering and research teams' and 'inspire the roadmap'; Databricks PMs work closely with PhD-level researchers, so influence without authority is critical.
coding	Given a table of model evaluation runs (model_id, eval_type, score, timestamp, framework), write a SQL query to identify which framework shows the highest average score improvement week-over-week, filtering out runs with fewer than 10 samples.	The JD explicitly requires 'comfortable working with SQL, product usage data, and operational dashboards'; the candidate also has BigQuery/SQL experience at Intuit.
culture	Databricks moves extremely fast and the AI landscape shifts weekly. How do you decide when to commit to a product direction vs. staying flexible — and how have you managed that tension in a previous role?	The JD states 'the AI industry is evolving rapidly' and calls for 'first-principles thinking and agility'; this is a direct culture-fit probe for Databricks' operating style.
domain	How would you define and measure 'developer experience' for an AI platform SDK — what leading indicators tell you the SDK is actually reducing friction before you see downstream adoption numbers?	The JD targets 'enterprise SaaS or developer platforms'; the candidate's Intuit SDK Starter Kit work and ICE platform are directly relevant, and Databricks has significant SDK/API surface area.
behavioral	Give me an example of using quantitative data — usage telemetry, SQL analysis, or benchmarks — to change a product decision that was heading in the wrong direction.	The JD calls for 'strong analytical skills' and 'product usage data'; the candidate has BigQuery/SQL work at Intuit and benchmark data from the RL Workbench, so this tests whether they can narrate data-driven PM decisions.
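For the SQL coding question, one possible answer is sketched below. It is written in SQLite syntax and wrapped in Python's `sqlite3` so it runs anywhere (window functions need SQLite 3.25 or newer). Two assumptions to state out loud in the interview: the schema as given has no sample-count column, so `num_samples` is a hypothetical extra column; and "highest average score improvement week-over-week" is read as the mean of each framework's week-to-week change in average score, counting only adjacent weeks. The demo rows are made up purely to exercise the query.

```python
import sqlite3

# ASSUMPTION: num_samples is a hypothetical column; the question's schema
# (model_id, eval_type, score, timestamp, framework) has no sample count.
SCHEMA = """
CREATE TABLE eval_runs (
    model_id    TEXT,
    eval_type   TEXT,
    score       REAL,
    timestamp   TEXT,     -- ISO-8601 string, e.g. '2025-01-06 10:00:00'
    framework   TEXT,
    num_samples INTEGER
);
"""

QUERY = """
WITH weekly AS (
    -- Average score per framework per Monday-start week, small runs excluded.
    SELECT framework,
           DATE(timestamp, 'weekday 0', '-6 days') AS week_start,
           AVG(score) AS avg_score
    FROM eval_runs
    WHERE num_samples >= 10
    GROUP BY framework, week_start
),
deltas AS (
    -- Change vs. the framework's previous week, plus the gap in days so that
    -- non-adjacent weeks (missing data) can be dropped.
    SELECT framework,
           avg_score - LAG(avg_score) OVER (PARTITION BY framework ORDER BY week_start) AS wow_delta,
           JULIANDAY(week_start)
             - JULIANDAY(LAG(week_start) OVER (PARTITION BY framework ORDER BY week_start)) AS gap_days
    FROM weekly
)
SELECT framework, AVG(wow_delta) AS avg_wow_improvement
FROM deltas
WHERE gap_days = 7
GROUP BY framework
ORDER BY avg_wow_improvement DESC
LIMIT 1;
"""


def best_wow_framework(conn):
    """Return (framework, avg week-over-week improvement), or None if no data."""
    return conn.execute(QUERY).fetchone()


def build_demo_db():
    """Tiny made-up dataset purely to exercise the query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = [
        ("m1", "gsm8k", 0.50, "2025-01-06 10:00:00", "TRL", 50),
        ("m1", "gsm8k", 0.60, "2025-01-13 10:00:00", "TRL", 50),
        ("m2", "gsm8k", 0.40, "2025-01-07 10:00:00", "VeRL", 50),
        ("m2", "gsm8k", 0.65, "2025-01-14 10:00:00", "VeRL", 50),
        ("m2", "gsm8k", 0.99, "2025-01-14 11:00:00", "VeRL", 5),  # filtered: < 10 samples
    ]
    conn.executemany("INSERT INTO eval_runs VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    print(best_wow_framework(build_demo_db()))  # VeRL wins with roughly +0.25
```

On Databricks SQL the week bucketing would more naturally use something like `date_trunc('WEEK', timestamp)`. Worth raising as a follow-up: averaging across different `eval_type`s mixes incomparable scales, so partitioning by `eval_type` as well is probably the more defensible answer. If the interviewer instead meant "framework-weeks with fewer than 10 runs", move the threshold into a `HAVING COUNT(*) >= 10` on the `weekly` CTE.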

Talking points

RL post-training at the framework level: Built a 3-phase RL Workbench benchmarking 12 algorithms (PPO, GRPO, DAPO, DPO, SimPO, and more) across TRL, VeRL, OpenRLHF, and NeMo RL with live SSE metric streaming and GPU Docker passthrough — directly relevant to Databricks' Mosaic AI and model training ambitions. This is not PM-adjacent familiarity; this is hands-on implementation.
Developer platform at Intuit scale: Owned the ICE platform (DevPortal, GitOps, SDK Starter Kits), which scaled to 675M+ engagements in FY23, cut developer onboarding from weeks to minutes, and drove 275% YoY engagement growth. This is a direct proof point for the JD's requirement of enterprise SaaS or developer-platform experience.
Agent orchestration product shipped to production: Designed and built the OpenClaw multi-agent orchestration framework (gateway protocol, subagent delegation, session management) inside StreamIO. It is not a prototype but a production macOS/Linux/iOS product with Stripe billing and OAuth, demonstrating 0-to-1 AI product execution.
Model evaluation as a rigorous product discipline: Built aeval, a local-first evaluation platform with 5 eval types, adversarial safety testing, bootstrap confidence intervals, Welch's t-test, and Cohen's d — plus CI/CD regression detection gates. This maps directly to Databricks' need for 'trusted tools' and enterprise-grade AI quality assurance.
NeurIPS-published researcher with a 20+-year ML arc: Published at NIPS 2014 (the conference now called NeurIPS) on neural networks for protein structure prediction; the work started with BPTT hand-coded in C++ in 2004 and was rewritten in 2026 as an 8B-parameter PyTorch model. This gives the credibility to partner with Databricks' PhD research teams and to speak fluently about model architecture tradeoffs, not just product surfaces.
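The aeval talking point names specific statistics; if an interviewer asks how they fit together, a concrete sketch helps. Below is a minimal, stdlib-only Python illustration of a CI regression gate comparing per-example scores from a baseline and a candidate model. The function names, the 0.2 threshold (Cohen's conventional "small" effect), and the rule "fail only if the 95% bootstrap CI of the mean difference lies entirely below zero and d <= -0.2" are illustrative assumptions, not aeval's actual implementation. The Welch p-value is omitted because the standard library has no Student-t CDF; `scipy.stats.ttest_ind(..., equal_var=False)` would supply one.

```python
import math
import random
import statistics
from dataclasses import dataclass


@dataclass
class GateResult:
    mean_diff: float   # candidate mean minus baseline mean
    ci_low: float      # bootstrap CI lower bound on mean_diff
    ci_high: float     # bootstrap CI upper bound on mean_diff
    welch_t: float     # Welch's t statistic (unequal variances)
    welch_df: float    # Welch-Satterthwaite degrees of freedom
    cohens_d: float    # standardized effect size using the pooled SD
    regression: bool   # True if the (hypothetical) gate should fail the build


def bootstrap_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(b) - mean(a), resampling each group independently."""
    rng = random.Random(seed)  # fixed seed keeps CI runs reproducible
    diffs = sorted(
        statistics.fmean(rng.choices(b, k=len(b))) - statistics.fmean(rng.choices(a, k=len(a)))
        for _ in range(n_boot)
    )
    return diffs[int(alpha / 2 * n_boot)], diffs[int((1 - alpha / 2) * n_boot) - 1]


def welch(a, b):
    """Welch's t statistic and Welch-Satterthwaite df for mean(b) - mean(a)."""
    va, vb = statistics.variance(a) / len(a), statistics.variance(b) / len(b)
    t = (statistics.fmean(b) - statistics.fmean(a)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def cohens_d(a, b):
    """Cohen's d for mean(b) - mean(a) with the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(b) - statistics.fmean(a)) / math.sqrt(pooled_var)


def regression_gate(baseline, candidate, min_effect=0.2):
    lo, hi = bootstrap_ci(baseline, candidate)
    t, df = welch(baseline, candidate)
    d = cohens_d(baseline, candidate)
    # Fail only when the whole CI sits below zero AND the drop is at least a "small" effect.
    regression = hi < 0 and d <= -min_effect
    diff = statistics.fmean(candidate) - statistics.fmean(baseline)
    return GateResult(diff, lo, hi, t, df, d, regression)


if __name__ == "__main__":
    baseline = [0.82, 0.79, 0.85, 0.81, 0.80, 0.83, 0.84, 0.78, 0.82, 0.81]
    candidate = [0.74, 0.71, 0.76, 0.73, 0.72, 0.75, 0.70, 0.74, 0.73, 0.72]
    print(regression_gate(baseline, candidate))
```

Requiring both a CI that excludes zero and a minimum effect size is one way to keep a gate from failing builds on statistically detectable but practically trivial drops, which matters when eval sets are large.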