five9 / Senior Product Manager, AI Innovations - Agent Assist
brief / art_Y0Wu2I78Cmg
role
model
anthropic/claude-sonnet-4.6
created
2026-06-09T05:30
Company snapshot
Five9 is a publicly traded cloud contact center software (CCaaS) provider that competes with Genesys, NICE CXone, and Talkdesk and serves enterprise and mid-market customers globally. The company has been investing heavily in AI-powered features, including real-time agent assist, automated summaries, and conversational AI, as the contact center industry shifts from rule-based IVR to LLM-driven orchestration. Five9 acquired Inference Solutions (IVA/voicebot) and has partnered with major cloud hyperscalers; public signals indicate it has been expanding its Genius AI suite. Its engineering reputation is generally solid for cloud-native telephony infrastructure, though details of internal team culture could not be confirmed from public sources. The $90K–$250K band suggests a senior IC PM role with meaningful scope, not a director-level position.
Team stack
Based on the JD and Five9's public product documentation, the team likely works with: real-time NLP/ASR pipelines (likely integrated with partners such as Google CCAI or proprietary models), LLM APIs for summarization and next-best-action suggestions, low-latency streaming infrastructure for in-call agent guidance (sub-second latency is a hard constraint in contact centers), and CRM integrations (Salesforce, ServiceNow). The backend is likely microservices on AWS or GCP (based on JD references to cloud and Five9's known infrastructure). ML model serving likely uses managed endpoints with latency SLAs. The data stack probably includes a feature store or event-streaming layer (Kafka-class) for real-time signal ingestion. The front-end agent desktop is likely a browser-based softphone widget. Everything in this section is inferred from those two sources rather than confirmed.
Likely questions (10)
| area | question | why |
|---|---|---|
| domain | Walk me through how you would define the product roadmap for an Agent Assist feature — say, real-time suggested responses — from discovery through GA launch. What are the key decision gates? | The JD explicitly calls out 'end-to-end product lifecycle' ownership and 'ideation through market launch.' This is the core competency test for the role. |
| system_design | Agent Assist must surface a suggested response to a live agent within ~300ms of a customer utterance. Walk me through the architecture tradeoffs you'd consider — model size, streaming vs. batch inference, caching — and how you'd define the latency SLA with engineering. | The JD specifically flags 'latency constraints, optimization, and accuracy challenges' as a critical competency under Applied AI Proficiency. |
| domain | How do you measure whether an Agent Assist feature is actually helping agents? What KPIs would you instrument from day one, and how do you separate correlation from causation when AHT drops after a rollout? | The JD calls out 'leverage KPIs to track user adoption and product efficacy' and 'data-driven decision making' as explicit requirements. |
| behavioral | Tell me about a time you had to make a significant product decision with incomplete data or under ambiguity. How did you frame the decision, and what was the outcome? | The JD says 'make informed strategic decisions amidst ambiguity' — this is a direct signal they want a PM who can operate without perfect information. |
| behavioral | Describe a situation where you had to push back on engineering or data science on a technical tradeoff — for example, model accuracy vs. latency — and how you resolved it. | The JD emphasizes 'navigate complex data environments, AI/ML parameters, and optimize technical and user experience tradeoffs' with Engineering and Data Science partners. |
| domain | Contact center enterprises are notoriously risk-averse about AI errors (wrong suggestions, hallucinated policy info). How would you design a beta program and a confidence/fallback mechanism for an LLM-powered suggestion feature? | The JD calls out 'facilitate beta testing' and 'enterprise AI deployments' complexity — accuracy and trust are existential in regulated contact center environments. |
| coding | You're reviewing a proposed A/B test for a new Agent Assist nudge. The data scientist proposes a 50/50 split with a two-week runtime. What questions do you ask to validate the experimental design, and what statistical concepts would you apply? | The JD requires 'deep understanding of ML-driven product development' and 'data-driven decision making'; this tests whether the candidate can partner credibly with data science. |
| culture | Five9 describes itself as 'team-first' but this role requires high autonomy. How do you balance being self-directed with keeping cross-functional stakeholders aligned, especially in a remote or hybrid environment? | The JD explicitly states 'highly autonomous operator' and 'team-first culture' in the same breath — they want to probe for the tension between independence and collaboration. |
| domain | How would you approach competitive differentiation for Agent Assist against Google CCAI, Amazon Connect Wisdom, and Salesforce Einstein? What dimensions matter most to enterprise buyers? | The JD calls out 'analyze competitive landscape' as a key responsibility — expect a market-positioning question. |
| behavioral | Give me an example of a developer-facing or platform product you owned where you had to balance the needs of internal engineering consumers with external enterprise customers. How did you prioritize? | The candidate's background is heavily platform/SDK (Intuit ICE, Splunk SCS) — the interviewer will probe whether that translates to end-user-facing contact center AI product thinking. |
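
For the A/B test question above, it helps to arrive with a rough sample-size calculation rather than accept the two-week runtime on faith. The sketch below is a minimal illustration using the standard two-sample formula for a difference in means plus the Kish design effect for clustered data. The function names, the 120-second AHT standard deviation, the 15-second minimum effect, and the clustering inputs (50 calls per agent, ICC of 0.05) are all hypothetical placeholders, not Five9 data.

```python
from math import ceil
from statistics import NormalDist


def calls_per_arm(sigma: float, min_effect: float,
                  alpha: float = 0.05, power: float = 0.80) -> int:
    """Calls needed per arm to detect a mean difference of `min_effect`.

    Two-sided, two-sample normal approximation:
    n = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / min_effect^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 * sigma ** 2 / min_effect ** 2)


def design_effect(calls_per_agent: float, icc: float) -> float:
    """Kish design effect for calls clustered within agents."""
    return 1 + (calls_per_agent - 1) * icc


# Hypothetical inputs: AHT standard deviation of 120 s,
# smallest effect worth shipping of 15 s.
n_independent = calls_per_arm(sigma=120, min_effect=15)

# If randomization is by agent, calls from the same agent are correlated,
# so the call count is inflated by the design effect
# (assumed here: 50 calls per agent, intra-class correlation 0.05).
n_clustered = ceil(n_independent * design_effect(calls_per_agent=50, icc=0.05))

print(n_independent, n_clustered)
```

The point to raise with the data scientist is that the unit of randomization (agent vs. call) changes the required volume, so the runtime should follow from traffic and effect size rather than from the calendar.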
Talking points
- At Intuit, I owned the ICE platform end-to-end, scaling it from 6K to 50K TPS through an RSocket migration that supports ~1.5M concurrent connections with sub-25ms TP99 latency. Real-time latency at scale is not abstract to me; it's a constraint I've shipped against. For Agent Assist, I'd apply the same discipline: define the latency SLA early, instrument it as a first-class KPI, and make it a gate in the definition of done.
- I've built and benchmarked LLM post-training pipelines hands-on — my RL Workbench covers 12 algorithms (PPO, GRPO, DPO, and more) across TRL, VeRL, OpenRLHF, and NeMo RL with live metric streaming. This means I can have a credible technical conversation with data science about model tradeoffs — accuracy vs. latency, reward shaping, RLHF pipeline design — without needing it translated for me.
- I built aeval, a local-first model evaluation platform with adversarial safety testing, bootstrap confidence intervals, Welch's t-test, and automated safety gates integrated into CI/CD. For an Agent Assist product where a hallucinated policy suggestion can cost an enterprise customer a compliance violation, I know how to instrument evaluation rigor into the development process, not bolt it on at the end.
- At Intuit, I delivered the ICE Self-Service DevPortal, which cut developer onboarding from 2–3 weeks to minutes, and I drove 275% YoY growth in platform engagements to 675M+ in FY23. I know how to take a platform from 0 to 1, instrument adoption, and iterate on usage telemetry. The same motion applies to rolling out Agent Assist at an enterprise contact center: instrument, measure agent adoption, run structured beta cohorts, and iterate.
- I've operated as a founder (Streamio AI, Fintellect AI) shipping production AI products end-to-end — from multi-agent orchestration (OpenClaw) to RAG pipelines with ChromaDB, multi-provider LLM fallback routing, and structured output validation. I'm comfortable with the full stack of applied AI product decisions: when to use RAG vs. fine-tuning, how to handle model degradation in production, and how to design fallback UX when confidence is low — all directly relevant to Agent Assist reliability.
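
The low-confidence fallback idea in the last talking point can be made concrete with a small sketch, useful if the interviewer asks how it would actually behave. It is an illustration only: `Suggestion`, `suggest_with_fallback`, the 0.7 threshold, and the stub providers are hypothetical names and values, and it assumes each provider returns a calibrated confidence score, which in practice would have to be built and validated separately.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Suggestion:
    text: str
    confidence: float  # assumed to be a calibrated score in [0, 1]
    provider: str


# A provider is any callable that maps a customer utterance to a Suggestion.
Provider = Callable[[str], Suggestion]


def suggest_with_fallback(utterance: str,
                          providers: Sequence[Provider],
                          min_confidence: float = 0.7) -> Optional[Suggestion]:
    """Try providers in priority order; return the first confident suggestion.

    Returns None when every provider fails or scores below the threshold,
    so the agent desktop can show nothing instead of a risky suggestion.
    """
    for provider in providers:
        try:
            suggestion = provider(utterance)
        except Exception:
            continue  # provider outage or timeout: move on to the next one
        if suggestion.confidence >= min_confidence:
            return suggestion
    return None


# Stub providers standing in for real model endpoints (hypothetical).
def primary(utterance: str) -> Suggestion:
    raise TimeoutError("primary model timed out")


def secondary(utterance: str) -> Suggestion:
    return Suggestion("Offer the standard refund policy.", 0.82, "secondary")


result = suggest_with_fallback("I want a refund", [primary, secondary])
print(result.provider if result else "no suggestion shown")
```

Returning `None` instead of a best guess is the design choice worth defending in the interview: for a risk-averse enterprise, showing no suggestion is cheaper than showing a wrong one.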