sofi / Principal Product Manager, AI SDLC
brief / art_hcXqe2ef0NI
model: anthropic/claude-sonnet-4.6
created: 2026-05-20T22:01
Company snapshot
SoFi Technologies is a digital personal finance company that offers lending, investing, banking, and insurance products to millions of members through a mobile-first platform; its banking subsidiary, SoFi Bank, N.A., is an FDIC-insured national bank. The company has been expanding its financial services ecosystem aggressively, including through its Galileo and Technisys B2B fintech infrastructure subsidiaries, which position it as both a consumer brand and a fintech infrastructure provider. SoFi has publicly emphasized AI and automation as core to its engineering velocity strategy, which is consistent with this JD's ambition to infuse AI across the full SDLC. Recent internal AI initiatives are not publicly confirmed beyond general statements, so any claim about named projects or teams would be speculative. SoFi's engineering organization is generally regarded as mobile-first and cloud-native, and as compliance-conscious given its bank charter.
Team stack
Based on the JD and public signals:
- CI/CD: likely GitHub Actions or similar (the JD references 'repositories' and 'CI/CD pipelines' only generically).
- Cloud infrastructure: likely AWS or GCP (common at fintech scale).
- Backend services: likely Java/Kotlin or Python microservices (common in fintech platforms).
- LLM integration: likely via OpenAI, Anthropic, or internal model APIs (the JD references 'AI agents' and 'structured specifications').
- Testing/quality tooling: uncertain, but the JD implies investment in automated test generation.
- Observability: likely Datadog or Splunk (given SoFi's enterprise scale).
- Governance/compliance tooling: likely custom-built, given bank charter requirements.
- Agentic workflow tooling (e.g., LangChain, custom orchestration): likely nascent; this role is meant to define it.
Likely questions (10)
| area | question | why |
|---|---|---|
| system_design | Walk us through how you would architect a 'Spec-to-Code-to-Deploy' pipeline where a structured product spec triggers AI agents to generate production-ready code, tests, and documentation. What are the key failure modes and how do you govern AI autonomy at each stage? | This is the literal centerpiece of the JD — the role owns this workflow end-to-end and must demonstrate architectural intuition about agentic pipelines, not just PM instincts. |
| domain | You've benchmarked GRPO, DPO, PPO, and other RL algorithms in your workbench. How would you apply RL post-training or RLHF concepts to improve an AI coding agent that generates code inside a CI/CD pipeline — what reward signals would you use and how would you evaluate them? | JD requires deep familiarity with agentic patterns and evaluation strategies; your RL Workbench evidence is directly relevant and they will probe whether you can connect ML research fluency to product decisions. |
| system_design | How would you design an AI evaluation framework (evals) for an agent that writes and merges production code at a regulated financial institution? What metrics matter, how do you detect regressions, and how do you satisfy security and compliance stakeholders? | JD explicitly calls out 'measurable evaluation frameworks,' 'reliable, safe, and continuously improving' AI workflows, and governance with Security/Legal/Architecture — SoFi is a bank, so compliance risk is acute. |
| behavioral | Tell me about a time you drove cross-organizational adoption of a developer platform standard that engineers initially resisted. How did you build buy-in and measure success? | JD calls out 'drive cross-organizational adoption of AI-enabled development standards' and 'align diverse stakeholders' — Intuit ICE platform experience is the obvious draw here. |
| behavioral | Describe a situation where you had to balance a long-term platform investment against a near-term win that leadership was pressuring you to deliver. How did you frame the trade-off and what was the outcome? | JD explicitly states 'balance long-term platform investments with near-term demonstrable wins' — this is a known tension in platform PM roles and they want evidence of judgment under pressure. |
| coding | Given a CI/CD pipeline that runs 10,000 test cases per deploy, how would you design an AI triage system that identifies flaky tests, predicts test failures before they run, and prioritizes which tests to execute for a given code diff? What data would you need and how would you instrument it? | JD references 'testing frameworks' and 'telemetry requirements' as core ownership areas; this tests whether the candidate can reason about data instrumentation and ML product design at pipeline scale. |
| domain | How do you think about the governance boundary between AI autonomy and human approval in an agentic SDLC at a federally chartered bank? Where would you draw the line on what an AI agent can merge, deploy, or rollback without human sign-off? | SoFi is a national bank subject to OCC/FDIC oversight — the JD specifically calls out 'governance boundaries for AI autonomy' and partnering with Security and Legal; this is a differentiating question for fintech vs. pure tech. |
| culture | This role is described as 'operating like a startup founder inside the company.' Given that you've actually founded two companies, how do you think about the constraints that are different inside a large regulated enterprise, and where does that founder mindset create friction? | JD uses the 'startup founder inside the company' framing explicitly; your background as an actual founder is a double-edged signal — they want to know you can operate within enterprise constraints, not just move fast and break things. |
| behavioral | Walk me through how you ran developer discovery at Intuit to identify pain points across 20 mobile apps and 30+ SKUs. How did you translate telemetry data and qualitative feedback into a prioritized roadmap? | JD emphasizes 'exceptional discovery skills,' 'run fast build-measure-learn loops with internal developer users,' and 'telemetry requirements' — your Intuit SQL/BigQuery work and ICE platform are the direct evidence base. |
| domain | Your aeval platform implements bootstrap confidence intervals, Welch's t-test, and Cohen's d for model evaluation. How would you translate that statistical rigor into an executive-facing dashboard that communicates AI SDLC health to a CTO or CPO who doesn't want to see p-values? | JD requires both 'defining measurable evaluation frameworks' and 'exceptional executive communication skills' — this question tests whether you can bridge deep technical rigor with leadership storytelling. |
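For the test-triage question in the table above, it may help to have a concrete data model in mind when answering "what data would you need." The following is a minimal sketch, not a production design. It assumes you log one record per test execution with the commit it ran against, and that a test-to-source-file coverage map is available. The names `RunRecord`, `flaky_scores`, `select_tests`, and the 0.2 threshold are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class RunRecord(NamedTuple):
    """One logged test execution (hypothetical telemetry schema)."""
    test_id: str
    commit: str
    passed: bool


def flaky_scores(runs):
    """Return, per test, the fraction of commits where it both passed and failed.

    Mixed outcomes on identical code are treated as the flakiness signal.
    """
    outcomes = defaultdict(lambda: defaultdict(set))
    for run in runs:
        outcomes[run.test_id][run.commit].add(run.passed)
    return {
        test_id: sum(len(seen) == 2 for seen in by_commit.values()) / len(by_commit)
        for test_id, by_commit in outcomes.items()
    }


def select_tests(changed_files, coverage_map, scores, flaky_threshold=0.2):
    """Select tests that cover the diff, split into blocking and quarantined.

    coverage_map maps test_id -> source files the test exercises (assumed input).
    Tests at or above the flakiness threshold still run but do not gate the merge.
    """
    changed = set(changed_files)
    impacted = {t for t, files in coverage_map.items() if changed & set(files)}
    blocking = sorted(t for t in impacted if scores.get(t, 0.0) < flaky_threshold)
    quarantined = sorted(impacted - set(blocking))
    return blocking, quarantined
```

A learned failure predictor would replace the simple coverage lookup in `select_tests`, but the instrumentation requirement is the same: per-run outcomes keyed by test and commit, plus a mapping from tests to the code they touch.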
Talking points
- At Intuit, I owned the ICE platform end-to-end: I reduced developer onboarding from 2–3 weeks to under 24 hours, scaled throughput from 6K to 50K TPS via an RSocket migration, and drove 275% YoY growth to 675M+ engagements in FY23. That's the exact 'platform leadership at enterprise scale' this role requires, and I've done it inside a large, regulated, multi-product company.
- I built aeval, a local-first AI model evaluation platform with 5 eval types, adversarial safety testing, refusal detection, bootstrap confidence intervals, and CI/CD regression gates — directly mapping to the JD's requirement to 'establish measurable evaluation frameworks to ensure AI-driven workflows are reliable, safe, and continuously improving.' I didn't just spec this; I built and shipped it.
- My RL Workbench benchmarks 12 algorithms (PPO, GRPO, DPO, DAPO, REINFORCE, SimPO, KTO, and more) across TRL, VeRL, OpenRLHF, and NeMo RL with live SSE metric streaming and GPU Docker passthrough — giving me hands-on fluency with the post-training stack that underlies any serious AI coding agent. I can have a peer-level technical conversation with your principal engineers about agentic evaluation, not just PM it.
- I designed and shipped OpenClaw, a multi-agent orchestration framework with gateway protocol, subagent delegation, and session management — directly analogous to the 'AI agents integrating into specs, repositories, CI/CD pipelines' architecture this role needs to define. I've implemented the patterns the JD describes, not just read about them.
- I conducted an enterprise-wide Service Language Assessment across 9 languages at Intuit, presented the findings to the CTO, and built the Asterias declarative asset lifecycle platform with a GraphQL API. Together these demonstrate both the cross-organizational influence and the architectural intuition the JD calls out as non-negotiable for this principal-level role.
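To back the aeval talking point and the executive-dashboard question, it can help to show how the three statistics collapse into one plain-language headline. The sketch below is not aeval's implementation; it is a standard-library illustration that assumes two lists of per-example scores (candidate and baseline). The function names, the percentile bootstrap, the resample count, and the conventional 0.2 / 0.5 / 0.8 effect-size cutoffs are assumptions made for the example.

```python
import math
import random
import statistics


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(a), len(b)
    pooled_var = (
        (n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)
    ) / (n1 + n2 - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


def welch_t(a, b):
    """Welch's t statistic, which does not assume equal variances."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def bootstrap_ci(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(a) - mean(b)."""
    rng = random.Random(seed)
    diffs = sorted(
        statistics.mean(rng.choices(a, k=len(a)))
        - statistics.mean(rng.choices(b, k=len(b)))
        for _ in range(n_resamples)
    )
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def headline(candidate, baseline):
    """Reduce the statistics to a verdict an executive can read at a glance."""
    lo, hi = bootstrap_ci(candidate, baseline)
    d = abs(cohens_d(candidate, baseline))
    size = (
        "large" if d >= 0.8
        else "medium" if d >= 0.5
        else "small" if d >= 0.2
        else "negligible"
    )
    if lo > 0:
        verdict = "improved"
    elif hi < 0:
        verdict = "regressed"
    else:
        verdict = "no clear change"
    return f"{verdict} ({size} effect)"
```

The design choice to mention in the interview: the interval decides the verdict, the effect size decides how much attention it deserves, and the t statistic stays in a drill-down view for the engineers who want it.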