jobsearch v0.0.1


cover_letter / art_02vTO2wLfR8

role
anthropic / Product Manager, Developer Productivity
model
anthropic/claude-sonnet-4.6
created
2026-05-19T20:27


Cover letter

Dear Anthropic Hiring Team,

Anthropic's mission of building AI systems that are reliable, interpretable, and steerable sits at the intersection of the two threads that have defined my career: rigorous engineering and the conviction that AI development tooling is itself a safety-critical surface. I came to that conviction early: in 2004 I hand-coded backpropagation through time in C++ to train a neural network on protein secondary structure prediction, and that work eventually became a NeurIPS-accepted paper. Two decades later I'm benchmarking GRPO, DPO, and PPO across TRL, VeRL, OpenRLHF, and NeMo RL on Apple Silicon and CUDA, and the throughput, reproducibility, and feedback-loop problems I encounter every day are exactly the problems this role exists to solve at Anthropic's scale.

## Technical Foundation

My AI/ML work is not incidental to my product career; it is the substrate of it. The RL Workbench I built in 2026 is a three-phase post-training platform: a Reward Lab for designing and A/B testing reward functions (RLVR, learned, hybrid) across GSM8K, MATH, HumanEval, and UltraFeedback; a Playground that runs real TRL-powered GRPO and DPO training with live SSE metric streaming; and an Arena for head-to-head framework benchmarking across TRL, VeRL, OpenRLHF, and NeMo RL with GPU passthrough in Docker containers. Implementing 12 RL algorithms with standardized throughput, memory, and convergence benchmarking gave me direct, hands-on exposure to the accelerator toolchain challenges (MPS, CUDA, Docker GPU passthrough) that Anthropic's inference and research engineers navigate daily.

On the evaluation side, I built aeval, a local-first model evaluation platform with a FastAPI orchestrator, TimescaleDB for longitudinal metrics, a Redis job queue, and a Next.js dashboard backed by Ollama.
It supports bootstrap confidence intervals, Welch's t-test, Cohen's d effect size, saturation detection, and automated safety gates: the kind of statistical rigor that separates a dashboard from a decision-support system. CI/CD integration with regression detection means evaluation is part of the build loop, not an afterthought. That design philosophy, making quality signals part of the inner loop rather than a post-hoc audit, maps directly to what Anthropic needs as Claude takes on more of the code-writing and test-generation work.

## The Arc That Leads Here

My path from NeurIPS researcher to Staff PM at Intuit to AI founder is not a pivot; it is a single line of inquiry into how engineering organizations move from idea to production safely and at speed. The developer productivity problems I solved at Intuit were large-scale versions of that same question: how do you give thousands of engineers fast, reproducible, low-friction paths to production without sacrificing reliability or security?

## Why This Role

What excites me most about this role is the framing in the job description: *defining what developer productivity means when a meaningful share of code is written, tested, and reviewed by Claude itself.* That is not a hypothetical; it is the design question I am already working through in my own projects, where Claude via the MCP SDK is an active participant in code generation, screen-capture analysis, and multi-agent orchestration. The governance and trust challenges that come with autonomous agents in the inner loop are real, and I have direct experience building the guardrails: the OpenClaw multi-agent orchestration framework I built for StreamIO implements a gateway protocol with subagent delegation, profile management, and session switching, exactly the kind of governance primitive that will need to exist at Anthropic's scale as agent-driven workflows expand.

## Selected Relevant Experience

- **ICE Self-Service Platform (Intuit):** Delivered DevPortal, GitOps config, and ICE Playground, reducing developer onboarding from 2–3 weeks to minutes in pre-prod and under 24 hours for production, while mitigating $1M+ in projected opex growth. Scaled ICE engagements 275% YoY to 675M+ in FY23 across QuickBooks, TurboTax, Mint, Mailchimp, and Credit Karma; scaled throughput from 6K to 50K TPS via an RSocket migration supporting ~1.5M concurrent connections with sub-25ms TP99 latency.
- **Java and Python SDK Starter Kits (Intuit):** Extended SDK scaffolding with build configurations (Gradle/Maven), testing frameworks, and CI/CD integration, enabling developers to go from zero to a production-ready microservice in minutes. This is the same muscle the role requires: abstracting build complexity without hiding it from engineers who need to reason about it.
- **Enterprise Service Language Assessment (Intuit):** Conducted a company-wide analysis across 9 languages (Java, Python, Kotlin, Go, TypeScript, Scala, PHP, C++, Groovy), synthesizing usage data and developer feedback into strategic investment recommendations presented to the CTO. This is the kind of cross-cutting platform strategy work this role demands.
- **MSaaS Drift Detection (Intuit):** Wrote a Java JAR library to scan Git repos for configuration drift, partnered with Design on the DevPortal UI, and built a remediation roadmap using OpenRewrite: a concrete example of owning the gap between what engineers intend and what actually ships.
- **RL Workbench, GPU/Accelerator Toolchain (2026):** Built Docker-based GPU passthrough infrastructure for benchmarking TRL, VeRL, OpenRLHF, and NeMo RL, gaining direct experience with the accelerator toolchain ecosystem (CUDA, MPS) and the reproducibility challenges of compute-intensive ML workloads.
- **aeval, Engineering Productivity Metrics (2025–2026):** Designed and operationalized a custom eval metrics framework with statistical rigor, CI/CD regression detection, and automated safety gates. It is a working prototype of the productivity measurement evolution the JD describes: moving beyond commits and cycle time to measures that capture model quality, toil eliminated, and confidence-to-ship.
- **Splunk Search Orchestration (Splunk Inc.):** Owned the Search Service (Go microservices), Search Catalog (PostgreSQL), and SPL/SPL2 roadmap for Splunk Cloud Services; delivered the Scheduler Service end-to-end in four months and led a query performance initiative achieving up to 10x improvements for a beta Fortune 500 customer. This is experience building and shipping platform infrastructure under real production constraints.

## Closing

Anthropic's mission requires that the engineering organization building frontier AI models can move with both speed and confidence, and that the tooling governing human-agent collaboration is itself held to a high standard of reliability and interpretability. I have spent twelve years building exactly that kind of infrastructure, from the ground up, across developer platforms, AI frameworks, and multi-agent systems. I would welcome the opportunity to bring that experience to bear on the most consequential developer productivity challenge in the industry.

Thank you for your consideration.

Sincerely,

**O. Felix Amoruwa**
famoruwa@berkeley.edu | 909-731-9011 | felixamoruwa.info