cursor / Product Manager, Agent Harness
tailored_resume_v2 / art_PqfhmAyaeRA
role
model: anthropic/claude-sonnet-4.6
created: 2026-05-20T01:50
What changed for cursor
| change | why it matters |
|---|---|
| Projects section moved to lead position above Professional Experience | RL Workbench and aeval are the strongest proof points for the Agent Harness role: they directly demonstrate RL practitioner depth, evaluation framework design, and agent trace analysis, so leading with them signals fit immediately |
| Summary rewritten to lead with 'research-product boundary' framing and call out multi-agent orchestration, RL workbench, and evaluation harnesses explicitly | JD's first sentence defines the role as living at the research-product boundary; summary must signal that identity immediately |
| AutoEval reframed as 'Automated Visual Evaluation for Agent Model Training' and described as an 'agent evaluation harness' | JD's core responsibility is building evaluation harnesses; reframing AutoEval in that language (accurately) maximizes relevance signal |
| Streamio CEO title reframed to 'Founder & CEO — Multi-Agent Platform' and OpenClaw bullet moved to lead | OpenClaw subagent delegation and MCP SDK integration directly mirror Agent Harness primitives (tool access, subagent coordination, MCP); leading with it is the strongest proof point for this role |
| MCP SDK bullet elevated and reframed as 'defining agent extensibility primitives' | JD explicitly calls out MCP and plugin extensibility as a core responsibility; accurate reframe using JD's exact language |
| Intuit telemetry bullet reframed to explicitly connect to 'analyzing agent traces at scale and turning patterns into concrete improvements' | JD requires trace analysis discipline; Intuit's BigQuery/SQL usage data work is the closest enterprise-scale proof point |
| Splunk benchmarking bullet reframed with 'empirical, measurement-first approach to defining what good looks like' | JD emphasizes empirical results shaping roadmap; Splunk's 10x benchmark work demonstrates that discipline |
| IBM bullet reframed to connect root cause analysis to 'failure-mode diagnosis discipline central to agent trace analysis' | JD requires deep failure mode analysis; IBM's escalation RCA work is the earliest proof point of that skill |
| Kaiser condensed to 1 bullet focused on platform reliability and observability | Low relevance to Agent Harness role; kept for career continuity but minimized to preserve space for higher-signal content |
| Bank of America Merrill Lynch role omitted | Summer associate role with no relevance to agent frameworks, RL, or developer tools; omitting it preserves space for high-signal content without triggering any anti-patterns (it was not a primary career role) |
| Fintellect fallback routing bullet reframed as 'failure recovery and error handling patterns essential to reliable agent execution' | JD explicitly requires agents to handle failures and retries; LLM fallback routing is an accurate analog |
| aeval bullet rewritten to explicitly call out 'task completion rate, hallucination frequency, and error recovery' metrics | These are the exact success metrics the JD names; accurate reframe using JD's language maximizes signal |
JD analysis (20 key phrases)
Key phrases: agent harness, agent planning and execution framework, decompose tasks into subtasks, failure modes, evaluation frameworks, agent traces, multi-agent coordination, reinforcement learning, developer tools, guardrails, task completion rate, hallucination frequency, error recovery, MCP, subagent delegation, empirical results, autonomy with predictability, observe and steer, benchmarking systems, agent extensibility
Hard requirements:
- Built or evaluated AI agents, LLM applications, or ML-powered developer tools
- Deeply technical — comfortable reading code, analyzing traces, reasoning about system behavior
- Strong intuition for evaluation and measurement — metrics that capture quality not just activity
- Experience with reinforcement learning, agent frameworks, or AI evaluation
- Comfortable in research-adjacent environment where roadmap is shaped by empirical results
Preferred qualifications:
- Multi-agent coordination experience
- Agent planning and execution framework ownership
- Evaluation and benchmarking system design
- MCP/plugin extensibility primitives
- Real-time RL on user data familiarity
Per-role mapping (10 roles scored)
| role | score | reframe angle | JD phrases that map |
|---|---|---|---|
| Streamio AI — Founder & CEO | 4/5 | Lead with multi-agent orchestration and MCP primitives — directly mirrors Agent Harness responsibilities | subagent delegation, MCP, multi-agent coordination, agent extensibility, agent planning and execution framework |
| Fintellect AI — Founder & CEO | 3/5 | Frame as LLM agent orchestration with failure handling and fallback — maps to error recovery and agent reliability | failure modes, error recovery, agent frameworks, LLM applications |
| Intuit — Staff PM | 3/5 | Frame as developer-facing platform PM with deep telemetry/measurement discipline — maps to evaluation frameworks and developer trust | developer tools, empirical results, evaluation frameworks, observe and steer |
| Splunk — Senior PM | 2/5 | Frame as orchestration and performance benchmarking — maps to agent trace analysis and measurement | benchmarking systems, failure modes, empirical results |
| Kaiser Permanente — SOA Technical PM | 1/5 | Condense to 1 bullet on platform reliability and scale | — |
| IBM — Software Engineer | 1/5 | 1 bullet — keep for technical credibility | failure modes |
| RL Workbench | 5/5 | Lead projects section — directly demonstrates RL practitioner depth and evaluation framework design | reinforcement learning, evaluation frameworks, benchmarking systems, empirical results, task completion rate |
| aeval — AI Model Evaluation Platform | 5/5 | Second project — directly maps to evaluation framework design and hallucination/failure measurement | evaluation frameworks, hallucination frequency, task completion rate, benchmarking systems, error recovery |
| AutoEval — Automated Visual Evaluation for Robot Model Training | 4/5 | Frame as automated agent evaluation harness — directly mirrors Agent Harness evaluation responsibilities | evaluation frameworks, agent traces, failure modes, benchmarking systems |
| BRAIN — Protein Structure Prediction ML Platform | 3/5 | Frame as deep ML research-to-production credibility; condense | reinforcement learning, empirical results, research-adjacent |
Tailored summary
Technical PM at the research-product boundary with 12+ years building developer-facing platforms and AI agent frameworks, from shipping SDK tooling at 675M+ engagements (Intuit) to building multi-agent orchestration systems, RL post-training workbenches, and AI evaluation harnesses from scratch. Designed and implemented subagent delegation frameworks, MCP-integrated agent pipelines, and statistical evaluation systems measuring task completion, error recovery, and hallucination frequency. NeurIPS-published ML researcher; hands-on practitioner across PPO, GRPO, DPO, and 9 additional RL algorithms.