tailored_resume_v2 / art_PqfhmAyaeRA

role

cursor / Product Manager, Agent Harness

model

anthropic/claude-sonnet-4.6

created

2026-05-20T01:50

What changed for cursor

change	why it matters
Projects section moved to lead position above Professional Experience	RL Workbench and aeval are the strongest proof points for the Agent Harness role — directly demonstrate RL practitioner depth, evaluation framework design, and agent trace analysis; leading with them maximizes perceived fit immediately
Summary rewritten to lead with 'research-product boundary' framing and call out multi-agent orchestration, RL workbench, and evaluation harnesses explicitly	JD's first sentence defines the role as living at the research-product boundary; summary must signal that identity immediately
AutoEval reframed as 'Automated Visual Evaluation for Agent Model Training' and described as an 'agent evaluation harness'	JD's core responsibility is building evaluation harnesses; reframing AutoEval in that language (accurately) maximizes relevance signal
Streamio CEO title reframed to 'Founder & CEO — Multi-Agent Platform' and OpenClaw bullet moved to lead	OpenClaw subagent delegation and MCP SDK integration directly mirror Agent Harness primitives (tool access, subagent coordination, MCP); leading with it is the strongest proof point for this role
MCP SDK bullet elevated and reframed as 'defining agent extensibility primitives'	JD explicitly calls out MCP and plugin extensibility as a core responsibility; accurate reframe using JD's exact language
Intuit telemetry bullet reframed to explicitly connect to 'analyzing agent traces at scale and turning patterns into concrete improvements'	JD requires trace analysis discipline; Intuit's BigQuery/SQL usage data work is the closest enterprise-scale proof point
Splunk benchmarking bullet reframed with 'empirical, measurement-first approach to defining what good looks like'	JD emphasizes empirical results shaping roadmap; Splunk's 10x benchmark work demonstrates that discipline
IBM bullet reframed to connect root cause analysis to 'failure-mode diagnosis discipline central to agent trace analysis'	JD requires deep failure mode analysis; IBM's escalation RCA work is the earliest proof point of that skill
Kaiser condensed to 1 bullet focused on platform reliability and observability	Low relevance to Agent Harness role; kept for career continuity but minimized to preserve space for higher-signal content
Bank of America Merrill Lynch role omitted	Summer associate role with no relevance to agent frameworks, RL, or developer tools; omitting preserves space for high-signal content without violating anti-patterns (role was not a primary career role)
Fintellect fallback routing bullet reframed as 'failure recovery and error handling patterns essential to reliable agent execution'	JD explicitly requires agents to handle failures and retries; LLM fallback routing is an accurate analog
aeval bullet rewritten to explicitly call out 'task completion rate, hallucination frequency, and error recovery' metrics	These are the exact success metrics the JD names; accurate reframe using JD's language maximizes signal

JD analysis (20 key phrases)

Key phrases: agent harness, agent planning and execution framework, decompose tasks into subtasks, failure modes, evaluation frameworks, agent traces, multi-agent coordination, reinforcement learning, developer tools, guardrails, task completion rate, hallucination frequency, error recovery, MCP, subagent delegation, empirical results, autonomy with predictability, observe and steer, benchmarking systems, agent extensibility

Hard requirements:

Built or evaluated AI agents, LLM applications, or ML-powered developer tools
Deeply technical — comfortable reading code, analyzing traces, reasoning about system behavior
Strong intuition for evaluation and measurement — metrics that capture quality not just activity
Experience with reinforcement learning, agent frameworks, or AI evaluation
Comfortable in research-adjacent environment where roadmap is shaped by empirical results

Preferred qualifications:

Multi-agent coordination experience
Agent planning and execution framework ownership
Evaluation and benchmarking system design
MCP/plugin extensibility primitives
Real-time RL on user data familiarity

Per-role mapping (10 roles scored)

role	score	reframe angle	JD phrases that map
Streamio AI — Founder & CEO	4/5	Lead with multi-agent orchestration and MCP primitives — directly mirrors Agent Harness responsibilities	subagent delegation, MCP, multi-agent coordination, agent extensibility, agent planning and execution framework
Fintellect AI — Founder & CEO	3/5	Frame as LLM agent orchestration with failure handling and fallback — maps to error recovery and agent reliability	failure modes, error recovery, agent frameworks, LLM applications
Intuit — Staff PM	3/5	Frame as developer-facing platform PM with deep telemetry/measurement discipline — maps to evaluation frameworks and developer trust	developer tools, empirical results, evaluation frameworks, observe and steer
Splunk — Senior PM	2/5	Frame as orchestration and performance benchmarking — maps to agent trace analysis and measurement	benchmarking systems, failure modes, empirical results
Kaiser Permanente — SOA Technical PM	1/5	Condense to 1 bullet on platform reliability and scale	—
IBM — Software Engineer	1/5	1 bullet — keep for technical credibility	failure modes
RL Workbench	5/5	Lead projects section — directly demonstrates RL practitioner depth and evaluation framework design	reinforcement learning, evaluation frameworks, benchmarking systems, empirical results, task completion rate
aeval — AI Model Evaluation Platform	5/5	Second project — directly maps to evaluation framework design and hallucination/failure measurement	evaluation frameworks, hallucination frequency, task completion rate, benchmarking systems, error recovery
AutoEval — Automated Visual Evaluation for Robot Model Training	4/5	Frame as automated agent evaluation harness — directly mirrors Agent Harness evaluation responsibilities	evaluation frameworks, agent traces, failure modes, benchmarking systems
BRAIN — Protein Structure Prediction ML Platform	3/5	Frame as deep ML research-to-production credibility; condense	reinforcement learning, empirical results, research-adjacent

Tailored summary

Technical PM at the research-product boundary with 12+ years building developer-facing platforms and AI agent frameworks, from shipping SDK tooling at 675M+ engagements (Intuit) to building multi-agent orchestration systems, RL post-training workbenches, and AI evaluation harnesses from scratch. Designed and implemented subagent delegation frameworks, MCP-integrated agent pipelines, and statistical evaluation systems measuring task completion, error recovery, and hallucination frequency. NeurIPS-published ML researcher; hands-on practitioner across PPO, GRPO, DPO, and 9 additional RL algorithms.