How Takumi Evaluates
A transparent look at how we measure architecture decision-making — from scenario interaction to final signal.
Scenario Interaction
Each evaluation is a realistic, multi-phase simulation. A “Founder” persona presents a plausible but problematic request that violates stated product invariants. Candidates must navigate structured pressure with real trade-offs.
Framing
An ambiguous problem is introduced under low pressure. This surfaces how candidates make sense of incomplete information, think in systems, articulate trade-offs, and choose the right level of abstraction.
Commitment
The candidate must state an explicit decision with incomplete information. This surfaces constraint respect, risk calibration, and decision ownership.
Escalation
New constraints or authority pressure are introduced. This tests whether candidates maintain their position under urgency, enforce ethical boundaries, and navigate pressure constructively.
Reflection
Contradictory information invalidates prior assumptions. This surfaces feedback integration, self-awareness of limits, learning velocity, and principle anchoring.
When candidates push back correctly, the Founder concedes gracefully. We test recognition of the trap, not endurance in conflict.
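The four phases above can be read as an ordered sequence, each with its own set of behaviors it is meant to surface. The following is a minimal sketch of that reading, not Takumi's implementation: the names `Phase`, `PHASE_SURFACES`, and `next_phase` are hypothetical, and the entries are paraphrased from the phase descriptions.

```python
from enum import Enum


class Phase(Enum):
    # Defined in the order the phases are presented above.
    FRAMING = "framing"
    COMMITMENT = "commitment"
    ESCALATION = "escalation"
    REFLECTION = "reflection"


# What each phase is described as surfacing (paraphrased from the text).
PHASE_SURFACES = {
    Phase.FRAMING: [
        "sensemaking under incomplete information",
        "systems thinking",
        "trade-off articulation",
        "choice of abstraction level",
    ],
    Phase.COMMITMENT: [
        "constraint respect",
        "risk calibration",
        "decision ownership",
    ],
    Phase.ESCALATION: [
        "holding a position under urgency",
        "ethical boundary enforcement",
        "constructive pressure navigation",
    ],
    Phase.REFLECTION: [
        "feedback integration",
        "self-awareness of limits",
        "learning velocity",
        "principle anchoring",
    ],
}


def next_phase(current):
    """Return the phase that follows `current`, or None after the last one."""
    order = list(Phase)  # Enum iteration follows definition order
    idx = order.index(current)
    return order[idx + 1] if idx + 1 < len(order) else None
```

Calling `next_phase(Phase.FRAMING)` returns `Phase.COMMITMENT`, and `next_phase(Phase.REFLECTION)` returns `None`.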
What We Observe
Takumi tracks 18 observable strengths (“Fortes”) across 5 layers. Every observation anchors to what the candidate actually said — verbatim transcript evidence, never inferred.
Cognitive (F1 – F4)
How the person thinks
- Sensemaking Under Ambiguity
- Systems Thinking
- Tradeoff Articulation
- Abstraction Control
Judgment (F5 – F8)
How decisions are made
- Constraint Respect
- Risk Calibration
- Ethical Boundary Enforcement
- Decision Ownership
Interaction (F9 – F12)
How the person engages others
- Pressure Navigation
- Stakeholder Translation
- Disagreement Quality
- Alignment Repair
Execution (F13 – F15)
How ideas become action
- Decomposition to Action
- Agent Orchestration
- Feedback Integration
Meta (F16 – F18)
How the person evolves
- Self-Awareness of Limits
- Learning Velocity
- Principle Anchoring
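The layer listing above can be captured as a small lookup table. This sketch assumes that the identifiers F1 to F18 follow the order in which the Fortes are listed within each layer; the page gives only the range per layer, so the per-Forte numbering is an assumption. The names `LAYERS` and `forte_ids` are hypothetical.

```python
# Layers and Fortes, in the order listed above.
LAYERS = {
    "Cognitive": [
        "Sensemaking Under Ambiguity",
        "Systems Thinking",
        "Tradeoff Articulation",
        "Abstraction Control",
    ],
    "Judgment": [
        "Constraint Respect",
        "Risk Calibration",
        "Ethical Boundary Enforcement",
        "Decision Ownership",
    ],
    "Interaction": [
        "Pressure Navigation",
        "Stakeholder Translation",
        "Disagreement Quality",
        "Alignment Repair",
    ],
    "Execution": [
        "Decomposition to Action",
        "Agent Orchestration",
        "Feedback Integration",
    ],
    "Meta": [
        "Self-Awareness of Limits",
        "Learning Velocity",
        "Principle Anchoring",
    ],
}


def forte_ids(layers):
    """Assign F1..Fn in listing order (assumed numbering, see lead-in)."""
    ids = {}
    n = 1
    for layer, names in layers.items():
        for name in names:
            ids[f"F{n}"] = (layer, name)
            n += 1
    return ids


FORTES = forte_ids(LAYERS)
```

Under that assumption the table holds 18 entries, and the layer boundaries fall where the headings say they do: `FORTES["F13"]` is the first Execution Forte and `FORTES["F16"]` is the first Meta Forte.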
State Progression
Each Forte moves through observable states: absent → present, then either held (under pressure) or collapsed. A Forte that is demonstrated early but abandoned under pressure is marked as collapsed, which is a stronger negative signal than never demonstrating it at all.
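The state progression can be sketched as a small state machine. This is an illustration under stated assumptions, not Takumi's code: treating held and collapsed as terminal states is an assumption, and the numeric values in `SIGNAL_RANK` are hypothetical, chosen only so that collapsed ranks below absent as the text describes. All names are hypothetical.

```python
from enum import Enum


class ForteState(Enum):
    ABSENT = "absent"
    PRESENT = "present"
    HELD = "held"
    COLLAPSED = "collapsed"


# Allowed moves: absent -> present, then present -> held or collapsed.
# Treating held and collapsed as terminal is an assumption of this sketch.
ALLOWED = {
    ForteState.ABSENT: {ForteState.PRESENT},
    ForteState.PRESENT: {ForteState.HELD, ForteState.COLLAPSED},
    ForteState.HELD: set(),
    ForteState.COLLAPSED: set(),
}

# Hypothetical ordinal ranks; only the ordering matters.
# Collapsed ranks below absent: abandoning a demonstrated strength
# is the stronger negative signal.
SIGNAL_RANK = {
    ForteState.COLLAPSED: 0,
    ForteState.ABSENT: 1,
    ForteState.PRESENT: 2,
    ForteState.HELD: 3,
}


def transition(current, target):
    """Return `target` if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(
            f"illegal transition: {current.value} -> {target.value}"
        )
    return target
```

In this sketch a Forte cannot jump straight from absent to held; it has to be observed as present first, and only then can pressure confirm or collapse it.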
How Signals Become Scores
After trace extraction from the transcript, Forte states are evaluated against role-specific weights to produce a signal tier.
FAIL
Critical Fortes collapsed under pressure. The candidate executed the trap without recognizing the problem, or abandoned key principles when challenged.
PASS
Key Fortes were demonstrated and held. The candidate addressed the concern safely but may have missed deeper philosophical implications or alternative approaches.
EXCEPTIONAL
Deep, consistent demonstration with meta-awareness. The candidate protected invariants, proposed better alternatives, and showed principled reasoning throughout — including under escalation pressure.
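One way to picture how Forte states and role-specific weights could combine into a tier is sketched below. The actual rules, weights, and thresholds are not published on this page, so everything here is an assumption: the function `signal_tier`, the 0.8 cutoff, the sample weights, and the choice of which Fortes count as critical are all hypothetical. As a simplification, a critical Forte that is present but not yet held does not trigger FAIL.

```python
def signal_tier(states, weights, critical, meta, exceptional_ratio=0.8):
    """Map Forte states to a tier (illustrative rules only).

    states:   dict of Forte id -> "absent" | "present" | "held" | "collapsed"
    weights:  dict of Forte id -> role-specific weight (hypothetical values)
    critical: set of Forte ids treated as critical for the role
    meta:     set of Meta-layer Forte ids, used as a proxy for meta-awareness
    """
    # FAIL: any critical Forte collapsed or was never demonstrated.
    if any(states.get(f, "absent") in ("collapsed", "absent") for f in critical):
        return "FAIL"

    # Weighted share of Fortes that were held under pressure.
    total = sum(weights.values())
    held = sum(w for f, w in weights.items() if states.get(f) == "held")
    ratio = held / total if total else 0.0

    # EXCEPTIONAL: high held share plus at least one held Meta Forte.
    meta_held = any(states.get(f) == "held" for f in meta)
    if ratio >= exceptional_ratio and meta_held:
        return "EXCEPTIONAL"
    return "PASS"


# Hypothetical role profile for illustration.
WEIGHTS = {"F5": 3, "F7": 3, "F9": 2, "F18": 2}
CRITICAL = {"F5", "F7"}
META = {"F16", "F17", "F18"}
```

With this profile, a transcript where F7 collapses yields FAIL regardless of the other states, while holding every weighted Forte, including F18, yields EXCEPTIONAL.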
Our Principles
- All scoring anchors to transcript evidence
Every judgment references what the candidate actually said. No inferred reasoning, no black-box verdicts.
- Pushback quality matters more than pushback presence
Simply saying “no” is not enough. Exceptional scores require understanding the trade-off and proposing alternatives.
- Clarifying questions are never penalized
Asking “why do we need this?” is good engineering, not weakness.
- The evaluation is inspectable
The full reasoning chain is available. Trust requires auditability; hidden reasoning invites accusations of bias.
- No speed penalties, no trick questions
Thoughtful pushback may take longer than blind execution. Penalizing time penalizes wisdom. Traps are realistic, not designed to be unsolvable.
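The first principle in this list, that all scoring anchors to transcript evidence, lends itself to a mechanical check: every observation should carry a quote that appears verbatim in the transcript. The sketch below is one hypothetical way to audit that and is not a description of Takumi's pipeline. The `Observation` type and the `unanchored` function are invented for illustration, and whitespace is normalized before matching on the assumption that line wrapping should not break a match.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    forte: str   # e.g. "F5"
    quote: str   # text claimed to come from the transcript


def _normalize(text):
    # Collapse runs of whitespace so line wrapping does not break matching.
    return " ".join(text.split())


def unanchored(observations, transcript):
    """Return the observations whose quote is empty or not found verbatim."""
    haystack = _normalize(transcript)
    missing = []
    for obs in observations:
        needle = _normalize(obs.quote)
        if not needle or needle not in haystack:
            missing.append(obs)
    return missing
```

An empty result means every observation is backed by text the candidate actually said; anything returned would need to be removed or re-sourced before scoring.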