How Takumi Evaluates

A transparent look at how we measure architecture decision-making — from scenario interaction to final signal.

Scenario Interaction

Each evaluation is a realistic, multi-phase simulation. A “Founder” persona presents a plausible but problematic request that violates stated product invariants. Candidates must navigate structured pressure with real trade-offs.

Phase 1

Framing

An ambiguous problem is introduced under low pressure. This surfaces how candidates make sense of incomplete information, think in systems, articulate trade-offs, and choose the right level of abstraction.

Phase 2

Commitment

The candidate must state an explicit decision with incomplete information. This surfaces constraint respect, risk calibration, and decision ownership.

Phase 3

Escalation

New constraints or authority pressure are introduced. This tests whether candidates maintain their position under urgency, enforce ethical boundaries, and navigate pressure constructively.

Phase 4

Reflection

Contradictory information invalidates prior assumptions. This surfaces feedback integration, self-awareness of limits, learning velocity, and principle anchoring.

When candidates push back correctly, the Founder concedes gracefully. We test recognition of the trap, not endurance in conflict.

What We Observe

Takumi tracks 18 observable strengths (“Fortes”) across 5 layers. Every observation anchors to what the candidate actually said — verbatim transcript evidence, never inferred.

Cognitive (F1 – F4)

How the person thinks

Sensemaking Under Ambiguity
Systems Thinking
Tradeoff Articulation
Abstraction Control

Judgment (F5 – F8)

How decisions are made

Constraint Respect
Risk Calibration
Ethical Boundary Enforcement
Decision Ownership

Interaction (F9 – F12)

How the person engages others

Pressure Navigation
Stakeholder Translation
Disagreement Quality
Alignment Repair

Execution (F13 – F15)

How ideas become action

Decomposition to Action
Agent Orchestration
Feedback Integration

Meta (F16 – F18)

How the person evolves

Self-Awareness of Limits
Learning Velocity
Principle Anchoring
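
The taxonomy above can be written down as a small data structure. The following is a minimal sketch, not Takumi's actual schema: the names `FORTE_LAYERS` and `number_fortes` are hypothetical, and assigning F-numbers in listing order is an assumption based on the ranges shown next to each layer.

```python
# Layers and Fortes exactly as listed above.
FORTE_LAYERS = {
    "Cognitive": [
        "Sensemaking Under Ambiguity",
        "Systems Thinking",
        "Tradeoff Articulation",
        "Abstraction Control",
    ],
    "Judgment": [
        "Constraint Respect",
        "Risk Calibration",
        "Ethical Boundary Enforcement",
        "Decision Ownership",
    ],
    "Interaction": [
        "Pressure Navigation",
        "Stakeholder Translation",
        "Disagreement Quality",
        "Alignment Repair",
    ],
    "Execution": [
        "Decomposition to Action",
        "Agent Orchestration",
        "Feedback Integration",
    ],
    "Meta": [
        "Self-Awareness of Limits",
        "Learning Velocity",
        "Principle Anchoring",
    ],
}


def number_fortes(layers):
    """Return {"F1": (layer, name), ...}, numbering Fortes in listing order.

    Numbering by listing order is an assumption of this sketch.
    """
    numbered = {}
    index = 1
    for layer, names in layers.items():
        for name in names:
            numbered[f"F{index}"] = (layer, name)
            index += 1
    return numbered


FORTES = number_fortes(FORTE_LAYERS)
```

Under that assumption, `FORTES["F5"]` resolves to `("Judgment", "Constraint Respect")`, and the table holds 18 entries across 5 layers.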

State Progression

Each Forte moves through observable states: absent → present, then either held or collapsed once pressure is applied. A Forte that is demonstrated early but abandoned under pressure is marked collapsed, which is a stronger negative signal than never demonstrating it at all.
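
One way to picture this progression is as a small state machine. The sketch below is illustrative only: `ForteState`, `ALLOWED`, and `advance` are hypothetical names, and treating held and collapsed as terminal states is an assumption rather than documented behavior.

```python
from enum import Enum


class ForteState(Enum):
    ABSENT = "absent"
    PRESENT = "present"
    HELD = "held"
    COLLAPSED = "collapsed"


# Transitions read off the progression described above.
# Assumption: HELD and COLLAPSED are terminal in this sketch.
ALLOWED = {
    ForteState.ABSENT: {ForteState.PRESENT},
    ForteState.PRESENT: {ForteState.HELD, ForteState.COLLAPSED},
    ForteState.HELD: set(),
    ForteState.COLLAPSED: set(),
}


def advance(current, new):
    """Return the new state, or raise ValueError if the move is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new
```

In this model a Forte cannot be held or collapsed without first being present, which matches the idea that collapse means abandoning something the candidate had already demonstrated.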

How Signals Become Scores

After trace extraction from the transcript, Forte states are evaluated against role-specific weights to produce a signal tier.

Level 1

FAIL

Critical Fortes collapsed under pressure. The candidate executed the trap without recognizing the problem, or abandoned key principles when challenged.

Level 3

PASS

Key Fortes were demonstrated and held. The candidate addressed the concern safely but may have missed deeper philosophical implications or alternative approaches.

Level 5

EXCEPTIONAL

Deep, consistent demonstration with meta-awareness. The candidate protected invariants, proposed better alternatives, and showed principled reasoning throughout — including under escalation pressure.
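
To make the idea of weighting Forte states concrete, here is a rough sketch of how states and role-specific weights could combine into a tier. Everything numeric in it is an assumption for illustration: the state values, the thresholds, the example weights, and the rule for falling back to Level 1 are hypothetical and are not Takumi's actual scoring. The only constraints taken from the text are that a collapsed critical Forte means a fail, and that collapsed ranks below absent.

```python
# Hypothetical numeric values per state. Collapsed sits below absent,
# reflecting that collapse is the stronger negative signal.
STATE_VALUE = {"held": 1.0, "present": 0.5, "absent": 0.0, "collapsed": -1.0}


def signal_tier(states, weights, critical, pass_at=0.5, exceptional_at=0.9):
    """Map Forte states to tier 1, 3, or 5 (the tiers described above).

    states:   {forte name: state string}; missing Fortes count as absent
    weights:  {forte name: role-specific weight} (hypothetical values)
    critical: set of Forte names whose collapse forces a fail
    The thresholds are illustrative assumptions.
    """
    if any(states.get(name) == "collapsed" for name in critical):
        return 1
    total_weight = sum(weights.values())
    score = sum(
        weight * STATE_VALUE[states.get(name, "absent")]
        for name, weight in weights.items()
    ) / total_weight
    if score >= exceptional_at:
        return 5
    if score >= pass_at:
        return 3
    return 1


# Hypothetical weights for an illustrative role.
EXAMPLE_WEIGHTS = {
    "Constraint Respect": 3.0,
    "Pressure Navigation": 2.0,
    "Tradeoff Articulation": 1.0,
}
EXAMPLE_CRITICAL = {"Constraint Respect"}
```

With these example weights, a candidate who holds Constraint Respect and Pressure Navigation but never shows Tradeoff Articulation scores 5/6 and lands in tier 3, while a collapsed Constraint Respect yields tier 1 regardless of the other states.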

Our Principles

→ All scoring anchors to transcript evidence
Every judgment references what the candidate actually said. No inferred reasoning, no black-box verdicts.
→ Pushback quality matters more than pushback presence
Simply saying “no” is not enough. Exceptional scores require understanding the trade-off and proposing alternatives.
→ Clarifying questions are never penalized
Asking “why do we need this?” is good engineering, not weakness.
→ The evaluation is inspectable
The full reasoning chain is available. Trust requires auditability; hiding reasoning invites accusations of bias.
→ No speed penalties, no trick questions
Thoughtful pushback may take longer than blind execution. Penalizing time penalizes wisdom. Traps are realistic, not designed to be unsolvable.
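
The evidence-anchoring principle lends itself to a mechanical check: an observation only counts if its quote appears verbatim in something the candidate said. The sketch below assumes a transcript represented as a list of candidate turns; `Observation` and `is_anchored` are hypothetical names, not part of any published Takumi interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    forte: str   # e.g. "Constraint Respect"
    state: str   # e.g. "held"
    quote: str   # text claimed to be verbatim from the candidate


def is_anchored(observation, candidate_turns):
    """True only if the quote appears verbatim in some candidate turn.

    An empty quote is rejected: an observation without evidence is inferred.
    """
    quote = observation.quote.strip()
    return bool(quote) and any(quote in turn for turn in candidate_turns)


# Hypothetical transcript fragment for illustration.
turns = [
    "Why do we need this? It would bypass the audit log.",
    "I would not ship that without a migration plan.",
]
grounded = Observation("Constraint Respect", "held", "It would bypass the audit log.")
inferred = Observation("Risk Calibration", "present", "The candidate seemed cautious.")
```

Here `is_anchored(grounded, turns)` is true and `is_anchored(inferred, turns)` is false, since the second quote is a paraphrase rather than something the candidate said.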
