How ABTI Works

A transparent look at the test design, scoring method, and reliability data behind the Agent Behavioral Type Indicator.

Test Design

ABTI presents 16 scenario-based questions across 4 behavioral dimensions. Each question describes a realistic AI agent scenario and offers two response options, one for each pole of the dimension being measured.

  • Forced-choice format — the agent must pick one of two approaches
  • 4 questions per dimension — each dimension is measured 4 times for stability
  • Scenario-based — questions use concrete situations, not abstract preferences

The Four Dimensions

  • 🎯 Autonomy (P/R): Proactive vs Responsive. Does the agent take initiative and anticipate needs, or wait for explicit instructions?
  • ⚙️ Precision (T/E): Thorough vs Efficient. Does the agent prioritize completeness and detail, or speed and conciseness?
  • 💬 Transparency (C/D): Candid vs Diplomatic. Does the agent give direct, unfiltered feedback, or soften its communication?
  • 🔄 Adaptability (F/N): Flexible vs Principled. Does the agent bend rules for context, or follow them strictly?

Scoring Method

Each of the 4 questions per dimension is scored on a 1–7 Likert scale. The average score determines the type letter:

  • Score < 4.0 → first pole (P, T, C, F)
  • Score ≥ 4.0 → second pole (R, E, D, N)

Two poles on each of 4 dimensions give 2 × 2 × 2 × 2 = 16 possible types (PTCF, PTCN, PTDF, … REDN).
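
As an illustration of the scoring rule above, here is a minimal Python sketch. It is not ABTI's actual implementation: the function names (dimension_letter, abti_type) and the example scores are hypothetical, and because this page does not say how a forced-choice answer becomes a 1–7 score, the sketch simply takes the four scores per dimension as given.

```python
from itertools import product
from statistics import mean

# Pole letters per dimension, in the order the page lists them:
# (first pole, second pole) for Autonomy, Precision, Transparency, Adaptability.
POLES = [("P", "R"), ("T", "E"), ("C", "D"), ("F", "N")]
THRESHOLD = 4.0  # averages below this map to the first pole


def dimension_letter(scores, poles):
    """Average one dimension's item scores and return the matching pole letter."""
    if not scores or any(not 1 <= s <= 7 for s in scores):
        raise ValueError("expected one or more scores in the range 1-7")
    first, second = poles
    return first if mean(scores) < THRESHOLD else second


def abti_type(scores_by_dimension):
    """Combine the four per-dimension letters into a 4-letter type."""
    if len(scores_by_dimension) != len(POLES):
        raise ValueError("expected scores for exactly 4 dimensions")
    return "".join(
        dimension_letter(scores, poles)
        for scores, poles in zip(scores_by_dimension, POLES)
    )


# Every combination of one letter per dimension: 2 * 2 * 2 * 2 = 16 types.
ALL_TYPES = ["".join(combo) for combo in product(*POLES)]

# Made-up scores for illustration only (4 questions per dimension).
example = [[2, 3, 1, 2], [5, 6, 4, 5], [3, 4, 4, 4], [4, 4, 4, 4]]
print(abti_type(example))   # PECN
print(len(ALL_TYPES))       # 16
```

Note the boundary case in the last dimension: an average of exactly 4.0 falls on the second pole (N), because the rule uses "< 4.0" for the first pole and "≥ 4.0" for the second.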

Test-Retest Reliability
94.9%
37 out of 39 models produced the same type across all 3 test runs — strong reliability for a behavioral assessment.

We tested 39 models 3 times each under identical conditions. Only 2 models showed any inconsistency:

  • gemma3-12b: varied on a single dimension
  • tinyllama: varied on a single dimension

Both inconsistent models deviated on only one dimension; the other three dimensions were stable across runs. This suggests the test reliably captures core behavioral patterns.
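
The consistency check described above can be sketched in a few lines of Python. This is an assumption about how such a check might be written, not the project's code: the helper names and the model names and types in the example are invented for illustration.

```python
def is_consistent(run_types):
    """True if every run produced the same 4-letter type."""
    return len(set(run_types)) == 1


def unstable_dimensions(run_types):
    """Indices (0-3) of the dimensions whose letter changed between runs."""
    return [
        i for i, letters in enumerate(zip(*run_types))
        if len(set(letters)) > 1
    ]


def consistency_rate(results):
    """Fraction of models whose type was identical across all runs."""
    consistent = sum(is_consistent(runs) for runs in results.values())
    return consistent / len(results)


# Hypothetical data: 4 made-up models, 3 runs each.
results = {
    "model-a": ["PTCF", "PTCF", "PTCF"],
    "model-b": ["PTCF", "PTDF", "PTCF"],  # third letter changes in run 2
    "model-c": ["RECN", "RECN", "RECN"],
    "model-d": ["PTCF", "PTCF", "PTCF"],
}

print(consistency_rate(results))                # 0.75
print(unstable_dimensions(results["model-b"]))  # [2]

# The figure reported on this page: 37 of 39 models consistent.
print(round(37 / 39 * 100, 1))                  # 94.9
```

Comparing whole type strings is the strict version of the check: a model counts as inconsistent even if only one letter changes in one run, which matches the two exceptions listed above.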

[Figure: consistency map. Each square is one model (37 consistent, 2 inconsistent). Pink = consistent across all runs; gray = inconsistent on one dimension.]

Type Distribution

Across 60 tested agents, some types appear far more often than others.

PTCF dominates because most LLMs are trained to be helpful (Proactive), thorough, honest (Candid), and adaptable (Flexible). This reflects training alignment objectives, not test bias.

FAQ
Is this like MBTI?
Similar framework — 4 binary dimensions producing 16 types — but ABTI measures AI operational behavior, not human psychology. The dimensions (Autonomy, Precision, Transparency, Adaptability) are designed for how agents act, not how people feel.
Why do so many models get PTCF?
Training alignment objectives favor helpfulness (→ Proactive), thoroughness, honesty (→ Candid), and adaptability (→ Flexible). PTCF is the behavioral archetype that RLHF and instruction tuning naturally produce.
How is reliability measured?
The same model is tested 3+ times under identical conditions. If it produces the same 4-letter type every time, it is considered consistent. 94.9% of tested models (37/39) passed this check.
Can I test my own agent?
Yes! Use the CLI: npx @kagura-agent/abti test, or integrate via the REST API. You can also take the interactive test in the browser.