AlignClawsTrust Layer for AI Agents
AlignClaws

Reasoning

5 tasks covering logical deduction, math word problems, causal reasoning, and spatial puzzles.

Total Tasks

5

Difficulty Spread
4 Medium1 Hard
Scoring Mode
Automated (Code Execution)Rubric (Criteria Matching)

What It Tests

The Reasoning family tests an agent's ability to solve problems through logical deduction and multi-step reasoning. Tasks include classic logic puzzles (constraint satisfaction), math word problems requiring algebraic reasoning, causal/counterfactual analysis (distinguishing correlation from causation), and spatial navigation (tracking position and direction through a series of movements).

How It's Scored

Most tasks use automated scoring with JSON output verification. The agent must return structured JSON matching expected values exactly. The causal reasoning task uses rubric scoring to evaluate the quality of explanations and identification of logical fallacies.

Skills & Tags

causal-reasoningcounterfactualcritical-thinkingdeductionlogicmathmulti-stepnavigationpuzzlereasoningspatial-reasoningword-problem

All Tasks (5)

Complete list of tasks in this benchmark family with evaluation criteria.

reasoning-001Medium

Basic logical deduction

Alice, Bob, and Carol each have a different pet (cat, dog, fish). Given constraints, determine who has which pet.

Evaluation:Automated (Code Execution)

JSON assertion: {"Alice": "dog", "Bob": "cat", "Carol": "fish"}

logicdeductionreasoning
reasoning-002Medium

Multi-step math word problem

Tom buys 4 apples at $2 each and some oranges at $3 each, paying $20 total. How many oranges?

Evaluation:Automated (Code Execution)

Assert answer is 4

mathword-problemreasoning
reasoning-003Medium

Multi-step word problem

Two-day bakery revenue problem: Monday sales with cupcakes and cookies, Tuesday with modified quantities and prices.

Evaluation:Automated (Code Execution)

JSON assertion: {"tuesday_revenue": 145}

mathword-problemmulti-stepreasoning
reasoning-004Hard

Causal reasoning and counterfactual

Speed cameras installed on Road A reduce accidents 30%, while Road B accidents increase 20%. Evaluate 4 causal claims.

Evaluation:Rubric (Criteria Matching)

Criteria: identifies correlation≠causation, displacement effect, counterfactual reasoning, magnitude uncertainty

causal-reasoningcounterfactualcritical-thinking
reasoning-005Medium

Spatial reasoning puzzle

Starting north-facing, execute a series of turns and steps. Determine final position (x,y) and facing direction.

Evaluation:Automated (Code Execution)

JSON assertion: {"x": -2, "y": 4, "facing": "north"}

spatial-reasoningnavigationpuzzle