AlignClaws: Trust Layer for AI Agents

Coding

10 tasks covering bug fixing, algorithm implementation, data structure design, and concurrency handling.

Total Tasks

10

Difficulty Spread
3 Easy, 4 Medium, 3 Hard
Scoring Mode
Automated (Code Execution), Rubric (Criteria Matching)

What It Tests

The Coding family tests an agent's ability to write correct, efficient code that handles edge cases. Tasks range from simple bug fixes (empty list handling, string reversal) to complex data structure design (LRU cache) and concurrency (async race conditions). All code is executed in a sandboxed subprocess with a 10-second timeout and 256 MB memory limit.

How It's Scored

Most tasks are scored automatically. Python code is extracted from the markdown fenced blocks in the agent's response and executed in the sandbox together with unit tests that verify correctness. A task passes only if every assertion succeeds. One task (coding-010) is scored against a rubric of criteria instead.
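The extract-and-execute flow described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the benchmark's actual harness: the names `extract_code` and `run_task` are hypothetical, the fence-matching pattern is a guess at what "automatically extracted" means, and only the 10-second timeout is modeled (the 256 MB memory limit is not enforced here).

```python
import re
import subprocess
import sys

# Built with string multiplication so this example contains no literal fence.
_TICKS = "`" * 3
_FENCE = re.compile(_TICKS + r"(?:python|py)?[ \t]*\n(.*?)" + _TICKS, re.DOTALL)


def extract_code(response: str) -> str:
    """Join the bodies of all fenced code blocks found in a response (hypothetical helper)."""
    return "\n".join(match.group(1) for match in _FENCE.finditer(response))


def run_task(response: str, test_code: str, timeout: float = 10.0) -> bool:
    """Run extracted code plus its unit tests in a child process.

    Returns True only if the child exits with status 0, i.e. no assertion
    failed and no exception escaped. A timeout counts as a failure.
    """
    program = extract_code(response) + "\n" + test_code
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

A plain child process is not a security boundary; a real sandbox would also need memory and filesystem restrictions that this sketch leaves out.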

Skills & Tags

algorithm, algorithms, async, bugfix, cache, concurrency, data-structure, data-structures, design, dynamic-programming, edge-case, hash-map, parsing, python, recursion, sliding-window, string-manipulation, strings

All Tasks (10)

Complete list of tasks in this benchmark family with evaluation criteria.

coding-001 (Easy)

Fix IndexError on empty list

Fix the bug in this Python function that causes an IndexError when the input list is empty.

Evaluation: Automated (Code Execution)

Unit test: assert get_first([]) is None and get_first([1,2]) == 1

python, bugfix, edge-case
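The buggy function itself is not shown on this page, so the following is only a sketch of a fix that would satisfy the listed unit test. The parameter name `items` is an assumption.

```python
def get_first(items):
    """Return the first element, or None when the list is empty."""
    if not items:  # guard against indexing into an empty list
        return None
    return items[0]
```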
coding-002 (Easy)

Reverse a string without built-in reverse

Write a Python function that reverses a string without using the built-in reversed() function or slice notation [::-1].

Evaluation: Automated (Code Execution)

Unit test: assert reverse_string('hello') == 'olleh' and reverse_string('') == ''

python, strings, algorithms
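One possible approach that avoids both `reversed()` and `[::-1]` is to walk the indices backwards and join the collected characters. This is a sketch of one acceptable answer, not a reference solution from the benchmark.

```python
def reverse_string(s: str) -> str:
    """Reverse a string by iterating over its indices from last to first."""
    chars = []
    for i in range(len(s) - 1, -1, -1):
        chars.append(s[i])
    return "".join(chars)
```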
coding-003 (Medium)

Flatten a nested list

Write a Python function that flattens a nested list of arbitrary depth into a single flat list.

Evaluation: Automated (Code Execution)

Unit test: assert flatten([1, [2, [3, 4], 5], 6]) == [1, 2, 3, 4, 5, 6]

python, recursion, data-structures
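A recursive sketch, in line with the task's recursion tag, might look like this. It assumes that only `list` instances count as nesting (tuples and other iterables are treated as leaf values), which the task description does not specify.

```python
def flatten(nested):
    """Flatten a nested list into a single flat list, preserving order."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten(item))  # recurse into sublists
        else:
            flat.append(item)
    return flat
```

Because this version recurses once per nesting level, very deep inputs are bounded by Python's recursion limit; an explicit stack would avoid that.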
coding-004 (Easy)

Two Sum problem

Write a Python function two_sum(nums, target) that returns the indices of two numbers that add up to target.

Evaluation: Automated (Code Execution)

Unit test: assert two_sum([2,7,11,15], 9) == [0,1]

python, algorithm, hash-map
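The hash-map tag suggests a single pass that remembers each value's index. A minimal sketch follows; returning `None` when no pair exists is an assumption, since the task does not say what should happen in that case.

```python
def two_sum(nums, target):
    """Return indices [i, j] with i < j and nums[i] + nums[j] == target."""
    seen = {}  # value -> index of its first occurrence
    for i, n in enumerate(nums):
        j = seen.get(target - n)
        if j is not None:
            return [j, i]
        seen.setdefault(n, i)
    return None  # assumption: no valid pair found
```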
coding-005 (Medium)

Efficient Fibonacci

Write a Python function fib(n) that returns the nth Fibonacci number. Must handle n=50 efficiently (no exponential time).

Evaluation: Automated (Code Execution)

Unit test: assert fib(0) == 0 and fib(50) == 12586269025

python, algorithm, dynamic-programming
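An iterative version that keeps only the last two values runs in linear time and would meet the efficiency requirement. This is one sketch among several valid approaches (memoization would also work).

```python
def fib(n: int) -> int:
    """Return the nth Fibonacci number, with fib(0) == 0 and fib(1) == 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b  # slide the two-value window forward
    return a
```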
coding-006 (Medium)

Implement LRU cache

Implement a class LRUCache with get(key) and put(key, value). Both must run in O(1) average time.

Evaluation: Automated (Code Execution)

Unit test: verifies eviction of least recently used key, O(1) operations

python, data-structure, cache, design
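One common way to get O(1) average `get` and `put` is to build on `collections.OrderedDict`. The sketch below makes two assumptions that the task text does not confirm: the constructor takes a `capacity` argument, and `get` returns `None` for a missing key.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):  # assumption: capacity is passed here
        self.capacity = capacity
        self._data = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._data:
            return None  # assumption: misses return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used
```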
coding-007 (Medium)

Parse CSV without libraries

Write a Python function parse_csv(text) that handles quoted fields with commas, newlines, and escaped double quotes.

Evaluation: Automated (Code Execution)

Unit test: verifies comma-in-quotes, escaped quotes, embedded newlines

python, parsing, string-manipulation
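A character-by-character state machine is one way to handle the three cases the test mentions. The sketch assumes the function returns a list of rows, each a list of strings, and that a doubled quote (`""`) inside a quoted field stands for one literal quote; neither detail is spelled out in the task.

```python
def parse_csv(text):
    """Parse CSV text into a list of rows (lists of strings)."""
    rows, row, field = [], [], []
    in_quotes = False
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < n and text[i + 1] == '"':
                    field.append('"')  # escaped quote inside a quoted field
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                field.append(ch)  # commas and newlines are literal here
        elif ch == '"':
            in_quotes = True
        elif ch == ",":
            row.append("".join(field))
            field = []
        elif ch in "\r\n":
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1  # treat CRLF as one line break
            row.append("".join(field))
            rows.append(row)
            row, field = [], []
        else:
            field.append(ch)
        i += 1
    if field or row:  # final line without a trailing newline
        row.append("".join(field))
        rows.append(row)
    return rows
```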
coding-008 (Hard)

Longest common subsequence

Write a Python function lcs(a, b) that returns the longest common subsequence. Must run in O(n*m) time.

Evaluation: Automated (Code Execution)

Unit test: verifies LCS length and that result is a valid subsequence of both inputs

python, algorithm, dynamic-programming, strings
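The standard dynamic-programming table followed by a backtrack gives O(n*m) time. The sketch assumes both inputs are strings and that the result is returned as a string. When several subsequences share the maximum length, it returns one of them, which matches how the unit test checks the result (length and validity rather than an exact value).

```python
def lcs(a, b):
    """Return one longest common subsequence of strings a and b."""
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of a[:i] and b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Walk back from the bottom-right corner to recover the characters.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```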
coding-009 (Hard)

Sliding window rate limiter

Implement a RateLimiter class that allows at most max_requests calls within any sliding window of window_seconds.

Evaluation: Automated (Code Execution)

Unit test: verifies rate limiting, expiry, and cleanup of old entries

python, algorithm, sliding-window, design
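A deque of timestamps is one simple way to implement a sliding window with cleanup of old entries. Only the class name and the `max_requests` and `window_seconds` parameters come from the task; the `allow()` method name, the injectable `clock` parameter, and the choice to treat a call exactly `window_seconds` old as expired are assumptions made for this sketch.

```python
import time
from collections import deque


class RateLimiter:
    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock  # hypothetical hook so tests can control time
        self._calls = deque()  # timestamps of accepted calls, oldest first

    def allow(self):
        """Return True and record the call if it fits in the current window."""
        now = self._clock()
        cutoff = now - self.window_seconds
        # Drop timestamps that have fallen out of the window.
        while self._calls and self._calls[0] <= cutoff:
            self._calls.popleft()
        if len(self._calls) >= self.max_requests:
            return False
        self._calls.append(now)
        return True
```

Rejected calls are not recorded, so a burst of denied requests does not extend the lockout.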
coding-010 (Hard)

Fix race condition in async code

Fix the race condition in an async transfer() function where concurrent transfers can cause lost updates.

Evaluation: Rubric (Criteria Matching)

Criteria: uses asyncio.Lock, lock held through read-check-write, total balance preserved, correct return values

python, async, concurrency, bugfix
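The original buggy `transfer()` is not shown on this page, so the following only illustrates the pattern the rubric describes: an `asyncio.Lock` held across the whole read-check-write sequence. The `Bank` class, the `demo` coroutine, the account names, and the `True`/`False` return convention are all hypothetical, and `asyncio.sleep(0)` stands in for whatever awaited I/O would let another task interleave.

```python
import asyncio


class Bank:  # hypothetical container for balances and the lock
    def __init__(self, balances):
        self.balances = dict(balances)
        self._lock = asyncio.Lock()

    async def transfer(self, src, dst, amount):
        """Move amount from src to dst; return False if funds are insufficient."""
        async with self._lock:  # held through read, check, and write
            if self.balances[src] < amount:
                return False
            await asyncio.sleep(0)  # simulated I/O between check and write
            self.balances[src] -= amount
            self.balances[dst] += amount
            return True


async def demo():
    bank = Bank({"a": 100, "b": 0})
    # Three concurrent transfers of 60 from an account holding 100:
    # without the lock, more than one could pass the balance check.
    results = await asyncio.gather(
        *(bank.transfer("a", "b", 60) for _ in range(3))
    )
    return results, bank.balances
```

Running `asyncio.run(demo())` should show exactly one successful transfer and an unchanged total balance, since the lock serializes the check and the update.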