ClawWork

ClawWork is an economic survival benchmark from HKUDS that puts AI models to work as real economic agents. Each agent starts with $10, completes professional tasks drawn from the GDPVal dataset (220 tasks across 44 sectors, including Manufacturing, Finance, Healthcare, and Legal), pays its own token costs, and earns income based on the evaluated quality of its work. Agents that spend carelessly go bankrupt. Top performers have reached an equivalent salary of $1,500+/hr, with ATIC + Qwen3.5-Plus reaching a balance of about $19,900 from a $10 start.
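The accounting described above can be sketched as a toy ledger. This is only an illustration of the rule "pay token costs, earn on quality, go bankrupt at zero": the class name, the cost and payment figures, and the bankruptcy threshold are assumptions, not ClawWork's actual implementation or pricing.

```python
from dataclasses import dataclass


@dataclass
class AgentLedger:
    """Toy balance sheet for one agent (hypothetical, for illustration)."""
    balance: float = 10.0  # starting balance stated in the text

    def complete_task(self, token_cost: float, payment: float) -> None:
        # Token spend is deducted whether or not the work earns anything.
        self.balance -= token_cost
        self.balance += payment

    @property
    def bankrupt(self) -> bool:
        # Assumed threshold: an agent with no money left is out.
        return self.balance <= 0


careful = AgentLedger()
careful.complete_task(token_cost=0.5, payment=4.0)    # cheap run, paid work

careless = AgentLedger()
careless.complete_task(token_cost=12.0, payment=0.0)  # costly run, unpaid work

print(careful.balance, careful.bankrupt)
print(careless.balance, careless.bankrupt)
```

The second agent ends below zero after a single expensive, unpaid task, which is the failure mode the benchmark is designed to expose.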

ClawWork is built on Nanobot: a single pip install deploys a fully accountable agent. A ClawMode wrapper integrates directly with live Nanobot/OpenClaw gateway instances, and a React dashboard shows balance curves, task completions, and survival metrics in real time.

Where Ravi Fits

Running ClawWork means running multiple agents simultaneously: different models, different configurations, each competing independently. That’s exactly the environment where identity and credential hygiene matter most.

Credential isolation per agent. The benchmark requires API keys for each competing model. Ravi’s secrets vault is a natural fit: each agent gets its own isolated secrets store via ravi_secrets_set. No credentials hardcoded in .env files, no cross-contamination between benchmark runs.

Dedicated identity per competitor. When ClawWork pits GLM-4.7, Kimi-K2.5, and Qwen3-Max against each other, each agent can get a dedicated Ravi identity — its own email, phone, and credential vault. You can audit which identity earned what income, correlate API spend with task performance, and revoke cleanly after the run.

Real-world task execution. The 44 GDPVal sectors include tasks that require actual system interaction: email drafts, form submissions, service signups. When a ClawWork task crosses into “do a real thing on the internet,” Ravi’s identity layer — email, phone, TOTP — handles verification flows autonomously without exposing real operator credentials.
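The per-identity audit mentioned above (which identity earned what, and at what API cost) comes down to grouping task records by identity. A minimal sketch follows; the record layout and all figures are hypothetical and do not come from ClawWork or Ravi.

```python
from collections import defaultdict

# Hypothetical task records: (identity name, API spend in USD, income in USD).
records = [
    ("glm-agent", 0.40, 3.00),
    ("kimi-agent", 0.90, 2.50),
    ("glm-agent", 0.25, 1.50),
]


def summarize(rows):
    """Return total spend, income, and net earnings per identity."""
    totals = defaultdict(lambda: {"spend": 0.0, "income": 0.0})
    for identity, spend, income in rows:
        totals[identity]["spend"] += spend
        totals[identity]["income"] += income
    return {
        name: {**t, "net": round(t["income"] - t["spend"], 2)}
        for name, t in totals.items()
    }


summary = summarize(records)
for name, row in summary.items():
    print(name, row)
```

Because each competitor has its own identity, the grouping key is unambiguous and no record needs to be attributed after the fact.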

Setup

Install ClawWork

pip install nanobot
git clone https://github.com/HKUDS/ClawWork
cd ClawWork

Provision a Ravi Identity Per Agent

Before each benchmark run, spin up a fresh identity for each competing model:

# One Ravi identity per competing agent (three competitors in this example).
# The loop only prints a placeholder; the actual provisioning call
# (ravi_identity_create, via the Ravi CLI, API, or plugin) goes where the print is.
agents = ["glm-agent", "kimi-agent", "qwen-agent"]
for agent_name in agents:
    # ravi_identity_create provisions email + phone + secrets store
    print(f"Provisioning identity for {agent_name}")

Via the Ravi plugin in your orchestration layer:

ravi_identity_create(name="glm-agent") → dedicated email + phone + vault
ravi_secrets_set(key="OPENAI_API_KEY", value="...") → encrypted credential injection
Identity is fully revocable after the run
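To make the isolation property concrete, here is a sketch of the provisioning loop run against an in-memory stand-in. FakeRavi, FakeIdentity, and their method signatures are hypothetical: they are named after the plugin calls listed above but are not Ravi's real API, and the email domain and key value are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class FakeIdentity:
    """In-memory stand-in for a Ravi identity; fields are illustrative only."""
    name: str
    email: str
    secrets: dict = field(default_factory=dict)


class FakeRavi:
    """Hypothetical stub loosely mirroring the plugin calls named above."""

    def __init__(self):
        self.identities = {}

    def identity_create(self, name):
        ident = FakeIdentity(name=name, email=f"{name}@example.test")
        self.identities[name] = ident
        return ident

    def secrets_set(self, name, key, value):
        # Each identity owns a separate dict, so keys never cross agents.
        self.identities[name].secrets[key] = value


ravi = FakeRavi()
for agent_name in ["glm-agent", "kimi-agent", "qwen-agent"]:
    ravi.identity_create(agent_name)

ravi.secrets_set("glm-agent", "GLM_API_KEY", "placeholder-not-a-real-key")

print(sorted(ravi.identities))
print(ravi.identities["kimi-agent"].secrets)  # empty: nothing leaked across
```

A stub like this can also serve as a test double for orchestration code before real identities are provisioned.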

Configure ClawWork with Ravi Credentials

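# clawwork_config.py
# ravi_secrets_get and ravi_get_info are assumed to be supplied by the Ravi plugin
# and must be in scope before this module is evaluated.
AGENT_CONFIGS = {
    "glm-4.7": {
        "api_key": ravi_secrets_get("GLM_API_KEY"),   # fetched at runtime
        "email": ravi_get_info()["email"],             # agent's Ravi email
        "phone": ravi_get_info()["phone"],             # agent's Ravi phone
    },
    # ...
}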
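One way to keep a config like this testable is to pass the accessors in as arguments instead of calling them at import time. The sketch below assumes nothing about Ravi's real signatures: build_agent_config is a hypothetical helper, and the dictionary and lambda stand in for ravi_secrets_get and ravi_get_info with placeholder values.

```python
from typing import Callable


def build_agent_config(
    secret_name: str,
    get_secret: Callable[[str], str],
    get_info: Callable[[], dict],
) -> dict:
    """Build one agent's config entry from injected accessor functions."""
    info = get_info()  # look up identity details once, reuse for both fields
    return {
        "api_key": get_secret(secret_name),
        "email": info["email"],
        "phone": info["phone"],
    }


# Stand-ins for the Ravi accessors; all values are placeholders.
fake_secrets = {"GLM_API_KEY": "placeholder-key"}
fake_info = {"email": "glm-agent@example.test", "phone": "+15550100"}

config = {
    "glm-4.7": build_agent_config(
        "GLM_API_KEY", fake_secrets.__getitem__, lambda: fake_info
    )
}
print(config["glm-4.7"]["email"])
```

With this shape, swapping the stand-ins for the real accessors is a change at the call site only, and a missing secret fails loudly with a KeyError in tests.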

Run the Benchmark

python clawwork_run.py --mode clawmode --agents glm-4.7,kimi-k2.5,qwen3-max

The React dashboard at localhost:3000 shows live balance curves. After the run, revoke agent identities to prevent credential leakage between sessions.
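Revocation is easy to forget when a run crashes partway through. A sketch of one way to guarantee it uses a context manager; revoke_identity here is a hypothetical placeholder that only records names, standing in for whatever revocation call Ravi actually exposes.

```python
from contextlib import contextmanager

revoked = []  # records which identities were revoked, for illustration


def revoke_identity(name):
    """Hypothetical placeholder for the real revocation call."""
    revoked.append(name)


@contextmanager
def scoped_identities(names):
    """Yield the identity names, then revoke each one even if the run fails."""
    try:
        yield list(names)
    finally:
        for name in names:
            revoke_identity(name)


try:
    with scoped_identities(["glm-agent", "kimi-agent"]) as active:
        raise RuntimeError("simulated benchmark crash")
except RuntimeError:
    pass

print(revoked)
```

The finally clause runs on both normal exit and failure, so no identity outlives the session that created it.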

Why This Matters

ClawWork’s economic pressure model, in which agents that mismanage money go bankrupt, is exactly the environment where identity hygiene compounds. An agent that leaks credentials doesn’t just lose one task; it compromises every future run. Ravi’s isolation model makes clean boundaries automatic: each agent operates within its own identity envelope, draws credentials from its own vault, and leaves nothing behind when the benchmark ends.

The combination of economic accountability (ClawWork) and identity accountability (Ravi) creates benchmark conditions that mirror real production deployment, which is the point.
