ClawWork
ClawWork is an economic survival benchmark from HKUDS that puts AI models to work as real economic agents. Each agent starts with $10, completes professional tasks drawn from the GDPVal dataset (220 tasks across 44 sectors — Manufacturing, Finance, Healthcare, Legal), pays its own token costs, and earns income based on evaluated work quality. Agents that spend carelessly go bankrupt. Top performers have reached $1,500+/hr equivalent salary, with ATIC + Qwen3.5-Plus hitting ~$19,900 balance from a $10 start.
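The survival loop described above can be sketched in a few lines. This is a minimal illustration, not ClawWork's actual accounting: the `AgentLedger` class, the `settle_task` method, the linear payout rule (income = maximum payment × quality score), and the "balance at or below zero" bankruptcy threshold are all assumptions introduced here. Only the $10 starting balance comes from the description.

```python
from dataclasses import dataclass, field


@dataclass
class AgentLedger:
    """Tracks one agent's balance across tasks (hypothetical helper)."""

    balance: float = 10.0  # starting balance stated in the text
    history: list = field(default_factory=list)

    @property
    def bankrupt(self) -> bool:
        # Assumed threshold: an agent with no money left is out of the run.
        return self.balance <= 0

    def settle_task(self, token_cost: float, max_payment: float, quality: float) -> float:
        """Charge the token cost, then credit income scaled by quality in [0, 1]."""
        if self.bankrupt:
            raise RuntimeError("agent is bankrupt and cannot take new tasks")
        quality = min(max(quality, 0.0), 1.0)  # clamp the evaluator score
        income = max_payment * quality         # assumed linear payout rule
        self.balance += income - token_cost
        self.history.append(self.balance)
        return self.balance


# A careful agent: cheap task, decent quality.
careful = AgentLedger()
careful.settle_task(token_cost=2.0, max_payment=50.0, quality=0.5)  # balance is now 33.0

# A careless agent: burns more on tokens than it has, earns nothing.
careless = AgentLedger()
careless.settle_task(token_cost=12.0, max_payment=50.0, quality=0.0)  # balance is now -2.0
```

Under these assumptions, a single expensive, low-quality task is enough to end a run, which is the pressure the benchmark is designed to apply.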
Built on Nanobot — a single pip install deploys a fully-accountable agent. A ClawMode wrapper integrates directly with live Nanobot/OpenClaw gateway instances. A React dashboard shows balance curves, task completions, and survival metrics in real time.
Where Ravi Fits
Running ClawWork means running multiple agents simultaneously: different models, different configurations, each competing independently. That’s exactly the environment where identity and credential hygiene matter most.
Credential isolation per agent. The benchmark requires API keys for each competing model. Ravi’s secrets vault is a natural fit: each agent gets its own isolated secrets store via ravi_secrets_set. No credentials hardcoded in .env files, no cross-contamination between benchmark runs.
Dedicated identity per competitor. When ClawWork pits GLM-4.7, Kimi-K2.5, and Qwen3-Max against each other, each agent can get a dedicated Ravi identity — its own email, phone, and credential vault. You can audit which identity earned what income, correlate API spend with task performance, and revoke cleanly after the run.
Real-world task execution. The 44 GDPVal sectors include tasks that require actual system interaction: email drafts, form submissions, service signups. When a ClawWork task crosses into “do a real thing on the internet,” Ravi’s identity layer — email, phone, TOTP — handles verification flows autonomously without exposing real operator credentials.
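The per-identity audit mentioned above (which identity earned what, and how API spend relates to results) amounts to grouping task records by identity. The sketch below assumes a hypothetical record shape with `identity`, `income`, and `api_spend` keys; neither ClawWork nor Ravi is known to emit records in this form, and the sample figures are made up for illustration.

```python
from collections import defaultdict


def summarize_by_identity(records):
    """Aggregate income, API spend, and task count per identity name."""
    totals = defaultdict(lambda: {"income": 0.0, "api_spend": 0.0, "tasks": 0})
    for record in records:
        entry = totals[record["identity"]]
        entry["income"] += record["income"]
        entry["api_spend"] += record["api_spend"]
        entry["tasks"] += 1
    # Add a derived net figure so identities can be compared directly.
    return {
        name: {**entry, "net": entry["income"] - entry["api_spend"]}
        for name, entry in totals.items()
    }


# Illustrative records only; not real benchmark output.
sample_records = [
    {"identity": "glm-agent", "income": 40.0, "api_spend": 1.5},
    {"identity": "glm-agent", "income": 10.0, "api_spend": 0.5},
    {"identity": "kimi-agent", "income": 25.0, "api_spend": 3.0},
]
summary = summarize_by_identity(sample_records)
```

Because each agent has its own identity, the grouping key is unambiguous: no two agents share an email, phone, or vault that could blur the attribution.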
Setup
Install ClawWork
pip install nanobot
git clone https://github.com/HKUDS/ClawWork
cd ClawWork
Provision a Ravi Identity Per Agent
Before each benchmark run, spin up a fresh identity for each competing model:
import subprocess

# Using the Ravi CLI or API — one identity per competing agent
# Example: provision for three competitors
agents = ["glm-agent", "kimi-agent", "qwen-agent"]
for agent_name in agents:
    # ravi_identity_create provisions email + phone + secrets store
    print(f"Provisioning identity for {agent_name}")
Via the Ravi plugin in your orchestration layer:
- ravi_identity_create(name="glm-agent") → dedicated email + phone + vault
- ravi_secrets_set(key="OPENAI_API_KEY", value="...") → encrypted credential injection
- Identity is fully revocable after the run
Configure ClawWork with Ravi Credentials
# clawwork_config.py
AGENT_CONFIGS = {
    "glm-4.7": {
        "api_key": ravi_secrets_get("GLM_API_KEY"),  # fetched at runtime
        "email": ravi_get_info()["email"],           # agent's Ravi email
        "phone": ravi_get_info()["phone"],           # agent's Ravi phone
    },
    # ...
}
Run the Benchmark
python clawwork_run.py --mode clawmode --agents glm-4.7,kimi-k2.5,qwen3-max
The React dashboard at localhost:3000 shows live balance curves. After the run, revoke agent identities to prevent credential leakage between sessions.
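One way to make the revoke step hard to forget is to tie identity lifetime to the run itself. The sketch below is an assumption-laden illustration: `benchmark_identities` is a hypothetical helper, and `create_identity` and `revoke_identity` are caller-supplied callables standing in for whatever the Ravi CLI, API, or plugin actually exposes (their real names and signatures are not specified here). The stubs at the bottom exist only so the example runs on its own.

```python
from contextlib import contextmanager


@contextmanager
def benchmark_identities(agent_names, create_identity, revoke_identity):
    """Provision one identity per agent and revoke them all on exit."""
    created = {}
    try:
        for name in agent_names:
            created[name] = create_identity(name)
        yield created
    finally:
        # Runs whether the benchmark finished, crashed, or provisioning
        # failed partway; only identities actually created are revoked.
        for name in reversed(list(created)):
            revoke_identity(name)


# Stand-in callables for illustration; replace with real Ravi calls.
revoked = []


def fake_create(name):
    return {"name": name}


def fake_revoke(name):
    revoked.append(name)


with benchmark_identities(["glm-agent", "kimi-agent"], fake_create, fake_revoke) as ids:
    provisioned = sorted(ids)  # the benchmark run would happen here
```

Passing the create and revoke functions in as arguments keeps the lifecycle logic independent of how Ravi is invoked, so the same wrapper works whether identities are managed through a CLI call or a plugin tool.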
Why This Matters
ClawWork’s economic pressure model — agents that mismanage money go bankrupt — is exactly the environment where identity hygiene compounds. An agent that leaks credentials doesn’t just lose one task; it compromises every future run. Ravi’s isolation model makes clean boundaries automatic: each agent operates within its own identity envelope, spends from its own vault, and leaves nothing behind when the benchmark ends.
The combination of economic accountability (ClawWork) and identity accountability (Ravi) creates benchmark conditions that mirror real production deployment — which is the point.