Claw-some AI Agent Testing
Percentage of tasks completed successfully across standardized OpenClaw agent tests
| Model | Provider | Success % | Score |
|---|---|---|---|
| 🦞 openai/gpt-5.2-pro | openai | 97.4% | 97.4% |
| 🦀 moonshotai/kimi-k2.5 | moonshotai | 96.1% | 96.1% |
| 🦐 anthropic/claude-opus-4.6 | anthropic | 95.9% | 95.9% |
| anthropic/claude-opus-4.5 | anthropic | 95.2% | 95.2% |
| minimax/minimax-m2.1 | minimax | 95.1% | 95.1% |
| google/gemini-3-flash-preview | google | 95.1% | 95.1% |
| anthropic/claude-sonnet-4.5 | anthropic | 94.8% | 94.8% |
| google/gemini-3-pro-preview | google | 93.6% | 93.6% |
| google/gemini-2.5-flash-lite | google | 87.7% | 87.7% |
| anthropic/claude-sonnet-4 | anthropic | 85.8% | 85.8% |
| z-ai/glm-4.5-air | z-ai | 83.8% | 83.8% |
| openai/gpt-5-nano | openai | 81.8% | 81.8% |
| mistralai/devstral-2512 | mistralai | 76.3% | 76.3% |
| deepseek/deepseek-v3.2 | deepseek | 56.5% | 56.5% |
| openai/gpt-5.2 | openai | 55.0% | 55.0% |
| x-ai/grok-4.1-fast | x-ai | 47.4% | 47.4% |
| google/gemini-2.5-flash | google | 46.7% | 46.7% |
| stepfun/step-3.5-flash | stepfun | 40.9% | 40.9% |
| z-ai/glm-5 | z-ai | 40.9% | 40.9% |
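
The success percentage is described as the share of tasks completed successfully. A minimal sketch of that arithmetic follows, assuming each task yields a simple pass/fail result. The `success_rate` helper, the task names, and the outcomes are all hypothetical and are not taken from the benchmark itself, whose actual scoring may differ (for example, by awarding partial credit).

```python
def success_rate(results: dict[str, bool]) -> float:
    """Return the percentage of tasks marked successful, rounded to one decimal."""
    if not results:
        raise ValueError("no task results to score")
    passed = sum(results.values())  # True counts as 1, False as 0
    return round(100 * passed / len(results), 1)


# Hypothetical per-task outcomes for a single model run (not real benchmark data).
example_run = {
    "task_a": True,
    "task_b": True,
    "task_c": False,
    "task_d": True,
}

print(f"{success_rate(example_run):.1f}%")  # 3 of 4 tasks passed -> 75.0%
```

Under this assumption, a reported 95.0% would correspond to 19 of every 20 tasks passing.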