Quick Start
Run your first evaluation in 10 minutes! Step-by-step guide with copy-paste examples.
youBencha is an open-source framework for benchmarking AI coding agents. It provides a structured, reproducible way to evaluate how well AI agents perform real-world coding tasks.
Getting Started
New to youBencha? Learn about installation and core concepts.
CLI Reference
Complete reference for all youBencha CLI commands including yb run, yb report, and more.
Configuration
Learn how to configure evaluation suites with YAML files and prompt templates.
Evaluators
Discover built-in evaluators like git-diff, expected-diff, and agentic-judge.
Adapters
Connect youBencha to different AI agents with adapters like Copilot CLI.
Hooks
Extend youBencha with pre-execution and post-evaluation hooks.
Examples
Real-world examples including CI/CD integration and Slack notifications.
Advanced Topics
Model selection, workspace management, and advanced reporting.
Best Practices
Tips and recommendations for effective AI agent benchmarking.
Troubleshooting
Common issues and solutions when running youBencha evaluations.