youBencha Documentation

Benchmark AI Coding Agents with Confidence

What is youBencha?

youBencha is an open-source framework for benchmarking AI coding agents. It provides a structured, reproducible way to evaluate how well AI agents perform real-world coding tasks.

🚀 Quick Start

Run your first evaluation in about 10 minutes with a step-by-step guide and copy-paste examples.

Open the Quick Start →

Getting Started

New to youBencha? Learn about installation and core concepts.

Installation Guide →

CLI Reference

Complete reference for all youBencha CLI commands, including yb run and yb report.

View Commands →

Configuration

Learn how to configure evaluation suites with YAML files and prompt templates.

Configuration Guide →

Evaluators

Discover built-in evaluators like git-diff, expected-diff, and agentic-judge.

Explore Evaluators →

Adapters

Connect youBencha to different AI agents with adapters like Copilot CLI.

View Adapters →

Hooks

Extend youBencha with pre-execution and post-evaluation hooks.

Learn about Hooks →

Quick Links

Examples

Real-world examples including CI/CD integration and Slack notifications.

View Examples →

Advanced Topics

Model selection, workspace management, and advanced reporting.

Advanced Guide →

Best Practices

Tips and recommendations for effective AI agent benchmarking.

Best Practices →

Troubleshooting

Common issues and solutions when running youBencha evaluations.

Get Help →