Get up and running with youBencha in three simple steps. By the end of this guide, you’ll have run your first AI agent evaluation in under 10 minutes.
Before starting, ensure you have the following installed:
| Requirement | Check Command | Installation |
|---|---|---|
| youBencha CLI | yb --version | Installation Guide |
| GitHub Copilot CLI | gh copilot --version | GitHub Copilot CLI |
| Git | git --version | git-scm.com |
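The checks in the table can be run in one pass. The sketch below only tests whether each executable is on your PATH. The helper name `check_tool` is hypothetical and not part of youBencha, and finding `gh` does not prove the Copilot extension is installed, so `gh copilot --version` from the table is still worth running by hand.

```shell
#!/bin/sh
# Report whether a prerequisite executable is on PATH.
# check_tool is a hypothetical helper name, not a youBencha command.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Count how many of the three prerequisites are absent.
missing=0
for tool in yb gh git; do
    check_tool "$tool" || missing=$((missing + 1))
done
echo "$missing prerequisite(s) missing"
```

If any tool is reported missing, install it from the corresponding link in the table before continuing.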
Open your terminal and create a new evaluation suite:
    yb init

This command creates a suite.yaml file in your current directory with helpful comments and sensible defaults.
Open suite.yaml and customize it for your first evaluation. Here’s a complete working example:
    # youBencha Evaluation Suite
    # Your first evaluation configuration

    # Repository to evaluate
    repo: https://github.com/youbencha/hello-world.git
    branch: main

    # AI agent configuration
    agent:
      type: copilot-cli
      config:
        prompt: "Add a comment explaining what this repo is about"

    # Evaluation criteria
    evaluators:
      - name: git-diff

To include an AI-powered quality assessment, add the agentic-judge evaluator:

    # youBencha Evaluation Suite
    # Includes AI-powered quality assessment

    repo: https://github.com/youbencha/hello-world.git
    branch: main

    agent:
      type: copilot-cli
      config:
        prompt: "Add a comment explaining what this repo is about"

    evaluators:
      # Measure scope of changes
      - name: git-diff

      # AI-powered quality assessment
      - name: agentic-judge
        config:
          type: copilot-cli
          agent_name: agentic-judge
          assertions:
            readme_modified: "README.md was modified. Score 1 if true, 0 if false."
            helpful_comment: "A helpful comment was added. Score 1 if yes, 0 if no."

| Field | Description | Example |
|---|---|---|
| repo | Git repository URL to clone and evaluate | https://github.com/org/repo.git |
| branch | Branch to check out | main, develop |
| agent.type | Type of AI agent | copilot-cli |
| agent.config.prompt | Task instruction for the agent | "Add error handling" |
| evaluators | List of evaluation criteria | git-diff, agentic-judge |
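To see how the fields in the table nest inside the file, the sketch below writes a minimal suite from the table's example values. The helper `write_suite` and the output path are hypothetical, the repository URL is the table's placeholder rather than a real repository, and the prompt is assumed to contain no double quotes.

```shell
#!/bin/sh
# Write a minimal suite file using only the fields listed in the table.
# write_suite is a hypothetical helper: $1 = output path, $2 = branch, $3 = prompt.
write_suite() {
    cat > "$1" <<EOF
repo: https://github.com/org/repo.git
branch: $2

agent:
  type: copilot-cli
  config:
    prompt: "$3"

evaluators:
  - name: git-diff
EOF
}

# Example values taken from the "Example" column of the table.
suite_file="${TMPDIR:-/tmp}/suite-example.yaml"
write_suite "$suite_file" develop "Add error handling"
cat "$suite_file"
```

Replace the placeholder URL with your own repository before running the generated file.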
Execute your evaluation suite with a single command:
    yb run -c suite.yaml

youBencha runs the evaluation automatically and writes its output to a run directory under .youbencha-workspace/.

Generate a human-readable report:

    yb report --from .youbencha-workspace/run-*/artifacts/results.json

Your first successful evaluation will look something like this:
    📊 youBencha Evaluation Report
    ==============================

    Suite: hello-world-evaluation
    Status: ✅ PASSED
    Duration: 12.3s

    Evaluator Results:
    ------------------

    ✅ git-diff
       Files changed: 1
       Lines added: 5
       Lines removed: 0

    ✅ agentic-judge
       readme_modified: 1.0 (PASS)
       helpful_comment: 1.0 (PASS)

    Workspace: .youbencha-workspace/run-2024-11-15-123456-abc123/

Error: Agent 'copilot-cli' not available
Solution: Ensure GitHub Copilot CLI is installed and authenticated:
    gh auth login
    gh extension install github/gh-copilot

Error: Failed to clone repository
Solution: Check the repository URL and your Git credentials:

    git ls-remote https://github.com/youbencha/hello-world.git

Error: Invalid configuration
Solution: Run validation with verbose output:

    yb validate -c suite.yaml -v

Explore Evaluators
Learn about built-in evaluators like git-diff, expected-diff, and agentic-judge.
Configuration Deep Dive
Master the suite.yaml configuration with all available options.
CI/CD Integration
Add youBencha to your continuous integration pipeline.
CLI Reference
Complete reference for all youBencha CLI commands.
    # Initialize a new suite
    yb init

    # Validate configuration
    yb validate -c suite.yaml

    # Run evaluation
    yb run -c suite.yaml

    # Generate report
    yb report --from .youbencha-workspace/run-*/artifacts/results.json

    # List available evaluators
    yb list
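Once several runs have accumulated, the `run-*` glob in the report command can match more than one directory. The sketch below selects the newest run and chains the validate, run, and report commands. It assumes run directory names sort chronologically, as the timestamped name in the sample output suggests. The helper `latest_results` is hypothetical and not part of youBencha.

```shell
#!/bin/sh
# Print the results.json path of the newest run directory.
# Assumption: names of the form run-<timestamp>-<id> sort chronologically.
# latest_results is a hypothetical helper; $1 = workspace directory.
latest_results() {
    workspace="${1:-.youbencha-workspace}"
    latest=""
    for dir in "$workspace"/run-*/; do
        [ -d "$dir" ] || continue   # skip the unexpanded pattern when nothing matches
        latest="$dir"               # glob results are sorted, so the last match is newest
    done
    [ -n "$latest" ] || return 1
    echo "${latest}artifacts/results.json"
}

# Validate, run, then report on the newest run. Skipped when yb is not installed.
if command -v yb >/dev/null 2>&1; then
    yb validate -c suite.yaml &&
        yb run -c suite.yaml &&
        yb report --from "$(latest_results)"
fi
```

Chaining with `&&` stops at the first failing step, so a report is only generated after a successful run.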