
Quick Start

Get up and running with youBencha in three simple steps. By the end of this guide, you’ll have run your first AI agent evaluation in under 10 minutes.

Before starting, ensure you have the following installed:

Requirement          Check Command           Installation
youBencha CLI        yb --version            Installation Guide
GitHub Copilot CLI   gh copilot --version    GitHub Copilot CLI
Git                  git --version           git-scm.com
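
The checks in the table can be combined into one small script. The following is a minimal sketch, not part of youBencha: `missing_tools` is a hypothetical helper that only tests whether each binary is on PATH. It does not confirm that the Copilot extension for `gh` is installed, so `gh copilot --version` is still worth running by hand.

```shell
# Print every given command that is NOT found on PATH, one per line.
# Returns non-zero if at least one command is missing.
# NOTE: missing_tools is a hypothetical helper, not a youBencha command.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      rc=1
    fi
  done
  return "$rc"
}

# Usage (the three binaries named in the table above):
#   missing_tools yb gh git || echo "Install the tools listed above first"
```

An empty output with exit status 0 means all three binaries were found.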

  1. Initialize - Create a suite configuration file
  2. Configure - Define your evaluation criteria
  3. Run - Execute the evaluation and view results

Open your terminal and create a new evaluation suite:

Terminal
yb init

This command creates a suite.yaml file in your current directory with helpful comments and sensible defaults.


Open suite.yaml and customize it for your first evaluation. Here’s a complete working example:

suite.yaml
# youBencha Evaluation Suite
# Your first evaluation configuration

# Repository to evaluate
repo: https://github.com/youbencha/hello-world.git
branch: main

# AI agent configuration
agent:
  type: copilot-cli
  config:
    prompt: "Add a comment explaining what this repo is about"

# Evaluation criteria
evaluators:
  - name: git-diff

Field                 Description                                 Example
repo                  Git repository URL to clone and evaluate    https://github.com/org/repo.git
branch                Branch to check out                         main, develop
agent.type            Type of AI agent                            copilot-cli
agent.config.prompt   Task instruction for the agent              "Add error handling"
evaluators            List of evaluation criteria                 git-diff, agentic-judge

Execute your evaluation suite with a single command:

Terminal
yb run -c suite.yaml

youBencha automatically performs these steps:

  1. Clones the repository to an isolated workspace
  2. Runs the AI agent with your prompt
  3. Executes evaluators against the agent’s changes
  4. Generates results in .youbencha-workspace/
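
Since `yb validate -c suite.yaml` is also available (see Essential Commands), one option is to validate before every run so that configuration mistakes surface before the repository is cloned. This is a minimal sketch, assuming that `yb validate` exits non-zero on an invalid suite, which this guide does not confirm. `run_suite` and the `YB` override variable are hypothetical names introduced for illustration.

```shell
# Validate the suite first, then run it only if validation succeeded.
# ASSUMPTION: `yb validate` signals an invalid suite with a non-zero exit status.
# YB is a hypothetical override so the command can be swapped out (e.g. for testing).
run_suite() {
  suite=${1:-suite.yaml}
  yb_cmd=${YB:-yb}
  "$yb_cmd" validate -c "$suite" || return 1
  "$yb_cmd" run -c "$suite"
}

# Usage:
#   run_suite suite.yaml
```

The function returns the exit status of `yb run` when validation passes, so it can be used directly in a script or CI step.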

Generate a human-readable report:

Terminal
yb report --from .youbencha-workspace/run-*/artifacts/results.json

Your first successful evaluation will look something like this:

Evaluation Report
📊 youBencha Evaluation Report
==============================
Suite: hello-world-evaluation
Status: PASSED
Duration: 12.3s
Evaluator Results:
------------------
git-diff
  Files changed: 1
  Lines added: 5
  Lines removed: 0
agentic-judge
  readme_modified: 1.0 (PASS)
  helpful_comment: 1.0 (PASS)
Workspace: .youbencha-workspace/run-2024-11-15-123456-abc123/
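
The `run-*` glob in the report command matches every run directory, so after several runs it expands to more than one path. How `yb report` treats multiple paths is not stated here. One way to avoid the question is to pick the most recent results file explicitly. This is a minimal sketch, assuming the `.youbencha-workspace/run-*/artifacts/results.json` layout shown above; `latest_results` is a hypothetical helper, and it relies on the `-nt` file test, which bash and dash support.

```shell
# Print the most recently modified results.json under the workspace directory.
# Returns non-zero (and prints nothing) if no run has produced results yet.
# NOTE: latest_results is a hypothetical helper, not a youBencha command.
latest_results() {
  workspace=${1:-.youbencha-workspace}
  newest=
  for f in "$workspace"/run-*/artifacts/results.json; do
    [ -f "$f" ] || continue   # skip the unexpanded pattern when nothing matches
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest=$f
    fi
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Usage:
#   yb report --from "$(latest_results)"
```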

If the run fails with the following error:

Terminal
Error: Agent 'copilot-cli' not available

Solution: Ensure GitHub Copilot CLI is installed and authenticated:

Terminal
gh auth login
gh extension install github/gh-copilot
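
To see whether the extension is already installed, the output of `gh extension list` can be searched for the Copilot repository. This is a minimal sketch, assuming that the listing includes each extension's repository name (such as `github/gh-copilot`) on its line; `has_copilot_extension` is a hypothetical helper.

```shell
# Succeeds if the extension list read from stdin mentions github/gh-copilot.
# ASSUMPTION: `gh extension list` prints each extension's repository
# (e.g. github/gh-copilot) on its line.
has_copilot_extension() {
  grep -q 'github/gh-copilot'
}

# Usage:
#   gh auth status && gh extension list | has_copilot_extension \
#     || echo "Authenticate and install the extension as shown above"
```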

Explore Evaluators

Learn about built-in evaluators like git-diff, expected-diff, and agentic-judge.

View Evaluators →

CI/CD Integration

Add youBencha to your continuous integration pipeline.

CI/CD Examples →

CLI Reference

Complete reference for all youBencha CLI commands.

CLI Commands →


Essential Commands
# Initialize a new suite
yb init
# Validate configuration
yb validate -c suite.yaml
# Run evaluation
yb run -c suite.yaml
# Generate report
yb report --from .youbencha-workspace/run-*/artifacts/results.json
# List available evaluators
yb list