Quick Start

Get up and running with youBencha in three steps. The whole guide, ending with your first AI agent evaluation, should take under 10 minutes.

Prerequisites

Before starting, ensure you have the following installed:

Requirement	Check Command	Installation
youBencha CLI	`yb --version`	Installation Guide
GitHub Copilot CLI	`gh copilot --version`	GitHub Copilot CLI
Git	`git --version`	git-scm.com
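The three check commands above can be folded into one pass. The following is a minimal sketch, not part of youBencha: `check_tools` is a hypothetical helper name, and it only confirms that each executable is on your PATH. It does not verify versions or that the Copilot extension for `gh` is installed.

```shell
# Hypothetical helper: report which of the given commands are missing from PATH.
# Returns 0 when every command is found, 1 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok       $tool"
    else
      echo "missing  $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_tools yb gh git
```

A non-zero return value makes the helper easy to use as a guard at the top of a setup script.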

The 3-Step Process

1. Initialize - Create a suite configuration file
2. Configure - Define your evaluation criteria
3. Run - Execute the evaluation and view results

Step 1: Initialize a Suite

Open your terminal and create a new evaluation suite:

yb init

This command creates a suite.yaml file in your current directory with helpful comments and sensible defaults.

Step 2: Configure Your Evaluation

Open suite.yaml and customize it for your first evaluation. Here’s a complete working example:

Two variants are shown: a basic example first, followed by one that adds an AI judge.

# youBencha Evaluation Suite
# Your first evaluation configuration

# Repository to evaluate
repo: https://github.com/youbencha/hello-world.git
branch: main

# AI agent configuration
agent:
  type: copilot-cli
  config:
    prompt: "Add a comment explaining what this repo is about"

# Evaluation criteria
evaluators:
  - name: git-diff

# youBencha Evaluation Suite
# Includes AI-powered quality assessment

repo: https://github.com/youbencha/hello-world.git
branch: main

agent:
  type: copilot-cli
  config:
    prompt: "Add a comment explaining what this repo is about"

evaluators:
  # Measure scope of changes
  - name: git-diff

  # AI-powered quality assessment
  - name: agentic-judge
    config:
      type: copilot-cli
      agent_name: agentic-judge
      assertions:
        readme_modified: "README.md was modified. Score 1 if true, 0 if false."
        helpful_comment: "A helpful comment was added. Score 1 if yes, 0 if no."

Configuration Breakdown

Field	Description	Example
`repo`	Git repository URL to clone and evaluate	`https://github.com/org/repo.git`
`branch`	Branch to checkout	`main`, `develop`
`agent.type`	Type of AI agent	`copilot-cli`
`agent.config.prompt`	Task instruction for the agent	`"Add error handling"`
`evaluators`	List of evaluation criteria	`git-diff`, `agentic-judge`
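Before running anything, it can help to confirm that the top-level fields from the table are present in the file. The sketch below is a shallow text check only, and it assumes all four top-level keys are required, which this guide does not state. `check_suite_keys` is a hypothetical name; the real schema check is `yb validate`.

```shell
# Hypothetical helper: shallow check that a suite file contains each
# top-level key listed in the table above (assumed here to be required).
# Prints every absent key; returns 1 if at least one is missing.
check_suite_keys() {
  file=$1
  status=0
  for key in repo branch agent evaluators; do
    # Top-level YAML keys start at column 0 and end with a colon.
    if ! grep -q "^${key}:" "$file"; then
      echo "missing top-level key: $key"
      status=1
    fi
  done
  return "$status"
}

# Usage: check_suite_keys suite.yaml
```

Because it only matches text at the start of a line, this will not catch wrong nesting or invalid values.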

Step 3: Run the Evaluation

Execute your evaluation suite with a single command:

yb run -c suite.yaml

What Happens Next

youBencha automatically performs these steps:

1. Clones the repository to an isolated workspace
2. Runs the AI agent with your prompt
3. Executes evaluators against the agent’s changes
4. Generates results in .youbencha-workspace/

View Your Results

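Generate a human-readable report from the run’s results file. The shell expands the run-* glob to every matching run directory, so once you have more than one run, pass the path of the specific run instead: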

yb report --from .youbencha-workspace/run-*/artifacts/results.json
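Once the suite has been run more than once, the run-* glob matches several directories. One way to pick the most recent run is sketched below. `latest_results` is a hypothetical helper, and it assumes the layout shown in this guide (`run-*/artifacts/results.json` under the workspace directory) and that the newest run directory has the latest modification time.

```shell
# Hypothetical helper: print the results.json path of the most recently
# modified run directory in a youBencha workspace.
# Returns 1 if no run directory is found.
latest_results() {
  ws=${1:-.youbencha-workspace}
  # ls -t sorts newest first; -d lists the directories themselves.
  newest=$(ls -1dt "$ws"/run-*/ 2>/dev/null | head -n 1)
  [ -n "$newest" ] || return 1
  printf '%s\n' "${newest%/}/artifacts/results.json"
}

# Usage: yb report --from "$(latest_results)"
```

Sorting by modification time avoids relying on the exact format of the run directory names.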

Expected Output

Your first successful evaluation will look something like this:

📊 youBencha Evaluation Report
==============================

Suite: hello-world-evaluation
Status: ✅ PASSED
Duration: 12.3s

Evaluator Results:
------------------

✅ git-diff
   Files changed: 1
   Lines added: 5
   Lines removed: 0

✅ agentic-judge
   readme_modified: 1.0 (PASS)
   helpful_comment: 1.0 (PASS)

Workspace: .youbencha-workspace/run-2024-11-15-123456-abc123/

Common Issues

Error: Agent 'copilot-cli' not available

Solution: Ensure GitHub Copilot CLI is installed and authenticated:

gh auth login
gh extension install github/gh-copilot
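To find out which of the two fixes is needed, both conditions can be checked separately. This is a sketch under assumptions: `copilot_ready` is a hypothetical helper, it relies on `gh auth status` returning non-zero when you are not logged in, and it assumes `gh extension list` prints the extension’s repository name (`github/gh-copilot`).

```shell
# Hypothetical helper: check GitHub CLI authentication and the Copilot extension.
# Returns 0 if both are in place, 1 if not authenticated, 2 if the extension is missing.
copilot_ready() {
  if ! gh auth status >/dev/null 2>&1; then
    echo "not authenticated: run 'gh auth login'" >&2
    return 1
  fi
  if ! gh extension list 2>/dev/null | grep -q 'github/gh-copilot'; then
    echo "extension missing: run 'gh extension install github/gh-copilot'" >&2
    return 2
  fi
}

# Usage: copilot_ready && yb run -c suite.yaml
```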

Error: Failed to clone repository

Solution: Check the repository URL and your Git credentials. If both are valid, this command lists the remote’s refs:

git ls-remote https://github.com/youbencha/hello-world.git
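A clone can also fail because the `branch` value in suite.yaml does not exist on the remote. The sketch below narrows the `git ls-remote` check to a single branch. `remote_branch_exists` is a hypothetical helper name; `--exit-code` makes Git return a non-zero status when no matching ref is found.

```shell
# Hypothetical helper: succeed only if the repository is reachable
# and has the named branch.
remote_branch_exists() {
  url=$1
  branch=$2
  git ls-remote --exit-code "$url" "refs/heads/$branch" >/dev/null 2>&1
}

# Usage:
#   remote_branch_exists https://github.com/youbencha/hello-world.git main \
#     || echo "repository unreachable or branch missing"
```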

Error: Invalid configuration

Solution: Run validation with verbose output:

yb validate -c suite.yaml -v
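To catch configuration errors before the agent runs, validation can gate the run. A minimal sketch, assuming `yb validate` exits with a non-zero status on an invalid configuration (this guide does not document its exit codes); `validate_then_run` is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: run the suite only if validation succeeds.
# Assumes `yb validate` signals failure through a non-zero exit status.
validate_then_run() {
  suite=${1:-suite.yaml}
  if yb validate -c "$suite"; then
    yb run -c "$suite"
  else
    echo "validation failed; not running $suite" >&2
    return 1
  fi
}

# Usage: validate_then_run suite.yaml
```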

What’s Next?

Explore Evaluators

Learn about built-in evaluators like git-diff, expected-diff, and agentic-judge.

View Evaluators →

Configuration Deep Dive

Master the suite.yaml configuration with all available options.

Configuration Guide →

CI/CD Integration

Add youBencha to your continuous integration pipeline.

CI/CD Examples →

CLI Reference

Complete reference for all youBencha CLI commands.

CLI Commands →

Quick Reference Commands

# Initialize a new suite
yb init

# Validate configuration
yb validate -c suite.yaml

# Run evaluation
yb run -c suite.yaml

# Generate report
yb report --from .youbencha-workspace/run-*/artifacts/results.json

# List available evaluators
yb list