yb run

Execute an evaluation suite against an AI agent and generate results.

Synopsis

```shell
yb run -c <config-file> [options]
```

Description

The `run` command executes the complete evaluation pipeline. It covers every stage: validating the configuration, cloning the repository, running the agent and evaluators, and saving the final results.

Execution Pipeline

1. Validates configuration - checks syntax and schema.
2. Creates an isolated workspace - sets up `.youbencha-workspace/run-{timestamp}-{hash}/`, or the same structure under the directory given by `--workspace-dir`.
3. Clones the repository - fetches the code the agent will work on.
4. Runs pre-execution hooks - optional setup scripts, if configured.
5. Executes the agent - runs the AI agent with your prompt.
6. Runs evaluators - evaluates the agent's output, with evaluators running in parallel.
7. Runs post-evaluation hooks - optional result processing, if configured.
8. Saves results - writes results and artifacts to the workspace.

Options

| Option | Short | Type | Default | Description |
|---|---|---|---|---|
| `--config` | `-c` | string | Required | Path to suite configuration (YAML or JSON) |
| `--delete-workspace` | `-d` | flag | `false` | Delete workspace after completion |
| `--timeout` | `-t` | number | `300` | Maximum execution time in seconds |
| `--verbose` | `-v` | flag | `false` | Show detailed execution logs |
| `--dry-run` | - | flag | `false` | Validate and show what would run without executing |
| `--workspace-dir` | `-w` | string | `.youbencha-workspace` | Custom workspace directory |
| `--help` | `-h` | flag | - | Show help message |

Examples

Basic Usage

Run an evaluation with the default settings:

```shell
yb run -c suite.yaml
```

With Workspace Cleanup

Delete the workspace after successful completion:

```shell
yb run -c suite.yaml --delete-workspace
```

Verbose Output

See detailed execution logs:

```shell
yb run -c suite.yaml --verbose
```

Custom Timeout

Set a longer timeout for complex evaluations:

```shell
yb run -c suite.yaml --timeout 600
```

Dry Run

See what would happen without executing:

```shell
yb run -c suite.yaml --dry-run
```

Custom Workspace Directory

Use a specific workspace location:

```shell
yb run -c suite.yaml --workspace-dir ./my-workspace
```

Complete Workflow Example

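```shell
# Validate first
yb validate -c suite.yaml

# Run evaluation
yb run -c suite.yaml

# Generate report (run-* matches every run in the workspace; if there are
# several, use the specific run directory instead)
yb report --from .youbencha-workspace/run-*/artifacts/results.json
```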
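
The `run-*` glob in the report step matches every run in the workspace. After several runs, it can therefore expand to more than one `results.json`. The sketch below reports on the newest run only. It makes several assumptions:

- the workspace is in the default `.youbencha-workspace` location;
- runs use the `run-YYYY-MM-DD-HHMMSS-<hash>` naming seen in the example output, so sorting the names also sorts them by time;
- `yb validate` exits non-zero when validation fails.

`latest_run_dir` and `run_and_report` are hypothetical helper names, not part of `yb`.

```shell
#!/bin/sh
# Hypothetical helpers, not part of yb.
# Assumes run directories are named run-YYYY-MM-DD-HHMMSS-<hash>,
# so a byte-wise sort of the names is also chronological.

# Print the newest run-* directory under a workspace root; return 1 if none exist.
latest_run_dir() {
  root="${1:-.youbencha-workspace}"
  newest=$(
    for d in "$root"/run-*/; do
      [ -d "$d" ] && printf '%s\n' "${d%/}"
    done | LC_ALL=C sort | tail -n 1
  )
  [ -n "$newest" ] || return 1
  printf '%s\n' "$newest"
}

# Validate, run, then report on the run that just finished.
# Returns the exit code of `yb run`.
run_and_report() {
  config="$1"
  # Assumption: yb validate exits non-zero on invalid configuration.
  yb validate -c "$config" || return $?
  yb run -c "$config"
  status=$?
  # Assumption: codes 0 and 1 mean the evaluators ran and results.json exists.
  # For codes 2-5 the newest directory may belong to an older run, so skip it.
  if [ "$status" -le 1 ] && run_dir=$(latest_run_dir); then
    yb report --from "$run_dir/artifacts/results.json"
  fi
  return "$status"
}

# Usage: run_and_report suite.yaml
```

This sketch does not apply if you pass `--delete-workspace` or `--workspace-dir`. In those cases, the run directory is removed or lives elsewhere.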

Output Structure

After a run, the workspace contains the following, unless `--delete-workspace` was set:

```text
.youbencha-workspace/
└── run-{timestamp}-{hash}/
    ├── src-modified/              # Code after agent execution
    ├── src-expected/              # Reference code (if configured)
    ├── artifacts/
    │   ├── results.json           # Machine-readable results
    │   ├── report.md              # Human-readable report
    │   ├── youbencha.log.json     # Agent execution log
    │   ├── git-diff.patch         # Git diff output
    │   └── expected-diff.json     # File-by-file similarity analysis
    └── .youbencha.lock            # Workspace metadata
```
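
A small check can confirm that a run produced the artifact files listed above. This is a sketch, not a `yb` feature: `check_artifacts` is a hypothetical helper, and it assumes the `artifacts/` layout shown in the tree. Some files may only be produced when the corresponding option is configured. For example, `expected-diff.json` compares against the reference code. Treat a missing file as something to investigate rather than a definite error.

```shell
#!/bin/sh
# Hypothetical helper, not part of yb.
# Lists each documented artifact as present or missing in <run-dir>/artifacts
# and returns 1 if any are missing.
check_artifacts() {
  dir="$1/artifacts"
  missing=0
  for f in results.json report.md youbencha.log.json git-diff.patch expected-diff.json; do
    if [ -f "$dir/$f" ]; then
      printf 'present  %s\n' "$f"
    else
      printf 'missing  %s\n' "$f"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Usage: check_artifacts .youbencha-workspace/run-2024-11-15-143022-abc123
```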

Artifact Details

| File | Format | Description |
|---|---|---|
| `results.json` | JSON | Machine-readable evaluation results |
| `report.md` | Markdown | Human-readable summary report |
| `youbencha.log.json` | JSON | Complete agent execution log |
| `git-diff.patch` | Patch | Git-format diff of all changes |
| `expected-diff.json` | JSON | File-by-file similarity analysis |

Exit Codes

| Code | Meaning | Action |
|---|---|---|
| `0` | All evaluators passed | ✅ Success |
| `1` | One or more evaluators failed | Review results |
| `2` | Configuration error | Fix configuration |
| `3` | Agent execution error | Check agent setup |
| `4` | Timeout exceeded | Increase timeout or optimize |
| `5` | Repository clone failed | Check URL and credentials |
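
A CI job or wrapper script can branch on these codes. The sketch below maps each documented code to its meaning; `describe_exit_code` is a hypothetical helper name. The usage comment captures the status with `|| status=$?`. This keeps the script running even under `set -e`, which would otherwise stop at the first non-zero exit.

```shell
#!/bin/sh
# Hypothetical helper, not part of yb: translate a `yb run` exit code
# (from the table above) into a one-line explanation.
describe_exit_code() {
  case "$1" in
    0) echo "All evaluators passed" ;;
    1) echo "One or more evaluators failed: review results" ;;
    2) echo "Configuration error: fix configuration" ;;
    3) echo "Agent execution error: check agent setup" ;;
    4) echo "Timeout exceeded: increase timeout or optimize" ;;
    5) echo "Repository clone failed: check URL and credentials" ;;
    *) echo "Unknown exit code: $1" ;;
  esac
}

# Usage in a CI step:
#   status=0
#   yb run -c suite.yaml || status=$?
#   echo "yb run: $(describe_exit_code "$status")"
#   exit "$status"
```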

Execution Examples

A run in which all evaluators pass:

```text
$ yb run -c suite.yaml

🚀 Starting evaluation...
✅ Configuration validated
📁 Created workspace: .youbencha-workspace/run-2024-11-15-143022-abc123/
📥 Cloning https://github.com/youbencha/hello-world.git...
🤖 Running agent (copilot-cli)...
📊 Running evaluators...
  ✅ git-diff
  ✅ agentic-judge
💾 Results saved to artifacts/results.json

✅ Evaluation complete! All 2 evaluators passed.
```

A run in which one evaluator fails (exit code `1`):

```text
$ yb run -c suite.yaml

🚀 Starting evaluation...
✅ Configuration validated
📁 Created workspace: .youbencha-workspace/run-2024-11-15-143522-def456/
📥 Cloning repository...
🤖 Running agent (copilot-cli)...
📊 Running evaluators...
  ✅ git-diff
  ❌ agentic-judge
     - code_quality: 0.4 (FAIL, threshold: 0.7)
💾 Results saved to artifacts/results.json

❌ Evaluation failed. 1 of 2 evaluators failed.
```

A run with `--verbose` (output truncated):

```text
$ yb run -c suite.yaml --verbose

[DEBUG] Loading configuration from suite.yaml
[DEBUG] Parsed configuration: {...}
[INFO] Validating configuration...
[DEBUG] Checking repository URL: https://github.com/...
[DEBUG] Checking agent type: copilot-cli
[INFO] ✅ Configuration validated
[DEBUG] Creating workspace directory...
[INFO] 📁 Created workspace: .youbencha-workspace/run-...
[DEBUG] Git clone command: git clone --depth 1 ...
[INFO] 📥 Cloning repository...
...
```

Troubleshooting

If you see validation errors (exit code `2`), rerun validation in verbose mode:

```shell
yb validate -c suite.yaml -v
```

Common causes are invalid YAML syntax, missing required fields, and unsupported agent types.

If the agent fails to run (exit code `3`), make sure it is installed and authenticated:

```shell
# For Copilot CLI
gh auth status
gh copilot --version
```
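
A preflight check before a long run can catch a missing tool or an expired login early. This sketch checks that `yb` and `gh` are on `PATH`. It then runs the same `gh auth status` and `gh copilot --version` checks shown above and relies on their exit status. `require_cmd` and `preflight_copilot` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers, not part of yb.

# Return 1 (with a message on stderr) if a command is not on PATH.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "error: '$1' not found on PATH" >&2
    return 1
  fi
}

# Preflight for the copilot-cli agent: tools installed, gh logged in.
preflight_copilot() {
  require_cmd yb || return 1
  require_cmd gh || return 1
  if ! gh auth status >/dev/null 2>&1; then
    echo "error: gh is not authenticated (try 'gh auth login')" >&2
    return 1
  fi
  if ! gh copilot --version >/dev/null 2>&1; then
    echo "error: 'gh copilot' is not available" >&2
    return 1
  fi
}

# Usage: preflight_copilot && yb run -c suite.yaml
```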

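Workspaces are kept after each run unless you pass `--delete-workspace`. To remove them manually:

```shell
# Remove all workspaces
rm -rf .youbencha-workspace/

# Or remove specific runs (here, every run from 2024-11-15)
rm -rf .youbencha-workspace/run-2024-11-15-*/
```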
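
To keep only recent runs instead of deleting everything, the sketch below removes all but the newest N run directories. It assumes the default workspace location and the `run-YYYY-MM-DD-HHMMSS-<hash>` naming from the example output, so that name order matches time order. `prune_runs` is a hypothetical helper. Like `rm -rf`, it deletes permanently.

```shell
#!/bin/sh
# Hypothetical helper, not part of yb.
# Delete all but the newest $2 (default 3) run-* directories under $1
# (default .youbencha-workspace). Assumes timestamped names sort chronologically.
prune_runs() {
  root="${1:-.youbencha-workspace}"
  keep="${2:-3}"
  for d in "$root"/run-*/; do
    [ -d "$d" ] && printf '%s\n' "${d%/}"
  done | LC_ALL=C sort -r | tail -n "+$((keep + 1))" |
    while IFS= read -r old; do
      echo "removing $old"
      rm -rf -- "$old"
    done
}

# Usage: prune_runs .youbencha-workspace 5
```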

If a run exceeds its time limit (exit code `4`), raise `--timeout` (default: 300 seconds):

```shell
# Increase timeout to 10 minutes
yb run -c suite.yaml --timeout 600
```

See Also

- yb validate - Validate configuration before running
- yb report - Generate reports from results
- Configuration Reference - Complete configuration options