yb run
Execute an evaluation suite against an AI agent and generate results.
Synopsis

    yb run -c <config-file> [options]

Description
The run command executes a complete evaluation pipeline, orchestrating every step from cloning the repository to generating the final results.
Execution Pipeline
- Validates configuration - Checks syntax and schema
- Creates isolated workspace - Sets up .youbencha-workspace/run-{timestamp}-{hash}/
- Clones repository - Fetches code to evaluate
- Runs pre-execution hooks - Optional setup scripts (if configured)
- Executes agent - Runs the AI agent with your prompt
- Runs evaluators - Evaluates results in parallel
- Runs post-evaluation hooks - Optional result processing (if configured)
- Saves results - Writes artifacts to workspace
Options
| Option | Short | Type | Default | Description |
|---|---|---|---|---|
| --config | -c | string | Required | Path to suite configuration (YAML or JSON) |
| --delete-workspace | -d | flag | false | Delete workspace after completion |
| --timeout | -t | number | 300 | Maximum execution time in seconds |
| --verbose | -v | flag | false | Show detailed execution logs |
| --dry-run | | flag | false | Validate and show what would run without executing |
| --workspace-dir | -w | string | .youbencha-workspace | Custom workspace directory |
| --help | -h | flag | - | Show help message |
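
In scripted environments it can be convenient to assemble these options from environment variables. The following is a minimal sketch, not part of yb: the function name `yb_run_from_env` and the variables `YB_CONFIG`, `YB_TIMEOUT`, `YB_WORKSPACE_DIR`, and `YB_VERBOSE` are hypothetical names chosen for illustration. Only the flags themselves come from the table above.

```shell
#!/bin/sh
# Sketch: build a `yb run` command line from environment variables.
# All YB_* variable names are assumptions made for this example;
# yb itself does not read them.

yb_run_from_env() {
    # --config is the only required option.
    if [ -z "${YB_CONFIG:-}" ]; then
        echo "YB_CONFIG must point to a suite configuration file" >&2
        # Return value chosen for this sketch; it mirrors the documented
        # "configuration error" exit code.
        return 2
    fi

    # Positional parameters are local to the function, so they can be
    # used as a portable argument list.
    set -- run --config "$YB_CONFIG"

    # Add optional flags only when the matching variable is set.
    if [ -n "${YB_TIMEOUT:-}" ]; then
        set -- "$@" --timeout "$YB_TIMEOUT"
    fi
    if [ -n "${YB_WORKSPACE_DIR:-}" ]; then
        set -- "$@" --workspace-dir "$YB_WORKSPACE_DIR"
    fi
    if [ "${YB_VERBOSE:-}" = "1" ]; then
        set -- "$@" --verbose
    fi

    yb "$@"
}
```

With `YB_CONFIG=suite.yaml` and `YB_TIMEOUT=600` set, calling `yb_run_from_env` would invoke `yb run --config suite.yaml --timeout 600`.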
Examples
Basic Usage
Run an evaluation with the default settings:

    yb run -c suite.yaml

With Workspace Cleanup
Delete the workspace after successful completion:

    yb run -c suite.yaml --delete-workspace

Verbose Output
See detailed execution logs:

    yb run -c suite.yaml --verbose

Custom Timeout
Set a longer timeout for complex evaluations:

    yb run -c suite.yaml --timeout 600

Dry Run
See what would happen without executing:

    yb run -c suite.yaml --dry-run

Custom Workspace Directory
Use a specific workspace location:

    yb run -c suite.yaml --workspace-dir ./my-workspace

Complete Workflow Example

    # Validate first
    yb validate -c suite.yaml

    # Run evaluation
    yb run -c suite.yaml

    # Generate report
    yb report --from .youbencha-workspace/run-*/artifacts/results.json

Output Structure
After running, your workspace will contain:

    .youbencha-workspace/
    └── run-{timestamp}-{hash}/
        ├── src-modified/           # Code after agent execution
        ├── src-expected/           # Reference code (if configured)
        ├── artifacts/
        │   ├── results.json        # Machine-readable results
        │   ├── report.md           # Human-readable report
        │   ├── youbencha.log.json  # Agent execution log
        │   ├── git-diff.patch      # Git diff output
        │   └── expected-diff.json  # File-by-file similarity analysis
        └── .youbencha.lock         # Workspace metadata

Artifact Details
| File | Format | Description |
|---|---|---|
| results.json | JSON | Machine-readable evaluation results |
| report.md | Markdown | Human-readable summary report |
| youbencha.log.json | JSON | Complete agent execution log |
| git-diff.patch | Patch | Git-format diff of all changes |
| expected-diff.json | JSON | File-by-file similarity analysis |
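
Because each run gets its own directory, scripts often need the artifacts of the most recent run only. The sketch below prints the expected path to `results.json` for the newest run. It assumes that run directory names sort chronologically, as the `run-{timestamp}-{hash}` pattern suggests, and `latest_results` is a hypothetical helper rather than a yb command.

```shell
#!/bin/sh
# Sketch: print the results.json path for the most recent run.
# Assumption: run-{timestamp}-{hash} names sort in chronological order.
# latest_results is a hypothetical helper, not part of yb.

latest_results() {
    workspace=${1:-.youbencha-workspace}
    latest=""

    # Pathname expansion yields matches in sorted order, so the last
    # directory seen is the one with the latest timestamp.
    for dir in "$workspace"/run-*/; do
        if [ -d "$dir" ]; then
            latest=$dir
        fi
    done

    if [ -z "$latest" ]; then
        echo "no runs found in $workspace" >&2
        return 1
    fi

    # $latest already ends with a slash from the glob pattern.
    # The path is built by convention; the file is not checked here.
    printf '%s\n' "${latest}artifacts/results.json"
}
```

The printed path could then be passed to `yb report --from`, which avoids a `run-*` glob matching several runs at once.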
Exit Codes
| Code | Meaning | Action |
|---|---|---|
| 0 | All evaluators passed | ✅ Success |
| 1 | One or more evaluators failed | Review results |
| 2 | Configuration error | Fix configuration |
| 3 | Agent execution error | Check agent setup |
| 4 | Timeout exceeded | Increase timeout or optimize |
| 5 | Repository clone failed | Check URL and credentials |
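
In a CI job, the exit code can be turned into a readable hint before the job fails. The following sketch maps each code from the table above to its suggested action. The helper name `explain_exit_code` is hypothetical, and the message wording is adapted from the table rather than produced by yb.

```shell
#!/bin/sh
# Sketch: translate a `yb run` exit code into the suggested next step.
# explain_exit_code is a hypothetical helper; codes and actions follow
# the exit code table.

explain_exit_code() {
    case "$1" in
        0) echo "All evaluators passed" ;;
        1) echo "One or more evaluators failed: review results" ;;
        2) echo "Configuration error: fix configuration" ;;
        3) echo "Agent execution error: check agent setup" ;;
        4) echo "Timeout exceeded: increase timeout or optimize" ;;
        5) echo "Repository clone failed: check URL and credentials" ;;
        *) echo "Unexpected exit code: $1" ;;
    esac
}

# Possible use in a CI step (left commented so the sketch has no side effects):
#   status=0
#   yb run -c suite.yaml || status=$?
#   explain_exit_code "$status"
#   exit "$status"
```

Capturing the status with `|| status=$?` keeps the script working under `set -e`, where a bare failing command would end the script before the message is printed.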
Execution Examples
A run in which every evaluator passes:

    $ yb run -c suite.yaml

    🚀 Starting evaluation...
    ✅ Configuration validated
    📁 Created workspace: .youbencha-workspace/run-2024-11-15-143022-abc123/
    📥 Cloning https://github.com/youbencha/hello-world.git...
    🤖 Running agent (copilot-cli)...
    📊 Running evaluators...
      ✅ git-diff
      ✅ agentic-judge
    💾 Results saved to artifacts/results.json

    ✅ Evaluation complete! All 2 evaluators passed.

A run in which one evaluator fails:

    $ yb run -c suite.yaml

    🚀 Starting evaluation...
    ✅ Configuration validated
    📁 Created workspace: .youbencha-workspace/run-2024-11-15-143522-def456/
    📥 Cloning repository...
    🤖 Running agent (copilot-cli)...
    📊 Running evaluators...
      ✅ git-diff
      ❌ agentic-judge
        - code_quality: 0.4 (FAIL, threshold: 0.7)
    💾 Results saved to artifacts/results.json

    ❌ Evaluation failed. 1 of 2 evaluators failed.

The same command with verbose logging:

    $ yb run -c suite.yaml --verbose

    [DEBUG] Loading configuration from suite.yaml
    [DEBUG] Parsed configuration: {...}
    [INFO] Validating configuration...
    [DEBUG] Checking repository URL: https://github.com/...
    [DEBUG] Checking agent type: copilot-cli
    [INFO] ✅ Configuration validated
    [DEBUG] Creating workspace directory...
    [INFO] 📁 Created workspace: .youbencha-workspace/run-...
    [DEBUG] Git clone command: git clone --depth 1 ...
    [INFO] 📥 Cloning repository...
    ...

Troubleshooting
If you see validation errors, run validation with verbose mode:

    yb validate -c suite.yaml -v

Common issues include invalid YAML syntax, missing required fields, or unsupported agent types.
Ensure your agent is installed and authenticated:

    # For Copilot CLI
    gh auth status
    gh copilot --version

If you need to clean up workspaces:

    # Remove all workspaces
    rm -rf .youbencha-workspace/

    # Or remove specific runs (here, every run from 2024-11-15)
    rm -rf .youbencha-workspace/run-2024-11-15-*/

For long-running evaluations:

    # Increase timeout to 10 minutes
    yb run -c suite.yaml --timeout 600

Related Commands
- yb validate - Validate configuration before running
- yb report - Generate reports from results
- Configuration Reference - Complete configuration options