
yb run

Execute an evaluation suite against an AI agent and generate results.

Terminal
yb run -c <config-file> [options]

The run command executes the complete evaluation pipeline, from validating the configuration and cloning the repository to saving the final results:

  1. Validates configuration - Checks syntax and schema
  2. Creates isolated workspace - Sets up .youbencha-workspace/run-{timestamp}-{hash}/
  3. Clones repository - Fetches code to evaluate
  4. Runs pre-execution hooks - Optional setup scripts (if configured)
  5. Executes agent - Runs the AI agent with your prompt
  6. Runs evaluators - Evaluates results in parallel
  7. Runs post-evaluation hooks - Optional result processing (if configured)
  8. Saves results - Writes artifacts to workspace

| Option | Short | Type | Default | Description |
| --- | --- | --- | --- | --- |
| --config | -c | string | Required | Path to suite configuration (YAML or JSON) |
| --delete-workspace | -d | flag | false | Delete workspace after completion |
| --timeout | -t | number | 300 | Maximum execution time in seconds |
| --verbose | -v | flag | false | Show detailed execution logs |
| --dry-run | | flag | false | Validate and show what would run without executing |
| --workspace-dir | -w | string | .youbencha-workspace | Custom workspace directory |
| --help | -h | flag | - | Show help message |

Run an evaluation with the default settings:

Terminal
yb run -c suite.yaml

Delete the workspace after successful completion:

Terminal
yb run -c suite.yaml --delete-workspace

See detailed execution logs:

Terminal
yb run -c suite.yaml --verbose

Set a longer timeout for complex evaluations:

Terminal
yb run -c suite.yaml --timeout 600
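
When the right timeout is not known in advance, one option is to retry with a larger value each time the run exits with code 4 (timeout exceeded). The following is a minimal sketch, not part of youBencha itself: the function name `run_with_timeout_retry` and the doubling policy are assumptions chosen for illustration.

```shell
# Sketch: retry `yb run` with a doubled timeout whenever it exits with code 4
# (timeout exceeded). The helper name and the doubling policy are assumptions.
run_with_timeout_retry() {
  config=$1        # path to the suite configuration
  timeout=$2       # initial --timeout value in seconds
  max_timeout=$3   # stop once the timeout would exceed this value

  while [ "$timeout" -le "$max_timeout" ]; do
    status=0
    yb run -c "$config" --timeout "$timeout" || status=$?
    # Any exit code other than 4 is final: success or a non-timeout failure.
    if [ "$status" -ne 4 ]; then
      return "$status"
    fi
    echo "Timed out at ${timeout}s" >&2
    timeout=$((timeout * 2))
  done
  echo "Giving up: next timeout would exceed ${max_timeout}s" >&2
  return 4
}
```

For example, `run_with_timeout_retry suite.yaml 300 1200` would try 300, 600, and 1200 seconds before giving up. Each retry repeats the whole pipeline, so this only makes sense when a full rerun is acceptable.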

See what would happen without executing:

Terminal
yb run -c suite.yaml --dry-run

Use a specific workspace location:

Terminal
yb run -c suite.yaml --workspace-dir ./my-workspace

A complete workflow validates the configuration, runs the evaluation, and generates a report:

Terminal
# Validate first
yb validate -c suite.yaml
# Run evaluation
yb run -c suite.yaml
# Generate report
yb report --from .youbencha-workspace/run-*/artifacts/results.json
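
The `run-*` glob in the report step matches every run directory in the workspace, so it becomes ambiguous once several runs exist. The sketch below picks the newest `results.json` instead. It assumes that run directory names sort chronologically, as a name like `run-2024-11-15-143022-abc123` suggests; the helper name `latest_results` is hypothetical.

```shell
# Sketch: print the path of the most recent results.json in a workspace.
# Assumes run directory names (run-{timestamp}-{hash}) sort chronologically.
latest_results() {
  workspace=${1:-.youbencha-workspace}
  latest=""
  # Globs expand in sorted order, so the last match is the newest run.
  for dir in "$workspace"/run-*/; do
    if [ -f "${dir}artifacts/results.json" ]; then
      latest="${dir}artifacts/results.json"
    fi
  done
  # Fail if no run has produced results yet.
  if [ -z "$latest" ]; then
    return 1
  fi
  printf '%s\n' "$latest"
}
```

The report step then becomes `yb report --from "$(latest_results)"`.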

After running, your workspace will contain:

.youbencha-workspace/
└── run-{timestamp}-{hash}/
    ├── src-modified/              # Code after agent execution
    ├── src-expected/              # Reference code (if configured)
    ├── artifacts/
    │   ├── results.json           # Machine-readable results
    │   ├── report.md              # Human-readable report
    │   ├── youbencha.log.json     # Agent execution log
    │   ├── git-diff.patch         # Git diff output
    │   └── expected-diff.json     # File-by-file similarity analysis
    └── .youbencha.lock            # Workspace metadata

| File | Format | Description |
| --- | --- | --- |
| results.json | JSON | Machine-readable evaluation results |
| report.md | Markdown | Human-readable summary report |
| youbencha.log.json | JSON | Complete agent execution log |
| git-diff.patch | Patch | Git-format diff of all changes |
| expected-diff.json | JSON | File-by-file similarity analysis |
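
A script that consumes these files may want to confirm they exist before reading them. The sketch below checks a run directory for the listed artifacts. It assumes that `expected-diff.json` is only written when reference code is configured, and that the other four files are always produced; neither is confirmed here, so adjust the list to match your suite. The helper name `missing_artifacts` is hypothetical.

```shell
# Sketch: report artifacts missing from a run directory.
# Assumption: expected-diff.json depends on optional reference code, so it is
# not checked; the remaining four files are treated as always present.
missing_artifacts() {
  run_dir=$1
  missing=0
  for file in results.json report.md youbencha.log.json git-diff.patch; do
    if [ ! -f "$run_dir/artifacts/$file" ]; then
      echo "missing: $file"
      missing=1
    fi
  done
  # Exit status 0 means every checked file is present.
  return "$missing"
}
```

Called as `missing_artifacts .youbencha-workspace/run-{timestamp}-{hash}`, with the placeholder replaced by a real run directory, it prints one line per missing file and returns a non-zero status if any are absent.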

| Code | Meaning | Action |
| --- | --- | --- |
| 0 | All evaluators passed | ✅ Success |
| 1 | One or more evaluators failed | Review results |
| 2 | Configuration error | Fix configuration |
| 3 | Agent execution error | Check agent setup |
| 4 | Timeout exceeded | Increase timeout or optimize |
| 5 | Repository clone failed | Check URL and credentials |
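
In CI scripts it can help to turn the exit code into a readable message before failing the job. The sketch below maps each documented code to its meaning and suggested action; the helper name `describe_yb_exit` is hypothetical and the wording simply mirrors the exit code table.

```shell
# Sketch: translate a `yb run` exit code into its documented meaning.
describe_yb_exit() {
  case "$1" in
    0) echo "All evaluators passed" ;;
    1) echo "One or more evaluators failed: review results" ;;
    2) echo "Configuration error: fix configuration" ;;
    3) echo "Agent execution error: check agent setup" ;;
    4) echo "Timeout exceeded: increase timeout or optimize" ;;
    5) echo "Repository clone failed: check URL and credentials" ;;
    *) echo "Unknown exit code: $1" ;;
  esac
}

# Example use in a CI step (commented out so the snippet only defines the helper):
#   status=0
#   yb run -c suite.yaml || status=$?
#   describe_yb_exit "$status"
#   exit "$status"
```

Capturing the status with `|| status=$?` keeps the script alive under `set -e`, so the message is printed before the job exits with the original code.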

A successful run prints output similar to the following:

Terminal
$ yb run -c suite.yaml
🚀 Starting evaluation...
Configuration validated
📁 Created workspace: .youbencha-workspace/run-2024-11-15-143022-abc123/
📥 Cloning https://github.com/youbencha/hello-world.git...
🤖 Running agent (copilot-cli)...
📊 Running evaluators...
git-diff
agentic-judge
💾 Results saved to artifacts/results.json
Evaluation complete! All 2 evaluators passed.

If you see validation errors, run validation with verbose mode:

Terminal
yb validate -c suite.yaml -v

Common issues include invalid YAML syntax, missing required fields, or unsupported agent types.