yb list
List available built-in evaluators and their descriptions.
Synopsis

    yb list [options]

Description
The list command displays all built-in evaluators available in youBencha. It shows evaluator names, descriptions, and key capabilities to help you choose the right evaluators for your use case.
Options
| Option | Short | Type | Default | Description |
|---|---|---|---|---|
| --format | -f | string | table | Output format: table, json, yaml |
| --verbose | -v | flag | false | Show detailed evaluator information |
| --help | -h | flag | - | Show help message |
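
When `yb list` is called from a script, the options in the table above can be assembled programmatically. The following is a minimal sketch, not part of youBencha: `build_list_command` and `VALID_FORMATS` are hypothetical names, and the only facts taken from this page are the flag names, the three format values, and the `table` default.

```python
from typing import List

# Formats listed in the Options table above.
VALID_FORMATS = ("table", "json", "yaml")


def build_list_command(fmt: str = "table", verbose: bool = False) -> List[str]:
    """Build the argument vector for `yb list` (hypothetical helper)."""
    if fmt not in VALID_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    cmd = ["yb", "list"]
    if fmt != "table":
        # table is the documented default, so the flag can be omitted
        cmd += ["--format", fmt]
    if verbose:
        cmd.append("--verbose")
    return cmd


print(build_list_command("json"))
print(build_list_command(verbose=True))
```

Rejecting an unknown format before the command is launched gives a clearer error than whatever the CLI itself would report, which this page does not document.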
Examples
Basic Usage
List all available evaluators:

    yb list

Output:

    Available Evaluators:

    ┌───────────────┬────────────────────────────────────────────┐
    │ Evaluator     │ Description                                │
    ├───────────────┼────────────────────────────────────────────┤
    │ git-diff      │ Analyzes Git changes made by the agent     │
    │ expected-diff │ Compares output against expected reference │
    │ agentic-judge │ AI-powered code quality assessment         │
    └───────────────┴────────────────────────────────────────────┘

    Use 'yb list -v' for detailed information.

Verbose Output
Show detailed information about each evaluator:

    yb list -v

    git-diff
    ────────────────────────────────────────────
    Analyzes Git changes made by the AI agent.

    Metrics:
      • files_changed      Number of files modified
      • lines_added        Total lines added
      • lines_removed      Total lines removed
      • change_entropy     Distribution of changes

    Assertions (optional):
      • max_files_changed  Maximum allowed files
      • max_lines_added    Maximum lines added
      • max_lines_removed  Maximum lines removed

    Use when: You want to track scope of changes or enforce limits.

    expected-diff
    ────────────────────────────────────────────
    Compares agent output against an expected reference.

    Requires:
      • expected_source: 'branch' or 'directory'
      • expected: branch name or path

    Metrics:
      • similarity_score   Overall similarity (0-1)
      • matching_files     Files that match exactly
      • differing_files    Files with differences

    Configuration:
      • threshold          Minimum similarity to pass (default: 0.8)

    Use when: You have a "golden" solution to compare against.

    agentic-judge
    ────────────────────────────────────────────
    Uses AI agent to evaluate code quality.

    Requires:
      • config.type: Agent type (e.g., 'copilot-cli')
      • config.assertions: Key-value criteria

    Metrics:
      • Custom assertion scores (0-1 each)

    Configuration:
      • evaluator_file     Path to evaluator prompt file
      • threshold          Default pass threshold (0.5)

    Use when: You need subjective quality assessment.

JSON Output
Get evaluator list in JSON format:

    yb list --format json

    {
      "evaluators": [
        {
          "name": "git-diff",
          "description": "Analyzes Git changes made by the agent",
          "metrics": ["files_changed", "lines_added", "lines_removed", "change_entropy"],
          "requires_config": false
        },
        {
          "name": "expected-diff",
          "description": "Compares output against expected reference",
          "metrics": ["similarity_score", "matching_files", "differing_files"],
          "requires_config": true
        },
        {
          "name": "agentic-judge",
          "description": "AI-powered code quality assessment",
          "metrics": ["custom assertions"],
          "requires_config": true
        }
      ]
    }

YAML Output
Get evaluator list in YAML format:

    yb list --format yaml

    evaluators:
      - name: git-diff
        description: Analyzes Git changes made by the agent
        requires_config: false
      - name: expected-diff
        description: Compares output against expected reference
        requires_config: true
      - name: agentic-judge
        description: AI-powered code quality assessment
        requires_config: true

Evaluator Reference
git-diff
Measures the scope and distribution of changes made by the AI agent.
| Metric | Type | Description |
|---|---|---|
| files_changed | number | Count of modified files |
| lines_added | number | Total lines added |
| lines_removed | number | Total lines removed |
| change_entropy | number | Distribution score (0-1) |
Use when: Track change scope, enforce limits, analyze patterns.
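
To make "enforce limits" concrete, the sketch below compares a set of git-diff metrics with `max_*` assertions such as `max_files_changed`. It is an illustration only, not youBencha's implementation: `check_limits` is a hypothetical function, the metric values are made up, and two behaviors are assumptions (that `max_<metric>` caps the metric named `<metric>`, and that a value equal to the limit still passes).

```python
from typing import Dict, List


def check_limits(metrics: Dict[str, float], assertions: Dict[str, float]) -> List[str]:
    """Return a list of violated limits; an empty list means all limits hold.

    Assumption: an assertion named max_<metric> caps the metric named <metric>.
    """
    violations = []
    for name, limit in assertions.items():
        if not name.startswith("max_"):
            continue  # only max_* limits are covered by this sketch
        metric = name[len("max_"):]
        value = metrics.get(metric)
        # Assumption: the limit is inclusive, so only values above it fail.
        if value is not None and value > limit:
            violations.append(f"{metric}={value} exceeds {name}={limit}")
    return violations


# Made-up metric values for illustration.
metrics = {"files_changed": 7, "lines_added": 80, "lines_removed": 12}
limits = {"max_files_changed": 5, "max_lines_added": 100}
print(check_limits(metrics, limits))
```

Here only `files_changed` is reported, because 7 exceeds the limit of 5 while 80 added lines stay under 100.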

    evaluators:
      - name: git-diff
        config:
          assertions:
            max_files_changed: 5
            max_lines_added: 100

expected-diff
Compares the agent’s output against a known-correct reference implementation.
| Metric | Type | Description |
|---|---|---|
| similarity_score | number | Overall similarity (0-1) |
| matching_files | number | Files that match exactly |
| differing_files | number | Files with differences |
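
This page does not say how youBencha computes these metrics. As a rough illustration of what the three numbers could mean, the sketch below compares two in-memory file trees with Python's standard `difflib`. Everything here is an assumption: `compare_trees` is a hypothetical function, and averaging per-file `SequenceMatcher` ratios is one plausible similarity measure, not the tool's documented algorithm.

```python
from difflib import SequenceMatcher
from typing import Dict


def compare_trees(actual: Dict[str, str], expected: Dict[str, str]) -> Dict[str, float]:
    """Compare two {relative_path: file_text} mappings.

    Assumption: similarity is the mean per-file difflib ratio; a file missing
    on one side is compared against an empty string.
    """
    paths = sorted(set(actual) | set(expected))
    matching = 0
    ratios = []
    for path in paths:
        a, b = actual.get(path, ""), expected.get(path, "")
        if path in actual and path in expected and a == b:
            matching += 1
            ratios.append(1.0)
        else:
            ratios.append(SequenceMatcher(None, a, b).ratio())
    score = sum(ratios) / len(ratios) if ratios else 1.0
    return {
        "similarity_score": score,
        "matching_files": matching,
        "differing_files": len(paths) - matching,
    }


result = compare_trees(
    {"a.py": "x = 1\n", "b.py": "y = 2\n"},
    {"a.py": "x = 1\n", "b.py": "y = 3\n"},
)
print(result["matching_files"], result["differing_files"])
# 0.8 is the default threshold shown in the verbose output.
print(result["similarity_score"] >= 0.8)
```

A score computed this way would then be compared with the configured `threshold` to decide whether the run passes.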
Requires: expected_source and expected in suite config.

    expected_source: branch
    expected: feature/completed

    evaluators:
      - name: expected-diff
        config:
          threshold: 0.85

agentic-judge
Uses an AI agent to evaluate code quality based on custom assertions.
| Metric | Type | Description |
|---|---|---|
| Custom assertions | number | Each returns 0-1 score |
Use when: Subjective quality assessment or complex criteria.
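
Each custom assertion yields a score between 0 and 1, and the verbose output lists a default pass threshold of 0.5. The sketch below shows one way those scores might be turned into per-assertion pass/fail results. It is a hedged illustration: `judge_results` is a hypothetical function, the scores are made up, the assertion names are borrowed from this section's configuration example, and treating a score equal to the threshold as a pass is an assumption.

```python
from typing import Dict


def judge_results(scores: Dict[str, float], threshold: float = 0.5) -> Dict[str, bool]:
    """Map each assertion name to True when its score meets the threshold."""
    for name, score in scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {name!r} is outside 0-1: {score}")
    # Assumption: a score equal to the threshold counts as a pass.
    return {name: score >= threshold for name, score in scores.items()}


# Made-up scores for illustration.
scores = {"has_tests": 1.0, "clean_code": 0.4}
print(judge_results(scores))
```

How youBencha combines several assertion results into one overall verdict is not covered on this page, so the sketch stops at per-assertion results.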

    evaluators:
      - name: agentic-judge
        config:
          type: copilot-cli
          assertions:
            has_tests: "Unit tests were added. Score 1 if yes, 0 if no."
            clean_code: "Code follows best practices. Score 0-1."

Exit Codes
| Code | Meaning |
|---|---|
| 0 | List displayed successfully |
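
In a script, the exit code and the JSON format can be used together. The sketch below is a minimal example under stated assumptions: `yb` is on the `PATH`, stdout contains only the JSON document shown in the JSON Output example, and the function names are hypothetical. The parsing step is demonstrated on an abbreviated copy of that example so it runs without youBencha installed.

```python
import json
import subprocess
from typing import List


def run_yb_list_json() -> str:
    """Run `yb list --format json` and return its stdout.

    check=True makes subprocess raise CalledProcessError on a non-zero exit
    code, so a normal return corresponds to exit code 0.
    """
    done = subprocess.run(
        ["yb", "list", "--format", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return done.stdout


def evaluators_needing_config(json_text: str) -> List[str]:
    """Return names of evaluators whose requires_config field is true."""
    data = json.loads(json_text)
    return [e["name"] for e in data["evaluators"] if e.get("requires_config")]


# Abbreviated copy of the JSON Output example, so the sketch runs without yb.
sample = """
{"evaluators": [
  {"name": "git-diff", "requires_config": false},
  {"name": "expected-diff", "requires_config": true},
  {"name": "agentic-judge", "requires_config": true}
]}
"""
print(evaluators_needing_config(sample))
```

With youBencha installed, `evaluators_needing_config(run_yb_list_json())` would apply the same filter to live output.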
Related
- Evaluators Overview - Detailed evaluator documentation
- git-diff - git-diff reference
- expected-diff - expected-diff reference
- agentic-judge - agentic-judge reference