yb report
Generate human-readable reports from evaluation results.
Synopsis
Section titled “Synopsis”yb report --from <results-file> [options]Description
Section titled “Description”The report command transforms the machine-readable results.json into human-friendly formats. It provides comprehensive evaluation summaries including:
- Overall pass/fail status
- Per-evaluator results with metrics
- File change summaries
- Execution timing and details
- Actionable insights
Options
Section titled “Options”| Option | Short | Type | Default | Description |
|---|---|---|---|---|
--from | -f | string | Required | Path to results.json file |
--format | string | markdown | Output format: json, markdown, html | |
--output | -o | string | stdout | Custom output file path |
--summary-only | -s | flag | false | Show only pass/fail summary |
--include-logs | flag | false | Include agent execution logs | |
--help | -h | flag | - | Show help message |
Examples
Section titled “Examples”Basic Usage
Section titled “Basic Usage”Generate a Markdown report to stdout:
yb report --from .youbencha-workspace/run-*/artifacts/results.jsonSave to File
Section titled “Save to File”Write report to a specific location:
yb report --from results.json --output ./reports/evaluation-report.mdJSON Format
Section titled “JSON Format”Output as JSON for programmatic processing:
yb report --from results.json --format jsonHTML Report
Section titled “HTML Report”Generate an HTML report for viewing in a browser:
yb report --from results.json --format html --output report.htmlSummary Only
Section titled “Summary Only”Get just the pass/fail status:
yb report --from results.json --summary-onlyInclude Execution Logs
Section titled “Include Execution Logs”Add agent logs to the report:
yb report --from results.json --include-logsWildcard Path
Section titled “Wildcard Path”Match the latest run automatically:
yb report --from ".youbencha-workspace/run-*/artifacts/results.json"Output Formats
Section titled “Output Formats”📊 youBencha Evaluation Report==============================
Suite: hello-world-evaluationStatus: ✅ PASSEDDuration: 45.2s
Evaluator Results:------------------
✅ git-diff Files changed: 2 Lines added: 15 Lines removed: 3 Total changes: 18
✅ agentic-judge task_completed: 1.0 (PASS) code_quality: 0.8 (PASS)
Files Changed:--------------- README.md (+10, -2)- src/index.ts (+5, -1)
Workspace: .youbencha-workspace/run-2024-11-15-123456-abc123/{ "summary": { "suite_name": "hello-world-evaluation", "overall_status": "passed", "passed": 2, "failed": 0, "duration_ms": 45200, "timestamp": "2024-11-15T12:34:56Z" }, "evaluators": [ { "name": "git-diff", "status": "passed", "metrics": { "files_changed": 2, "lines_added": 15, "lines_removed": 3 } }, { "name": "agentic-judge", "status": "passed", "assertions": { "task_completed": { "score": 1.0, "passed": true }, "code_quality": { "score": 0.8, "passed": true } } } ], "files_changed": [ { "path": "README.md", "added": 10, "removed": 2 }, { "path": "src/index.ts", "added": 5, "removed": 1 } ], "workspace": ".youbencha-workspace/run-2024-11-15-123456-abc123/"}<!DOCTYPE html><html><head> <title>youBencha Report</title> <style>/* embedded styles */</style></head><body> <h1>📊 youBencha Evaluation Report</h1> <div class="status passed">✅ PASSED</div> <!-- Interactive report with collapsible sections --></body></html>JSON Schema Reference
Section titled “JSON Schema Reference”When using --format json, the output follows this schema:
| Field | Type | Description |
|---|---|---|
summary.suite_name | string | Name of the evaluation suite |
summary.overall_status | string | "passed" or "failed" |
summary.passed | number | Count of passed evaluators |
summary.failed | number | Count of failed evaluators |
summary.duration_ms | number | Total execution time in milliseconds |
evaluators[] | array | Per-evaluator results |
evaluators[].name | string | Evaluator name |
evaluators[].status | string | "passed" or "failed" |
evaluators[].metrics | object | Evaluator-specific metrics |
files_changed[] | array | List of modified files |
Use Cases
Section titled “Use Cases”CI/CD Integration
Section titled “CI/CD Integration”Parse JSON output for automated decisions:
STATUS=$(yb report --from results.json --format json | jq -r '.summary.overall_status')if [ "$STATUS" != "passed" ]; then echo "Evaluation failed!" exit 1fiHistorical Tracking
Section titled “Historical Tracking”Append reports to a JSONL log file:
yb report --from results.json --format json >> evaluation-history.jsonlSlack Notifications
Section titled “Slack Notifications”Extract summary for notifications:
REPORT=$(yb report --from results.json --summary-only)curl -X POST -H 'Content-type: application/json' \ --data "{\"text\": \"$REPORT\"}" \ "$SLACK_WEBHOOK_URL"Multiple Run Comparison
Section titled “Multiple Run Comparison”Generate reports for all runs:
for result in .youbencha-workspace/run-*/artifacts/results.json; do run_id=$(dirname "$result" | xargs basename) yb report --from "$result" --output "reports/${run_id}.md"doneExit Codes
Section titled “Exit Codes”| Code | Meaning |
|---|---|
0 | Report generated successfully |
1 | Results file not found |
2 | Invalid results format |
Related Commands
Section titled “Related Commands”- yb run - Execute evaluations
- Results & Reporting - Advanced reporting options
- CI/CD Integration - Automated reporting