
yb report

Generate human-readable reports from evaluation results.

Terminal window
yb report --from <results-file> [options]

The report command transforms the machine-readable results.json into human-friendly formats. It provides comprehensive evaluation summaries including:

  • Overall pass/fail status
  • Per-evaluator results with metrics
  • File change summaries
  • Execution timing and details
  • Actionable insights

| Option | Short | Type | Default | Description |
|--------|-------|------|---------|-------------|
| --from | -f | string | Required | Path to results.json file |
| --format | | string | markdown | Output format: json, markdown, html |
| --output | -o | string | stdout | Custom output file path |
| --summary-only | -s | flag | false | Show only pass/fail summary |
| --include-logs | | flag | false | Include agent execution logs |
| --help | -h | flag | - | Show help message |
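
The options can be combined in a single call. As a minimal sketch, the helpers below pair each documented --format value with a conventional file extension and then invoke yb report with the long options from the table. The names report_extension and run_report and the reports/ directory are illustrative assumptions, not part of yb.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; only the yb flags themselves come from the options table.

# Map a documented --format value to a conventional file extension.
report_extension() {
  case "$1" in
    json)     echo "json" ;;
    markdown) echo "md" ;;
    html)     echo "html" ;;
    *)        echo "unsupported format: $1" >&2; return 1 ;;
  esac
}

# Write a report to reports/<name>.<ext> in the requested format.
run_report() {
  local results_file="$1" format="$2" name="$3" ext
  ext=$(report_extension "$format") || return 1
  mkdir -p reports
  yb report --from "$results_file" --format "$format" \
    --output "reports/${name}.${ext}"
}
```

For example, `run_report results.json html nightly` would ask yb to write reports/nightly.html, while an unknown format is rejected before yb is called.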

Generate a Markdown report to stdout:

Terminal
yb report --from .youbencha-workspace/run-*/artifacts/results.json

Write report to a specific location:

Terminal
yb report --from results.json --output ./reports/evaluation-report.md

Output as JSON for programmatic processing:

Terminal
yb report --from results.json --format json

Generate an HTML report for viewing in a browser:

Terminal
yb report --from results.json --format html --output report.html

Get just the pass/fail status:

Terminal
yb report --from results.json --summary-only

Add agent logs to the report:

Terminal
yb report --from results.json --include-logs

Match the latest run automatically:

Terminal
yb report --from ".youbencha-workspace/run-*/artifacts/results.json"
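
When a script needs to know exactly which run was picked (for example, to log the path), the newest results file can also be resolved explicitly before calling yb. This is a sketch under the assumption that "latest" means most recently modified; latest_results is a hypothetical helper, not a yb feature.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most recently modified results.json
# under a workspace directory (default: .youbencha-workspace).
latest_results() {
  local workspace="${1:-.youbencha-workspace}" newest="" candidate
  for candidate in "$workspace"/run-*/artifacts/results.json; do
    # Skip the literal pattern when no run directory matches.
    [ -e "$candidate" ] || continue
    if [ -z "$newest" ] || [ "$candidate" -nt "$newest" ]; then
      newest="$candidate"
    fi
  done
  if [ -z "$newest" ]; then
    echo "no results.json found under $workspace" >&2
    return 1
  fi
  printf '%s\n' "$newest"
}
```

A typical call would be `yb report --from "$(latest_results)"`.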

report.md
📊 youBencha Evaluation Report
==============================

Suite: hello-world-evaluation
Status: ✅ PASSED
Duration: 45.2s

Evaluator Results:
------------------
✅ git-diff
   Files changed: 2
   Lines added: 15
   Lines removed: 3
   Total changes: 18

✅ agentic-judge
   task_completed: 1.0 (PASS)
   code_quality: 0.8 (PASS)

Files Changed:
--------------
- README.md (+10, -2)
- src/index.ts (+5, -1)

Workspace: .youbencha-workspace/run-2024-11-15-123456-abc123/

When using --format json, the output follows this schema:

| Field | Type | Description |
|-------|------|-------------|
| summary.suite_name | string | Name of the evaluation suite |
| summary.overall_status | string | "passed" or "failed" |
| summary.passed | number | Count of passed evaluators |
| summary.failed | number | Count of failed evaluators |
| summary.duration_ms | number | Total execution time in milliseconds |
| evaluators[] | array | Per-evaluator results |
| evaluators[].name | string | Evaluator name |
| evaluators[].status | string | "passed" or "failed" |
| evaluators[].metrics | object | Evaluator-specific metrics |
| files_changed[] | array | List of modified files |
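
The fields above can be queried directly with jq. The sketch below assumes jq is installed and that the JSON report has been saved to a file (for instance with --format json --output report.json); failed_evaluators and report_summary_line are hypothetical helper names, and only fields listed in the schema are read.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that read a saved JSON report; requires jq.

# Print the name of every evaluator whose status is "failed", one per line.
failed_evaluators() {
  jq -r '.evaluators[] | select(.status == "failed") | .name' "$1"
}

# Print a one-line digest built from the summary.* fields.
report_summary_line() {
  jq -r '"\(.summary.suite_name): \(.summary.passed) passed, \(.summary.failed) failed in \(.summary.duration_ms) ms"' "$1"
}
```

Listing the failing evaluators by name tends to be more useful in CI logs than the overall status alone, since it points at the check that needs attention.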

Parse JSON output for automated decisions:

ci-script.sh
STATUS=$(yb report --from results.json --format json | jq -r '.summary.overall_status')
if [ "$STATUS" != "passed" ]; then
  echo "Evaluation failed!"
  exit 1
fi

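Append reports to a JSONL log file. Piping through jq -c keeps each report on a single line, which JSONL requires even if the JSON output is pretty-printed:

Terminal
yb report --from results.json --format json | jq -c . >> evaluation-history.jsonl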

Extract summary for notifications:

slack-notify.sh
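REPORT=$(yb report --from results.json --summary-only)
# Build the payload with jq so quotes and newlines in the report are escaped correctly
PAYLOAD=$(jq -n --arg text "$REPORT" '{text: $text}')
curl -X POST -H 'Content-type: application/json' \
  --data "$PAYLOAD" \
  "$SLACK_WEBHOOK_URL"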

Generate reports for all runs:

Terminal
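# Ensure the output directory exists
mkdir -p reports
for result in .youbencha-workspace/run-*/artifacts/results.json; do
  # results.json sits in <run-dir>/artifacts/, so go up two levels to get the run ID
  run_id=$(basename "$(dirname "$(dirname "$result")")")
  yb report --from "$result" --output "reports/${run_id}.md"
done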

| Code | Meaning |
|------|---------|
| 0 | Report generated successfully |
| 1 | Results file not found |
| 2 | Invalid results format |
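
A script can branch on these exit codes instead of parsing error text. The following is a minimal sketch; describe_report_exit and report_or_explain are hypothetical wrapper names, and the messages simply restate the documented meanings.

```shell
#!/usr/bin/env bash
# Hypothetical helpers built around the documented exit codes.

# Translate a yb report exit code into a short message.
describe_report_exit() {
  case "$1" in
    0) echo "report generated successfully" ;;
    1) echo "results file not found" ;;
    2) echo "invalid results format" ;;
    *) echo "unexpected exit code: $1" ;;
  esac
}

# Run yb report with the given arguments, explain any failure on stderr,
# and pass the original exit code through to the caller.
report_or_explain() {
  local code=0
  yb report "$@" || code=$?
  if [ "$code" -ne 0 ]; then
    echo "yb report: $(describe_report_exit "$code")" >&2
  fi
  return "$code"
}
```

Called as `report_or_explain --from results.json --output report.md`, the wrapper leaves successful runs untouched and only adds a diagnostic line when yb exits non-zero.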