yb report

Generate human-readable reports from evaluation results.

Synopsis

yb report --from <results-file> [options]

Description

The report command transforms the machine-readable results.json into human-friendly formats. It provides comprehensive evaluation summaries including:

Overall pass/fail status
Per-evaluator results with metrics
File change summaries
Execution timing and details
Actionable insights

Options

Option	Short	Type	Default	Description
`--from`	`-f`	string	Required	Path to `results.json` file
`--format`		string	`markdown`	Output format: `json`, `markdown`, `html`
`--output`	`-o`	string	stdout	Custom output file path
`--summary-only`	`-s`	flag	`false`	Show only pass/fail summary
`--include-logs`		flag	`false`	Include agent execution logs
`--help`	`-h`	flag	-	Show help message

Examples

Basic Usage

Generate a Markdown report to stdout:

yb report --from .youbencha-workspace/run-*/artifacts/results.json

Save to File

Write report to a specific location:

yb report --from results.json --output ./reports/evaluation-report.md

JSON Format

Output as JSON for programmatic processing:

yb report --from results.json --format json

HTML Report

Generate an HTML report for viewing in a browser:

yb report --from results.json --format html --output report.html

Summary Only

Get just the pass/fail status:

yb report --from results.json --summary-only

Include Execution Logs

Add agent logs to the report:

yb report --from results.json --include-logs

Wildcard Path

Match the latest run automatically:

yb report --from ".youbencha-workspace/run-*/artifacts/results.json"

Output Formats

📊 youBencha Evaluation Report
==============================

Suite: hello-world-evaluation
Status: ✅ PASSED
Duration: 45.2s

Evaluator Results:
------------------

✅ git-diff
   Files changed: 2
   Lines added: 15
   Lines removed: 3
   Total changes: 18

✅ agentic-judge
   task_completed: 1.0 (PASS)
   code_quality: 0.8 (PASS)

Files Changed:
--------------
- README.md (+10, -2)
- src/index.ts (+5, -1)

Workspace: .youbencha-workspace/run-2024-11-15-123456-abc123/

{
  "summary": {
    "suite_name": "hello-world-evaluation",
    "overall_status": "passed",
    "passed": 2,
    "failed": 0,
    "duration_ms": 45200,
    "timestamp": "2024-11-15T12:34:56Z"
  },
  "evaluators": [
    {
      "name": "git-diff",
      "status": "passed",
      "metrics": {
        "files_changed": 2,
        "lines_added": 15,
        "lines_removed": 3
      }
    },
    {
      "name": "agentic-judge",
      "status": "passed",
      "assertions": {
        "task_completed": { "score": 1.0, "passed": true },
        "code_quality": { "score": 0.8, "passed": true }
      }
    }
  ],
  "files_changed": [
    { "path": "README.md", "added": 10, "removed": 2 },
    { "path": "src/index.ts", "added": 5, "removed": 1 }
  ],
  "workspace": ".youbencha-workspace/run-2024-11-15-123456-abc123/"
}

<!DOCTYPE html>
<html>
<head>
  <title>youBencha Report</title>
  <style>/* embedded styles */</style>
</head>
<body>
  <h1>📊 youBencha Evaluation Report</h1>
  <div class="status passed">✅ PASSED</div>
  <!-- Interactive report with collapsible sections -->
</body>
</html>

JSON Schema Reference

When using --format json, the output follows this schema:

Field	Type	Description
`summary.suite_name`	string	Name of the evaluation suite
`summary.overall_status`	string	`"passed"` or `"failed"`
`summary.passed`	number	Count of passed evaluators
`summary.failed`	number	Count of failed evaluators
`summary.duration_ms`	number	Total execution time in milliseconds
`evaluators[]`	array	Per-evaluator results
`evaluators[].name`	string	Evaluator name
`evaluators[].status`	string	`"passed"` or `"failed"`
`evaluators[].metrics`	object	Evaluator-specific metrics
`files_changed[]`	array	List of modified files

Use Cases

CI/CD Integration

Parse JSON output for automated decisions:

STATUS=$(yb report --from results.json --format json | jq -r '.summary.overall_status')
if [ "$STATUS" != "passed" ]; then
  echo "Evaluation failed!"
  exit 1
fi

Historical Tracking

Append reports to a JSONL log file:

yb report --from results.json --format json >> evaluation-history.jsonl

Slack Notifications

Extract summary for notifications:

REPORT=$(yb report --from results.json --summary-only)
curl -X POST -H 'Content-type: application/json' \
  --data "{\"text\": \"$REPORT\"}" \
  "$SLACK_WEBHOOK_URL"

Multiple Run Comparison

Generate reports for all runs:

for result in .youbencha-workspace/run-*/artifacts/results.json; do
  run_id=$(dirname "$result" | xargs basename)
  yb report --from "$result" --output "reports/${run_id}.md"
done

Exit Codes

Code	Meaning
`0`	Report generated successfully
`1`	Results file not found
`2`	Invalid results format

yb run - Execute evaluations
Results & Reporting - Advanced reporting options
CI/CD Integration - Automated reporting