Examples

Learn from practical examples of youBencha in production environments.

Featured Examples

CI/CD Integration

Run youBencha evaluations in GitHub Actions with automated pass/fail gates.

Slack Notifications

Send evaluation results to Slack channels for team visibility.

Quick Start Examples

Minimal Configuration

The simplest possible evaluation:

repo: https://github.com/example/repo.git
agent:
  type: copilot-cli
  config:
    prompt: "Add a helpful comment to the README"
evaluators:
  - name: git-diff

With Quality Checks

Add limits on the size of the diff and an AI-judged quality assessment:

repo: https://github.com/example/repo.git
branch: main

agent:
  type: copilot-cli
  config:
    prompt: "Add error handling to all API endpoints"

evaluators:
  - name: git-diff
    config:
      assertions:
        max_files_changed: 10
        max_lines_added: 200

  - name: agentic-judge
    config:
      type: copilot-cli
      assertions:
        error_handling: "Proper try-catch blocks added. Score 0-1."
        user_feedback: "Error messages are user-friendly. Score 0-1."
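
To make the size limits above concrete, here is a minimal Python sketch of the kind of check that max_files_changed and max_lines_added describe. It is not youBencha's implementation: the functions summarize_numstat and check_limits are hypothetical, and the sample text only mimics the tab-separated output of `git diff --numstat`.

```python
def summarize_numstat(numstat_text):
    """Count changed files and added lines from `git diff --numstat` output."""
    files_changed = 0
    lines_added = 0
    for line in numstat_text.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        files_changed += 1
        # Binary files report "-" instead of a line count, so only sum digits.
        if parts[0].isdigit():
            lines_added += int(parts[0])
    return {"files_changed": files_changed, "lines_added": lines_added}


def check_limits(summary, max_files_changed, max_lines_added):
    """Return a list of violation messages; an empty list means the diff passes."""
    violations = []
    if summary["files_changed"] > max_files_changed:
        violations.append(
            f"files changed {summary['files_changed']} > {max_files_changed}"
        )
    if summary["lines_added"] > max_lines_added:
        violations.append(
            f"lines added {summary['lines_added']} > {max_lines_added}"
        )
    return violations


# Illustrative numstat text (added, deleted, path), not real evaluation output.
sample = "12\t3\tsrc/api.py\n-\t-\tassets/logo.png\n190\t0\tsrc/errors.py\n"
summary = summarize_numstat(sample)
print(summary)
print(check_limits(summary, max_files_changed=10, max_lines_added=200))
```

Returning a list of violations rather than a single boolean keeps the failure reason visible when more than one limit is exceeded.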

With Reference Comparison

Compare the agent's output against a known-good implementation on a reference branch:

repo: https://github.com/example/repo.git
branch: main
expected_source: branch
expected: feature/completed

agent:
  type: copilot-cli
  config:
    prompt_file: ./prompts/add-feature.md

evaluators:
  - name: expected-diff
    config:
      threshold: 0.85

  - name: git-diff

  - name: agentic-judge
    config:
      type: copilot-cli
      assertions:
        matches_spec: "Implementation matches requirements. Score 0-1."
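
The threshold of 0.85 above implies a similarity score between 0 and 1. How youBencha computes that score is not described here, so the Python sketch below substitutes the standard library's difflib.SequenceMatcher purely to illustrate how a threshold turns a score into a pass/fail result. The function names similarity and passes_threshold are hypothetical.

```python
import difflib


def similarity(actual, expected):
    """Return a ratio in [0, 1]; 1.0 means the two texts are identical."""
    return difflib.SequenceMatcher(None, actual, expected).ratio()


def passes_threshold(actual, expected, threshold=0.85):
    """Pass when the similarity ratio reaches the configured threshold."""
    return similarity(actual, expected) >= threshold


# Illustrative inputs: the reference implementation and an agent's attempt.
expected = "def add(a, b):\n    return a + b\n"
actual = "def add(a, b):\n    # Sum two numbers.\n    return a + b\n"

print(round(similarity(actual, expected), 2))
print(passes_threshold(actual, expected))
```

A lower threshold tolerates more stylistic variation from the reference, while a higher one demands a near-identical result.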

Common Patterns

Regression Testing

Run the same evaluation on a schedule and append each result to a history file, so that score drops between runs are easy to spot:

name: daily-regression
repo: https://github.com/example/repo.git
branch: main

agent:
  type: copilot-cli
  model: claude-sonnet-4.5
  config:
    prompt_file: ./prompts/standard-task.md

evaluators:
  - name: git-diff
  - name: agentic-judge
    config:
      type: copilot-cli
      assertions:
        quality: "Code quality meets standards. Score 0-1."

post_evaluation:
  - name: database
    config:
      type: json-file
      output_path: ./history/regression-results.jsonl
      append: true
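
Once results accumulate in the history file, a small script can flag a drop. The Python sketch below assumes each line of the JSONL file is a JSON object with a numeric "score" field. That field name is a guess, since the record schema is not shown here, so adjust it to match the actual output. The functions load_scores and has_regressed are hypothetical helpers.

```python
import json


def load_scores(jsonl_text, field="score"):
    """Extract one numeric field from each non-empty JSON Lines record."""
    scores = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if field in record:
            scores.append(float(record[field]))
    return scores


def has_regressed(scores, window=3, tolerance=0.05):
    """True when the latest score is more than `tolerance` below the
    mean of up to `window` runs that preceded it."""
    if len(scores) < 2:
        return False  # nothing to compare against yet
    previous = scores[-(window + 1):-1]
    baseline = sum(previous) / len(previous)
    return scores[-1] < baseline - tolerance


# Illustrative history; in practice, read ./history/regression-results.jsonl.
history = '{"score": 0.90}\n{"score": 0.92}\n{"score": 0.91}\n{"score": 0.70}\n'
scores = load_scores(history)
print(scores)
print(has_regressed(scores))
```

Comparing against a short rolling mean rather than only the previous run makes the check less sensitive to a single noisy judge score.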

Multi-Evaluator Setup

Comprehensive evaluation with multiple focused judges:

evaluators:
  - name: git-diff
    config:
      assertions:
        max_files_changed: 15

  - name: agentic-judge-security
    config:
      type: copilot-cli
      assertions:
        no_vulnerabilities: "No security vulnerabilities introduced. Score 0-1."

  - name: agentic-judge-testing
    config:
      type: copilot-cli
      assertions:
        tests_added: "Appropriate tests added. Score 0-1."

  - name: agentic-judge-docs
    config:
      type: copilot-cli
      assertions:
        documented: "Changes are documented. Score 0-1."
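
One benefit of focused judges is that a failure points at a specific concern. As a rough Python sketch, assuming each judge yields a single score between 0 and 1, the hypothetical helper below lists the judges that fall under a minimum. The scores and the 0.7 minimum are illustrative values, not youBencha output or defaults.

```python
def failing_judges(scores, minimum=0.7):
    """Return the names of judges scoring below `minimum`, sorted by name."""
    return sorted(name for name, score in scores.items() if score < minimum)


# Illustrative scores keyed by the evaluator names used in the config above.
scores = {
    "agentic-judge-security": 0.95,
    "agentic-judge-testing": 0.55,
    "agentic-judge-docs": 0.80,
}
print(failing_judges(scores))
```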

Next Steps

CI/CD Integration - Automate evaluations
Slack Notifications - Team notifications
Best Practices - Tips for effective evaluations