
Examples

Learn from practical examples of youBencha in production environments.

CI/CD Integration

Run youBencha evaluations in GitHub Actions with automated pass/fail gates.

Slack Notifications

Send evaluation results to Slack channels for team visibility.

The simplest possible evaluation:

suite.yaml
repo: https://github.com/example/repo.git
agent:
  type: copilot-cli
  config:
    prompt: "Add a helpful comment to the README"
evaluators:
  - name: git-diff

Add AI-powered quality assessment:

suite.yaml
repo: https://github.com/example/repo.git
branch: main
agent:
  type: copilot-cli
  config:
    prompt: "Add error handling to all API endpoints"
evaluators:
  - name: git-diff
    config:
      assertions:
        max_files_changed: 10
        max_lines_added: 200
  - name: agentic-judge
    config:
      type: copilot-cli
      assertions:
        error_handling: "Proper try-catch blocks added. Score 0-1."
        user_feedback: "Error messages are user-friendly. Score 0-1."
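
To make the two git-diff limits above concrete, here is a minimal Python sketch of what such a gate amounts to. It is not youBencha's implementation: the function name `check_diff_limits` is hypothetical, and the sketch simply assumes the limits are checked against the output of `git diff --numstat` (tab-separated added count, deleted count, and path per file).

```python
def check_diff_limits(numstat: str, max_files_changed: int, max_lines_added: int) -> dict:
    """Check `git diff --numstat` output against two limits (illustrative only)."""
    files_changed = 0
    lines_added = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        added, _deleted, _path = parts
        files_changed += 1
        if added != "-":  # binary files report "-" instead of line counts
            lines_added += int(added)
    return {
        "files_changed": files_changed,
        "lines_added": lines_added,
        "passed": files_changed <= max_files_changed and lines_added <= max_lines_added,
    }


# Sample numstat output: two text files and one binary file.
sample = "12\t3\tsrc/api/users.js\n40\t0\tsrc/api/orders.js\n-\t-\tdocs/diagram.png\n"
result = check_diff_limits(sample, max_files_changed=10, max_lines_added=200)
print(result)
```

With limits of 10 files and 200 added lines, the sample change (3 files, 52 added lines) passes; tightening `max_files_changed` to 2 would fail it.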

Compare against a known-good implementation:

suite.yaml
repo: https://github.com/example/repo.git
branch: main
expected_source: branch
expected: feature/completed
agent:
  type: copilot-cli
  config:
    prompt_file: ./prompts/add-feature.md
evaluators:
  - name: expected-diff
    config:
      threshold: 0.85
  - name: git-diff
  - name: agentic-judge
    config:
      type: copilot-cli
      assertions:
        matches_spec: "Implementation matches requirements. Score 0-1."
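
The `threshold: 0.85` setting above implies a similarity score between the agent's output and the expected branch. This page does not say how youBencha computes that score, so the sketch below swaps in a plain line-based ratio from Python's `difflib` purely to illustrate how a threshold gate behaves. The function names `diff_similarity` and `meets_threshold` are hypothetical.

```python
import difflib


def diff_similarity(actual: str, expected: str) -> float:
    """Line-based similarity in [0, 1]; a stand-in metric, not youBencha's own."""
    matcher = difflib.SequenceMatcher(None, actual.splitlines(), expected.splitlines())
    return matcher.ratio()


def meets_threshold(actual: str, expected: str, threshold: float = 0.85) -> bool:
    """Return True when the similarity reaches the configured threshold."""
    return diff_similarity(actual, expected) >= threshold


expected_src = "def add(a, b):\n    return a + b\n\ndef sub(a, b):\n    return a - b\n"
# The agent's version differs from the expected one in the last line only.
actual_src = "def add(a, b):\n    return a + b\n\ndef sub(a, b):\n    return b - a\n"

print(diff_similarity(actual_src, expected_src))
print(meets_threshold(actual_src, expected_src))
print(meets_threshold(expected_src, expected_src))
```

In this stand-in, four of five lines match on each side, so the one-line difference falls below 0.85 and the check fails, while an identical file passes.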

Run the same evaluation regularly to detect regressions:

suite.yaml
name: daily-regression
repo: https://github.com/example/repo.git
branch: main
agent:
  type: copilot-cli
  model: claude-sonnet-4.5
  config:
    prompt_file: ./prompts/standard-task.md
evaluators:
  - name: git-diff
  - name: agentic-judge
    config:
      type: copilot-cli
      assertions:
        quality: "Code quality meets standards. Score 0-1."
post_evaluation:
  - name: database
    config:
      type: json-file
      output_path: ./history/regression-results.jsonl
      append: true
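
Once results accumulate in the JSONL history file, a small script can flag a drop in quality. The Python sketch below is one way to do that under loud assumptions: the record layout written by the `json-file` hook is not shown on this page, so the numeric `score` field is hypothetical, as are the function names, the five-run window, and the 0.05 tolerance. Adjust the field name to match the real records.

```python
import json
from pathlib import Path
from statistics import mean


def load_scores(path, field="score"):
    """Read one JSON object per line and collect the assumed numeric field."""
    scores = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the history file
        record = json.loads(line)
        if field in record:
            scores.append(float(record[field]))
    return scores


def is_regression(scores, window=5, tolerance=0.05):
    """Flag the latest run if it falls below the mean of the preceding runs."""
    if len(scores) < 2:
        return False  # nothing to compare against yet
    baseline = mean(scores[-1 - window:-1])  # up to `window` runs before the latest
    return scores[-1] < baseline - tolerance


if __name__ == "__main__":
    history = Path("./history/regression-results.jsonl")
    if history.exists():
        scores = load_scores(history)
        print("regression detected" if is_regression(scores) else "no regression")
```

Comparing against a rolling mean rather than only the previous run keeps a single noisy judge score from triggering a false alarm.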

Comprehensive evaluation with multiple focused judges:

suite.yaml
evaluators:
  - name: git-diff
    config:
      assertions:
        max_files_changed: 15
  - name: agentic-judge-security
    config:
      type: copilot-cli
      assertions:
        no_vulnerabilities: "No security vulnerabilities introduced. Score 0-1."
  - name: agentic-judge-testing
    config:
      type: copilot-cli
      assertions:
        tests_added: "Appropriate tests added. Score 0-1."
  - name: agentic-judge-docs
    config:
      type: copilot-cli
      assertions:
        documented: "Changes are documented. Score 0-1."