Learn from practical examples of youBencha in production environments.
CI/CD Integration
Run youBencha evaluations in GitHub Actions with automated pass/fail gates.
Slack Notifications
Send evaluation results to Slack channels for team visibility.
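
A pass/fail gate like the one mentioned for CI/CD usually reduces to comparing each assertion score against a minimum and returning a non-zero exit code on failure. The following is a minimal Python sketch of that idea, not youBencha's own implementation: the function names, the shape of the `scores` dictionary, and the 0.7 threshold are all hypothetical, and the actual results format may differ.

```python
def failing_assertions(scores, min_score=0.7):
    """Return the names of assertions that scored below min_score.

    `scores` maps assertion names to 0-1 scores. This shape is an
    assumption for illustration, not youBencha's actual results format.
    """
    return sorted(name for name, score in scores.items() if score < min_score)


def exit_code(scores, min_score=0.7):
    """Return 0 when every assertion passes, 1 otherwise.

    A non-zero exit code is what makes a CI step fail.
    """
    return 1 if failing_assertions(scores, min_score) else 0


# Illustrative scores only; a real pipeline would load them from the evaluation results.
example_scores = {"error_handling": 0.9, "user_feedback": 0.55}
print(failing_assertions(example_scores))
print(exit_code(example_scores))
```

In a workflow step, passing the returned value to `sys.exit` would fail the job whenever any score falls below the threshold.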
The simplest possible evaluation:
repo: https://github.com/example/repo.git
agent:
  type: copilot-cli
  config:
    prompt: "Add a helpful comment to the README"
evaluators:
  - name: git-diff

Add AI-powered quality assessment:
repo: https://github.com/example/repo.git
branch: main

agent:
  type: copilot-cli
  config:
    prompt: "Add error handling to all API endpoints"

evaluators:
  - name: git-diff
    config:
      assertions:
        max_files_changed: 10
        max_lines_added: 200

  - name: agentic-judge
    config:
      type: copilot-cli
      assertions:
        error_handling: "Proper try-catch blocks added. Score 0-1."
        user_feedback: "Error messages are user-friendly. Score 0-1."

Compare against a known-good implementation:
repo: https://github.com/example/repo.git
branch: main
expected_source: branch
expected: feature/completed

agent:
  type: copilot-cli
  config:
    prompt_file: ./prompts/add-feature.md

evaluators:
  - name: expected-diff
    config:
      threshold: 0.85

  - name: git-diff

  - name: agentic-judge
    config:
      type: copilot-cli
      assertions:
        matches_spec: "Implementation matches requirements. Score 0-1."

Run the same evaluation regularly to detect regressions:
name: daily-regression
repo: https://github.com/example/repo.git
branch: main

agent:
  type: copilot-cli
  model: claude-sonnet-4.5
  config:
    prompt_file: ./prompts/standard-task.md

evaluators:
  - name: git-diff
  - name: agentic-judge
    config:
      type: copilot-cli
      assertions:
        quality: "Code quality meets standards. Score 0-1."

post_evaluation:
  - name: database
    config:
      type: json-file
      output_path: ./history/regression-results.jsonl
      append: true

Comprehensive evaluation with multiple focused judges:
evaluators:
  - name: git-diff
    config:
      assertions:
        max_files_changed: 15

  - name: agentic-judge-security
    config:
      type: copilot-cli
      assertions:
        no_vulnerabilities: "No security vulnerabilities introduced. Score 0-1."

  - name: agentic-judge-testing
    config:
      type: copilot-cli
      assertions:
        tests_added: "Appropriate tests added. Score 0-1."

  - name: agentic-judge-docs
    config:
      type: copilot-cli
      assertions:
        documented: "Changes are documented. Score 0-1."
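
The daily-regression configuration earlier appends results to ./history/regression-results.jsonl. A minimal Python sketch of how such a history might be checked for a regression follows. The record shape (one JSON object per line with a numeric `quality` field) is an assumption made for illustration, since the layout youBencha actually writes is not shown here; the `window` and `tolerance` values are likewise hypothetical.

```python
import json


def load_scores(lines, field="quality"):
    """Parse JSONL lines and collect one numeric score per record.

    `field` is a hypothetical key; the real record layout may differ.
    """
    scores = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the history file
        record = json.loads(line)
        if field in record:
            scores.append(float(record[field]))
    return scores


def has_regressed(scores, window=5, tolerance=0.1):
    """Return True if the latest score is more than `tolerance` below
    the mean of up to `window` runs that preceded it."""
    if len(scores) < 2:
        return False  # nothing to compare against yet
    previous = scores[-(window + 1):-1]
    baseline = sum(previous) / len(previous)
    return scores[-1] < baseline - tolerance


# Illustrative records only, standing in for lines read from the history file.
history = ['{"quality": 0.9}', '{"quality": 0.88}', '{"quality": 0.6}']
print(has_regressed(load_scores(history)))
```

Because `load_scores` accepts any iterable of lines, an open file handle for the history file can be passed to it directly.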