CI/CD Integration
Integrate youBencha into your CI/CD pipeline to run AI agent evaluations automatically on every pull request or on a schedule.
GitHub Actions
Basic Workflow

    name: youBencha Evaluation

    on:
      pull_request:
        branches: [main]
      schedule:
        - cron: '0 6 * * *'  # Daily at 6 AM UTC

    jobs:
      evaluate:
        runs-on: ubuntu-latest

        steps:
          - name: Checkout
            uses: actions/checkout@v4

          - name: Setup Node.js
            uses: actions/setup-node@v4
            with:
              node-version: '20'

          - name: Install youBencha
            run: npm install -g youbencha

          - name: Run Evaluation
            run: yb run -c suite.yaml
            env:
              GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

          - name: Check Results
            run: |
              FAILED=$(jq '.summary.failed' .youbencha-workspace/run-*/artifacts/results.json)
              if [ "$FAILED" -gt 0 ]; then
                echo "❌ Evaluation failed: $FAILED evaluators did not pass"
                exit 1
              fi
              echo "✅ All evaluators passed"

          - name: Upload Artifacts
            uses: actions/upload-artifact@v4
            if: always()
            with:
              name: evaluation-results
              path: .youbencha-workspace/run-*/artifacts/

With Copilot CLI
For evaluations using GitHub Copilot CLI:

    name: youBencha with Copilot

    on:
      pull_request:
        branches: [main]

    jobs:
      evaluate:
        runs-on: ubuntu-latest

        steps:
          - uses: actions/checkout@v4

          - uses: actions/setup-node@v4
            with:
              node-version: '20'

          - name: Install Dependencies
            run: |
              npm install -g youbencha
              npm install -g @githubnext/github-copilot-cli

          - name: Authenticate GitHub CLI
            run: gh auth login --with-token <<< "${{ secrets.GITHUB_TOKEN }}"

          - name: Run Evaluation
            run: yb run -c suite.yaml
            env:
              GITHUB_TOKEN: ${{ secrets.COPILOT_TOKEN }}

          - name: Generate Report
            run: yb report --from .youbencha-workspace/run-*/artifacts/results.json

          - name: Upload Results
            uses: actions/upload-artifact@v4
            with:
              name: evaluation-report
              path: |
                .youbencha-workspace/run-*/artifacts/report.md
                .youbencha-workspace/run-*/artifacts/results.json

PR Status Checks
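The steps in this section run on `pull_request` events, and posting a comment needs write access to the pull request. The fragment below is a minimal sketch of a job-level `permissions` block, assuming the workflow relies on the default `GITHUB_TOKEN`. The job name `evaluate` matches the workflows above, and the exact scopes your suite needs are an assumption to verify against your repository settings.

```yaml
jobs:
  evaluate:
    runs-on: ubuntu-latest
    # Assumed scopes: read the code, write PR comments.
    permissions:
      contents: read
      pull-requests: write
```

Pull requests opened from forks typically receive a read-only token regardless of this block, so the commenting step may need to be skipped for them.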
Fail PR on Evaluation Failure

    - name: Evaluate and Gate
      run: |
        yb run -c suite.yaml

        STATUS=$(jq -r '.summary.overall_status' .youbencha-workspace/run-*/artifacts/results.json)

        if [ "$STATUS" != "passed" ]; then
          echo "::error::youBencha evaluation failed"
          exit 1
        fi

Add PR Comment with Results
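    - name: Comment on PR
      uses: actions/github-script@v7
      if: github.event_name == 'pull_request'
      with:
        script: |
          const fs = require('fs');
          const path = require('path');

          // fs.readFileSync does not expand globs, so resolve the run directory first.
          // This picks the last run-* directory by name.
          const workspace = '.youbencha-workspace';
          const runDir = fs.readdirSync(workspace)
            .filter((name) => name.startsWith('run-'))
            .sort()
            .pop();
          const report = fs.readFileSync(
            path.join(workspace, runDir, 'artifacts', 'report.md'),
            'utf8'
          );

          await github.rest.issues.createComment({
            owner: context.repo.owner,
            repo: context.repo.repo,
            issue_number: context.issue.number,
            body: `## youBencha Evaluation Results\n\n${report}`
          });

Scheduled Evaluations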
Daily Regression Check

    name: Daily Regression

    on:
      schedule:
        - cron: '0 6 * * *'  # 6 AM UTC daily

    jobs:
      regression:
        runs-on: ubuntu-latest
        # The final step pushes to the repository, which needs write access.
        permissions:
          contents: write

        steps:
          - uses: actions/checkout@v4

          - name: Run Regression Suite
            run: |
              npm install -g youbencha
              yb run -c suites/regression.yaml

          - name: Store Results
            run: |
              DATE=$(date +%Y-%m-%d)
              # Create the history directory on the first run.
              mkdir -p history
              cp .youbencha-workspace/run-*/artifacts/results.json ./history/${DATE}.json

          - name: Commit History
            run: |
              git config user.name "github-actions"
              git config user.email "[email protected]"
              git add history/
              git commit -m "Add regression results for $(date +%Y-%m-%d)" || true
              git push

Matrix Evaluations
Run multiple suites in parallel:

    jobs:
      evaluate:
        runs-on: ubuntu-latest
        strategy:
          matrix:
            suite:
              - auth-feature
              - api-endpoints
              - database-migrations

        steps:
          - uses: actions/checkout@v4

          - name: Run Suite
            run: |
              npm install -g youbencha
              yb run -c suites/${{ matrix.suite }}.yaml

          - name: Upload Results
            uses: actions/upload-artifact@v4
            with:
              name: results-${{ matrix.suite }}
              path: .youbencha-workspace/run-*/artifacts/

Environment Secrets
Store sensitive values in GitHub Secrets:

    - name: Run Evaluation
      run: yb run -c suite.yaml
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        COPILOT_TOKEN: ${{ secrets.COPILOT_TOKEN }}
        SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}

Suite Configuration for CI
    name: ci-evaluation
    repo: https://github.com/${{ github.repository }}.git
    branch: ${{ github.head_ref }}

    agent:
      type: copilot-cli
      config:
        prompt_file: ./prompts/ci-task.md

    evaluators:
      - name: git-diff
        config:
          assertions:
            max_files_changed: 20

      - name: agentic-judge
        config:
          type: copilot-cli
          assertions:
            ci_ready: "Changes are CI-ready. Score 0-1."

    post_evaluation:
      - name: webhook
        config:
          url: ${SLACK_WEBHOOK_URL}

Best Practices
- Use secrets for tokens and webhooks
- Upload artifacts for debugging failures
- Set appropriate timeouts for agent execution
- Use matrix builds for multiple suites
- Store history for trend analysis
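The timeout recommendation above can be expressed with the `timeout-minutes` key in GitHub Actions, which applies at both the job and the step level. The fragment below is a minimal sketch. The values 30 and 20 are illustrative assumptions, not youBencha recommendations, so tune them to how long your agent usually runs.

```yaml
jobs:
  evaluate:
    runs-on: ubuntu-latest
    # Assumed cap for the whole job; without it, GitHub's default of 360 minutes applies.
    timeout-minutes: 30

    steps:
      - uses: actions/checkout@v4

      - name: Run Evaluation
        run: yb run -c suite.yaml
        # Assumed tighter cap for the agent run itself.
        timeout-minutes: 20
```

A step-level limit stops a stalled agent early, while the job-level limit also covers setup and artifact upload.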