Security
Security is a critical consideration when running AI agent evaluations. This guide covers youBencha’s security model and best practices.
Security Model
youBencha implements multiple security layers:
- Repository validation - Blocks dangerous URLs
- Workspace isolation - Isolated execution environments
- Controlled execution - Managed command execution
- Credentials protection - Never hardcode secrets
Repository Validation
Blocked URLs
youBencha blocks URLs that could enable SSRF attacks:
    # ❌ Blocked
    repo: http://localhost/repo.git
    repo: http://127.0.0.1/repo.git
    repo: http://192.168.1.100/repo.git
    repo: http://10.0.0.1/repo.git

    # ✅ Allowed
    repo: https://github.com/org/repo.git
    repo: https://gitlab.com/org/repo.git
URL Scheme Requirements
Repository URLs are restricted by scheme:
| Scheme | Allowed |
|---|---|
| https:// | ✅ Yes |
| http:// | ⚠️ Public only |
| file:// | ❌ No |
| ssh:// | ❌ No (for now) |
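
The rules above can be approximated in a few lines. The following is a minimal sketch, not youBencha's actual validator: the function name `is_repo_url_allowed` and the `ALLOWED_SCHEMES` set are hypothetical, and hostnames that are not IP literals are assumed to be public.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical policy mirroring the scheme table; not youBencha's real code.
ALLOWED_SCHEMES = {"https", "http"}


def is_repo_url_allowed(url: str) -> bool:
    """Return True if the URL uses an allowed scheme and a non-internal host."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES:
        return False  # rejects file://, ssh://, and anything else
    host = parsed.hostname
    if not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: assumed to be a public hostname in this sketch.
        # A stricter check would also resolve DNS and re-check the address.
        return True
    # Loopback, private, and link-local addresses are all non-global.
    return addr.is_global
```

A stricter validator would also need to handle redirects and DNS names that resolve to internal addresses, which this sketch deliberately leaves out.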
Workspace Isolation
Isolation Guarantees
- Fresh clone for each evaluation
- Separate directory per run
- No cross-contamination between runs
- Original repo unchanged
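
To make the "separate directory per run" guarantee concrete, here is a rough sketch of how a per-run directory could be created and how paths could be checked against it. The helper names `create_run_dir` and `is_inside_workspace` are hypothetical, and the `run-{timestamp}-{hash}` naming is only assumed to resemble what youBencha generates.

```python
import secrets
from datetime import datetime, timezone
from pathlib import Path


def create_run_dir(root: Path) -> Path:
    """Create a fresh, uniquely named directory for one evaluation run."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    run_dir = root / f"run-{stamp}-{secrets.token_hex(3)}"
    # exist_ok=False refuses to reuse a directory from an earlier run.
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir


def is_inside_workspace(run_dir: Path, relative: str) -> bool:
    """Return True only if the resolved path stays within the run directory."""
    base = run_dir.resolve()
    candidate = (base / relative).resolve()
    return candidate == base or base in candidate.parents
```

Resolving the path before comparing it is what catches `..` segments and absolute paths that would otherwise escape the run directory.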
Workspace Location
Default: .youbencha-workspace/run-{timestamp}-{hash}/
All operations are scoped to this directory:
    .youbencha-workspace/
    └── run-20241115-103000-abc123/
        ├── src-modified/    # Agent works here
        ├── src-expected/    # Reference only
        └── artifacts/       # Output only
Command Execution Security
Pre-Execution Hooks
Scripts run in a controlled environment:
    pre_execution:
      - name: script
        config:
          command: bash
          args: ["-c", "echo $WORKSPACE_DIR"]
          env:
            CUSTOM_VAR: "value"  # Explicit env vars only
Security measures:
- shell: true for flexibility (controlled)
- Only specified environment variables passed
- Never inherits full process.env
- Timeout enforcement
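
The environment and timeout measures can be illustrated with a short sketch. This is an assumption-laden example rather than youBencha's implementation: `run_hook` and `HookResult` are hypothetical names, and the command is passed as an argument list without a shell for simplicity.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class HookResult:
    exit_code: Optional[int]  # None when the hook timed out
    stdout: str
    timed_out: bool


def run_hook(command: list, env: dict, timeout_s: float) -> HookResult:
    """Run a hook with only the given environment and a hard timeout."""
    try:
        done = subprocess.run(
            command,
            env=dict(env),      # explicit variables only; parent env is not inherited
            timeout=timeout_s,  # the child is killed once this is exceeded
            capture_output=True,
            text=True,
            check=False,
        )
    except subprocess.TimeoutExpired:
        return HookResult(exit_code=None, stdout="", timed_out=True)
    return HookResult(exit_code=done.returncode, stdout=done.stdout, timed_out=False)
```

Passing `env` explicitly is what keeps secrets in the parent process from leaking into hook scripts. On some platforms a few system variables may still need to be forwarded for the child to start.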
Post-Evaluation Hooks
Post-evaluation hooks:
- Run in parallel
- Have read-only access to results
- Cannot modify workspace
- Never fail main evaluation
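
As an illustration of these properties, the sketch below runs hooks concurrently, hands them a read-only view of the results, and records failures instead of raising them. The `run_post_hooks` function and the shape of the results dictionary are hypothetical, and the read-only view is shallow, so nested objects would need their own protection.

```python
from concurrent.futures import ThreadPoolExecutor
from types import MappingProxyType
from typing import Callable, Dict, Mapping

Hook = Callable[[Mapping], None]


def run_post_hooks(results: dict, hooks: Dict[str, Hook]) -> Dict[str, str]:
    """Run hooks concurrently and report each hook's status without raising."""
    read_only = MappingProxyType(results)  # hooks cannot assign top-level keys
    statuses: Dict[str, str] = {}
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(hook, read_only) for name, hook in hooks.items()}
        for name, future in futures.items():
            try:
                future.result()
                statuses[name] = "ok"
            except Exception as exc:
                # A failing hook is recorded but never fails the evaluation.
                statuses[name] = f"failed: {exc}"
    return statuses
```

Collecting a status per hook keeps failures visible in the run output while the evaluation result itself stays untouched.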
Credentials Management
Best Practices
- Never hardcode secrets:
    # ❌ Bad
    post_evaluation:
      - name: webhook
        config:
          url: https://hooks.slack.com/services/T00/B00/XXXX

    # ✅ Good
    post_evaluation:
      - name: webhook
        config:
          url: ${SLACK_WEBHOOK_URL}
- Use environment variables:
    export SLACK_WEBHOOK_URL="https://hooks.slack.com/..."
    export API_TOKEN="secret-token"
    yb run -c suite.yaml
- CI/CD secrets:
    - name: Run Evaluation
      run: yb run -c suite.yaml
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
Variable Expansion
Use ${VAR} syntax throughout configuration:
    repo: ${REPO_URL}
    agent:
      config:
        prompt: "Token: ${API_TOKEN}"
    post_evaluation:
      - name: webhook
        config:
          url: ${WEBHOOK_URL}
          headers:
            Authorization: "Bearer ${AUTH_TOKEN}"
Token Scopes
GitHub Token
Minimum required scopes:
- repo - Access repositories
- read:user - Read user info (for Copilot)
For Copilot CLI:
- Active Copilot subscription required
- Token with copilot scope
Network Security
Outbound Connections
youBencha makes these external connections:
| Purpose | Destination |
|---|---|
| Clone repo | GitHub/GitLab |
| AI agent | Copilot API |
| Webhooks | Your endpoints |
Firewall Considerations
Ensure access to:
- github.com (repos, API)
- api.github.com
- copilot-proxy.githubusercontent.com
- Your webhook endpoints
Audit Trail
Logging
Every evaluation logs:
- Repository URL and branch
- Agent configuration (prompts sanitized)
- Evaluator results
- Execution timing
- Hook execution status
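
The guide does not say how prompts are sanitized. One plausible approach, shown here purely as an assumption, is to replace the values of known secret environment variables before a log entry is written. The names `sanitize`, `build_log_record`, and the record fields are hypothetical.

```python
import json
import os

REDACTED = "[REDACTED]"


def sanitize(text: str, secret_names: list, env=None) -> str:
    """Replace the values of the named environment variables with a placeholder."""
    env = os.environ if env is None else env
    for name in secret_names:
        value = env.get(name)
        if value:  # skip unset or empty variables
            text = text.replace(value, REDACTED)
    return text


def build_log_record(repo: str, branch: str, prompt: str, secret_names: list, env=None) -> str:
    """Serialize one log entry as a JSON line with a sanitized prompt."""
    record = {
        "repo": repo,
        "branch": branch,
        "prompt": sanitize(prompt, secret_names, env),
    }
    return json.dumps(record)
```

Value-based redaction only catches secrets that appear verbatim, so encoded or truncated tokens would need additional handling.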
Results Retention
Configure result export for audit:
    post_evaluation:
      - name: database
        config:
          type: json-file
          output_path: ./audit/evaluations.jsonl
          include_full_bundle: true
          append: true
Security Checklist
- Use HTTPS for all repositories
- Store secrets in environment variables
- Review pre-execution scripts for safety
- Limit webhook URLs to trusted endpoints
- Set appropriate timeouts
- Clean up workspaces regularly
- Audit evaluation logs periodically
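
Two of the checklist items lend themselves to an automated pre-flight check before running a suite. The sketch below is a hypothetical helper, not a youBencha feature: it scans the raw configuration text for `${VAR}` references that are unset and for `repo:` entries that do not use HTTPS. The `preflight` name and the line-based parsing are assumptions made to keep the example dependency-free.

```python
import os
import re

# Matches ${NAME}; GitHub Actions expressions such as ${{ secrets.X }} do not match.
VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def preflight(config_text: str, env=None) -> list:
    """Return a list of human-readable problems found in a suite config."""
    env = os.environ if env is None else env
    problems = []
    # Checklist item: secrets come from environment variables that are actually set.
    for name in sorted(set(VAR_PATTERN.findall(config_text))):
        if name not in env:
            problems.append(f"environment variable {name} is not set")
    # Checklist item: repositories use HTTPS (variable-based URLs are skipped).
    for line in config_text.splitlines():
        stripped = line.strip()
        if (
            stripped.startswith("repo:")
            and not stripped.startswith("repo: https://")
            and "${" not in stripped
        ):
            problems.append(f"repository is not HTTPS: {stripped}")
    return problems
```

An empty list means neither problem was detected. The remaining checklist items, such as reviewing scripts and auditing logs, still call for manual review.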
Reporting Security Issues
Found a security vulnerability?
Email: [email protected]
Please include:
- Description of the issue
- Steps to reproduce
- Potential impact