Reusable Evaluator Definitions

Define evaluators in separate files to reuse them across multiple suites and keep configurations DRY.

Instead of defining evaluators inline in each suite, you can:

  1. Create evaluator definition files
  2. Reference them in your suites
  3. Share across multiple test cases
For example, a test-coverage evaluator defined in its own file:

evaluators/test-coverage.yaml

```yaml
name: agentic-judge:test-coverage
description: "Ensures code changes include appropriate test coverage"
config:
  type: copilot-cli
  agent_name: agentic-judge
  assertions:
    unit_tests: "New code has unit tests. Score 1 if comprehensive, 0.5 if basic, 0 if none."
    edge_cases: "Edge cases are tested. Score 1 if covered, 0 if missing."
    test_quality: "Tests are well-structured and maintainable. Score 0-1."
```
Then reference it from a suite:

suite.yaml

```yaml
evaluators:
  # External evaluator definition
  - file: ./evaluators/test-coverage.yaml
  # Can mix with inline definitions
  - name: git-diff
    config:
      assertions:
        max_files_changed: 10
```
Keep shared evaluators in a dedicated directory:

```
evaluators/
├── security.yaml
├── testing.yaml
├── documentation.yaml
├── code-quality.yaml
└── performance.yaml
```
For larger projects, group evaluators by domain:

```
evaluators/
├── frontend/
│   ├── react-best-practices.yaml
│   └── accessibility.yaml
├── backend/
│   ├── api-security.yaml
│   └── database-safety.yaml
└── shared/
    ├── error-handling.yaml
    └── logging.yaml
```
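With a layout like this, a suite can pull in both domain-specific and shared definitions. The paths below are illustrative, following the directory tree above:

```yaml
evaluators:
  - file: ./evaluators/frontend/accessibility.yaml
  - file: ./evaluators/shared/error-handling.yaml
```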
Reusable evaluators can carry richer configuration, such as a specific model and prompt file:

evaluators/security.yaml

```yaml
name: agentic-judge:security
description: "Evaluates security best practices in code changes"
config:
  type: copilot-cli
  model: claude-sonnet-4.5
  prompt_file: ./prompts/security-expert.txt
  assertions:
    input_validation: |
      All user inputs are validated before use.
      Score 1 if properly validated with sanitization.
      Score 0.5 if validated but not sanitized.
      Score 0 if no validation.
    no_injection: |
      Code is protected against injection attacks (SQL, XSS, command).
      Score 1 if parameterized queries/escaped output used.
      Score 0 if raw input used in queries or output.
    auth_check: |
      Protected routes verify authentication.
      Score 1 if all routes check auth.
      Score 0 if any route is unprotected.
    secrets_safe: |
      No hardcoded secrets or credentials.
      Score 1 if secrets from env vars.
      Score 0 if hardcoded values found.
```
evaluators/documentation.yaml

```yaml
name: agentic-judge:documentation
description: "Ensures code is properly documented"
config:
  type: copilot-cli
  assertions:
    jsdoc_comments: |
      All public functions have JSDoc comments.
      Score 1 if all documented with @param and @returns.
      Score 0.5 if documented but incomplete.
      Score 0 if missing documentation.
    readme_updated: |
      README is updated for new features.
      Score 1 if README documents new functionality.
      Score 0.5 if README exists but not updated.
      Score 0 if no README or severely outdated.
    inline_comments: |
      Complex logic has explanatory comments.
      Score 1 if complex sections are explained.
      Score 0 if confusing code lacks comments.
```
A complete suite that combines reusable evaluators with a task-specific inline one:

suite.yaml

```yaml
repo: https://github.com/example/app.git
branch: main
agent:
  type: copilot-cli
  config:
    prompt: "Add user authentication feature"
evaluators:
  # Reusable evaluators
  - file: ./evaluators/security.yaml
  - file: ./evaluators/testing.yaml
  - file: ./evaluators/documentation.yaml
  # Task-specific inline evaluator
  - name: git-diff
    config:
      assertions:
        max_files_changed: 15
```

You can override settings from the file:

```yaml
evaluators:
  - file: ./evaluators/security.yaml
    config:
      model: gpt-5 # Override model from file
```
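Assuming the inline `config` is shallow-merged over the file's `config` (inline values winning), the override above would behave like this inline definition of `evaluators/security.yaml` — a sketch of the merge, not tool output:

```yaml
evaluators:
  - name: agentic-judge:security
    config:
      type: copilot-cli
      model: gpt-5 # overridden inline; all other settings come from the file
      prompt_file: ./prompts/security-expert.txt
```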
To share evaluators across repositories, add them as a git submodule:

```sh
git submodule add https://github.com/org/shared-evaluators.git evaluators/shared
```

Then reference files from the submodule:

```yaml
evaluators:
  - file: ./evaluators/shared/security.yaml
```

Alternatively, create an npm package with evaluator definitions:

package.json

```json
{
  "name": "@org/evaluators",
  "version": "1.0.0",
  "files": ["evaluators/"]
}
```

Then reference the installed files from node_modules:

```yaml
evaluators:
  - file: ./node_modules/@org/evaluators/security.yaml
```
  1. **Descriptive names** - Use `agentic-judge:focus-area` naming
  2. **Clear descriptions** - Document what the evaluator checks
  3. **Multi-line assertions** - Use YAML multi-line blocks for readable assertions
  4. **Version control** - Track evaluator changes in git
  5. **Centralize shared evaluators** - Keep team standards in one place
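Since shared evaluator files live in version control, a small pre-commit check can catch malformed definitions before a run. A minimal sketch in Python, assuming the field shape shown in the examples above (`name`, `config.type`, `config.assertions`) — these names are inferred from this page, not from an official schema:

```python
# Sanity-check parsed evaluator definitions (e.g. loaded with yaml.safe_load).
# Required fields are inferred from the examples on this page, not a formal schema.

def lint_evaluator(doc: dict) -> list[str]:
    """Return a list of problems found in one parsed evaluator definition."""
    problems = []
    # Top-level fields every evaluator file on this page includes.
    for key in ("name", "config"):
        if key not in doc:
            problems.append(f"missing top-level field: {key}")
    config = doc.get("config") or {}
    # Nested config fields: evaluator type and at least one assertion.
    for key in ("type", "assertions"):
        if not config.get(key):
            problems.append(f"missing or empty config field: {key}")
    return problems

if __name__ == "__main__":
    ok = {
        "name": "agentic-judge:security",
        "config": {"type": "copilot-cli", "assertions": {"secrets_safe": "Score 0-1."}},
    }
    bad = {"name": "agentic-judge:broken", "config": {"type": "copilot-cli"}}
    print(lint_evaluator(ok))   # []
    print(lint_evaluator(bad))  # ['missing or empty config field: assertions']
```

Running this over every file under `evaluators/` keeps a typo in a shared definition from silently skipping checks across all suites that reference it.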