// guide

Automated Accessibility Testing: Tools, CI/CD, and What to Test

Automated accessibility testing catches common WCAG violations early, before they reach production. This guide covers the major tools, how to integrate them into your development workflow and CI/CD pipeline, and what you still need to test manually.

beginner reference

// 01 · what automated testing can (and can't) catch

What Automated Testing Can (and Can't) Catch

Automated accessibility tools are commonly estimated to detect roughly 30-40% of WCAG issues. That sounds low, but it includes many of the most common and most damaging violations. The key is understanding what falls inside and outside the detection boundary so you know where to invest manual effort.

What automated tools reliably catch

  • Missing alt text on images. Tools detect when <img> elements lack an alt attribute entirely.
  • Color contrast failures. Tools compute the contrast ratio between text and background colors and flag violations against the WCAG AA thresholds (4.5:1 for normal text, 3:1 for large text) or the AAA thresholds (7:1 and 4.5:1).
  • Missing form labels. Tools detect <input>, <select>, and <textarea> elements that lack an associated <label> or accessible name.
  • Duplicate id attributes. Tools flag multiple elements sharing the same id, which breaks label associations, ARIA references, and fragment navigation.
  • Missing lang attribute. Tools detect when the <html> element lacks a lang attribute, which screen readers need to select the correct speech synthesis voice.
  • ARIA attribute validity. Tools check that ARIA roles, states, and properties are valid, that required attributes are present, and that values are within the allowed set.
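The contrast check in the list above is plain arithmetic. Below is a minimal sketch of the WCAG 2.x relative-luminance and contrast-ratio formulas. It assumes 6-digit hex colors and ignores transparency; real tools also resolve opacity, overlapping elements, and background images. The function names are illustrative, not any tool's API:

```javascript
// Convert one 8-bit sRGB channel to linear light (WCAG 2.x relative-luminance formula)
function linearize(channel8bit) {
  const c = channel8bit / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance: weighted sum of the linearized R, G, B channels
function relativeLuminance(hex) {
  const value = hex.replace('#', '');
  const r = parseInt(value.slice(0, 2), 16);
  const g = parseInt(value.slice(2, 4), 16);
  const b = parseInt(value.slice(4, 6), 16);
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// Contrast ratio (L1 + 0.05) / (L2 + 0.05), where L1 is the lighter color
function contrastRatio(foreground, background) {
  const l1 = relativeLuminance(foreground);
  const l2 = relativeLuminance(background);
  const [lighter, darker] = l1 >= l2 ? [l1, l2] : [l2, l1];
  return (lighter + 0.05) / (darker + 0.05);
}

// WCAG AA: 4.5:1 for normal text, 3:1 for large text (18pt, or 14pt bold)
function passesAA(foreground, background, isLargeText = false) {
  return contrastRatio(foreground, background) >= (isLargeText ? 3 : 4.5);
}

console.log(contrastRatio('#767676', '#ffffff').toFixed(2)); // "4.54"
console.log(passesAA('#777777', '#ffffff'));                 // false (about 4.48:1)
```

One step of gray is the difference between passing and failing here, which is why contrast is a check best left to tools rather than eyeballing.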

What automated tools cannot catch

  • Meaningful alt text quality. A tool can tell you an image has alt="image", but it cannot judge whether the alt text is accurate, useful, or equivalent to the visual content.
  • Keyboard operability. Tools cannot tab through your interface, operate controls, and verify that every interaction works without a mouse.
  • Logical focus order. Whether focus moves through the page in a sequence that makes sense requires human judgment about the visual layout and reading flow.
  • Screen reader experience. How content is announced, whether the reading order is logical, and whether dynamic updates are communicated cannot be evaluated by static analysis.
  • Cognitive load. Whether instructions are clear, whether error messages are helpful, and whether the interface is understandable to users with cognitive disabilities requires human evaluation.
Do not over-rely on automated testing
A clean automated scan does not mean your site is accessible. It means only that the tool found none of the issues it checks for, which by common estimates make up roughly 30-40% of WCAG issues. The remaining 60-70% (keyboard operability, screen reader experience, meaningful content, and cognitive usability) require manual testing. Treat automated testing as a safety net, not a certification.

// 02 · the testing pyramid

The Testing Pyramid

Accessibility testing works best as a layered strategy. Each layer catches different types of issues, and skipping a layer leaves specific categories of bugs undetected.

  1. Automated scanning (base layer) — Run on every commit or pull request. Catches syntax-level violations: missing attributes, invalid ARIA, contrast failures, structural issues. This is the widest layer because it runs constantly and catches the most frequent mistakes.
  2. Semi-automated / component testing — Unit and integration tests that render components and run axe-core against the output. Catches issues in component markup that may not appear in a full-page scan, such as a dynamically rendered form field missing its label.
  3. Manual testing (keyboard + screen reader) — A human tester tabs through the interface, operates every control with keyboard alone, and verifies the experience in a screen reader. Catches interaction failures, focus management bugs, announcement issues, and reading order problems.
  4. User testing (top layer) — Real users with disabilities use your product and provide feedback. Catches usability issues that no amount of standards-based testing reveals: confusing workflows, unclear error recovery, unhelpful labeling, and unexpected behavior.
Each layer catches what the others miss
Automated scanning catches the most issues by volume. Manual testing catches the most impactful issues by severity. User testing catches the issues that matter most to real people. A complete strategy includes all four layers.

// 03 · tool comparison

Tool Comparison

The accessibility testing ecosystem includes engines, CLI runners, browser extensions, and test framework integrations. The table below compares the most widely used tools.

Tool | Type | Rules | Best for | Integration
axe-core | Engine / library | 90+ rules covering WCAG 2.0/2.1 A and AA, plus best practices | Programmatic testing, CI/CD, embedding in other tools | npm, browser extensions, test frameworks
Pa11y | CLI runner | HTML_CodeSniffer or axe-core | Command-line scanning, CI pipelines, batch URL testing | Node.js CLI, CI/CD
Lighthouse | Audit tool (built into Chrome) | Subset of axe-core rules plus custom checks | Quick audits, combined performance and accessibility scoring | Chrome DevTools, CLI, CI/CD
WAVE | Browser extension | Custom rule set | Visual in-page reporting, quick manual reviews | Browser extension, API
jest-axe / cypress-axe | Test framework integration | axe-core rules | Unit and E2E test suites, developer workflow | Jest, Cypress, CI/CD

// 04 · axe-core deep dive

axe-core Deep Dive

axe-core is the most widely adopted accessibility testing engine. It powers the axe DevTools browser extension, Lighthouse accessibility audits, jest-axe, cypress-axe, and dozens of other tools. Understanding axe-core means understanding the foundation of most automated accessibility testing.

axe DevTools Browser Extension

The fastest way to start is the axe DevTools browser extension for Chrome or Firefox. Install it, open DevTools, navigate to the "axe DevTools" tab, and click "Scan ALL of my page." Results are grouped by severity (critical, serious, moderate, minor) with direct links to the offending elements.

@axe-core/cli

Run axe-core from the command line against any URL:

Terminal
# Install globally
npm install -g @axe-core/cli

# Scan a URL
axe https://example.com

# Scan with specific rules
axe https://example.com --rules color-contrast,image-alt

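# Output as JSON for CI parsing
axe https://example.com --save results.json

# Exit with a non-zero code when violations are found (needed to fail a CI step)
axe https://example.com --exit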
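Once results are saved as JSON, a short script can summarize them. This sketch assumes each results object has the shape axe-core returns from axe.run() (a violations array whose entries carry id, impact, and nodes); whether the saved file holds one object or an array of them is treated as an assumption, so the helper accepts both. The function name is hypothetical:

```javascript
// Count affected elements per impact level in axe-core results.
// Assumed shape per results object: { url, violations: [{ id, impact, nodes: [...] }] }
// Accepts a single results object or an array of them (one per scanned URL).
function summarizeByImpact(resultsOrArray) {
  const runs = Array.isArray(resultsOrArray) ? resultsOrArray : [resultsOrArray];
  const summary = { critical: 0, serious: 0, moderate: 0, minor: 0 };

  for (const run of runs) {
    for (const violation of run.violations || []) {
      // One rule can fail on many elements, so count nodes rather than rules
      const level = violation.impact || 'unknown';
      summary[level] = (summary[level] || 0) + violation.nodes.length;
    }
  }
  return summary;
}

// Hypothetical sample data in that shape (not real scan output)
const sampleResults = [
  {
    url: 'https://example.com',
    violations: [
      { id: 'image-alt', impact: 'critical', nodes: [{}, {}] },
      { id: 'color-contrast', impact: 'serious', nodes: [{}] }
    ]
  }
];

console.log(summarizeByImpact(sampleResults));
// { critical: 2, serious: 1, moderate: 0, minor: 0 }
```

In CI you might read results.json with Node's fs module, pass the parsed data to this function, and exit non-zero when the critical or serious counts are above zero.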

jest-axe for Unit Tests

jest-axe integrates axe-core into Jest, letting you test rendered component markup for accessibility violations as part of your unit test suite:

JavaScript (Jest)
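const { render } = require('@testing-library/react');
const { axe, toHaveNoViolations } = require('jest-axe');
const { Navigation } = require('./Navigation'); // hypothetical path to your component

expect.extend(toHaveNoViolations);

// Runs in a jsdom test environment (testEnvironment: 'jsdom' in Jest 27+).
// jest-axe turns off the color-contrast rule there because jsdom does not compute rendered styles.
test('navigation has no accessibility violations', async () => {
  const { container } = render(<Navigation />);
  const results = await axe(container);
  expect(results).toHaveNoViolations();
});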

cypress-axe for E2E Tests

cypress-axe runs axe-core inside Cypress end-to-end tests, scanning real pages in a real browser:

JavaScript (Cypress)
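// cypress/e2e/accessibility.cy.js
// Assumes cypress-axe and axe-core are installed and `import 'cypress-axe'`
// is in cypress/support/e2e.js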
describe('Accessibility', () => {
  it('home page has no a11y violations', () => {
    cy.visit('/');
    cy.injectAxe();
    cy.checkA11y();
  });

  it('form page has no a11y violations after interaction', () => {
    cy.visit('/contact');
    cy.injectAxe();
    cy.get('#name').type('Test User');
    cy.get('#email').type('test@example.com');
    cy.checkA11y();
  });
});
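By default a failing cy.checkA11y() reports only a count. cypress-axe's checkA11y accepts (context, options, violationCallback, skipFailures), and the callback receives the array of axe-core violations. The sketch below is a hypothetical helper that condenses them into readable rows; the 'table' task in the comment is an assumption you would register yourself in cypress.config.js:

```javascript
// Condense axe-core violations into compact rows for logging
function toViolationRows(violations) {
  return violations.map(({ id, impact, description, nodes }) => ({
    id,
    impact,
    description,
    nodes: nodes.length // number of affected elements
  }));
}

// Wiring inside a spec (assumes a 'table' task in cypress.config.js
// that calls console.table(rows) and returns null):
//   cy.checkA11y(null, null, (violations) => {
//     cy.task('table', toViolationRows(violations));
//   });

// Hypothetical sample violation
const rows = toViolationRows([
  { id: 'label', impact: 'critical', description: 'Form elements must have labels', nodes: [{}, {}] }
]);
console.log(rows);
```

Keeping the formatting in a plain function means it can be unit-tested without launching a browser.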

@axe-core/playwright

For Playwright-based test suites, the @axe-core/playwright package provides direct integration:

JavaScript (Playwright)
const { test, expect } = require('@playwright/test');
const AxeBuilder = require('@axe-core/playwright').default;

test('page has no accessibility violations', async ({ page }) => {
  await page.goto('https://example.com');

  const results = await new AxeBuilder({ page }).analyze();
  expect(results.violations).toEqual([]);
});

test('page has no WCAG 2.0 A/AA violations', async ({ page }) => {
  await page.goto('https://example.com');

  // withTags limits the scan to rules tagged for WCAG 2.0 Level A and AA
  const results = await new AxeBuilder({ page })
    .withTags(['wcag2a', 'wcag2aa'])
    .analyze();
  expect(results.violations).toEqual([]);
});
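withTags filters rules by WCAG level, not by severity. To gate a test on severity, you can filter results.violations by each violation's impact field. A minimal sketch, assuming axe-core's four impact levels; the helper name is hypothetical:

```javascript
// axe-core impact levels, from least to most severe
const IMPACT_ORDER = ['minor', 'moderate', 'serious', 'critical'];

// Keep only violations at or above `minImpact`. Violations with a missing or
// unrecognized impact are dropped here, so review them separately if needed.
function violationsAtOrAbove(violations, minImpact = 'serious') {
  const floor = IMPACT_ORDER.indexOf(minImpact);
  if (floor === -1) {
    throw new Error(`Unknown impact level: ${minImpact}`);
  }
  return violations.filter((v) => IMPACT_ORDER.indexOf(v.impact) >= floor);
}

// Inside a Playwright test (sketch):
//   const results = await new AxeBuilder({ page }).analyze();
//   expect(violationsAtOrAbove(results.violations, 'critical')).toEqual([]);
```

This lets a suite block merges on critical issues while lower-impact ones are tracked as backlog.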

// 05 · ci/cd integration

CI/CD Integration

The highest-value placement for automated accessibility testing is in your CI/CD pipeline. When accessibility checks run on every pull request, violations are caught before they merge into your main branch.

GitHub Actions Workflow

The following workflow builds and serves your site locally, then runs Pa11y and a Lighthouse accessibility audit against it on every push to main and every pull request targeting main:

.github/workflows/accessibility.yml
name: Accessibility Tests

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  a11y:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: 20

      - name: Install dependencies
        run: npm ci

      - name: Build site
        run: npm run build

      - name: Start server
        run: npm run serve &
        env:
          PORT: 3000

      - name: Wait for server
        run: npx wait-on http://localhost:3000

      - name: Run Pa11y
        run: |
          npx pa11y-ci --config .pa11yci.json

      - name: Run Lighthouse accessibility audit
        run: |
          npx lighthouse http://localhost:3000 \
            --only-categories=accessibility \
            --output=json \
            --output-path=./lighthouse-results.json \
            --chrome-flags="--headless --no-sandbox"

      - name: Check Lighthouse score
        run: |
          SCORE=$(node -e "const r=require('./lighthouse-results.json');console.log(Math.round(r.categories.accessibility.score * 100))")
          echo "Accessibility score: $SCORE"
          if [ "$SCORE" -lt 90 ]; then
            echo "Accessibility score $SCORE is below threshold of 90"
            exit 1
          fi
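The score gate can also live in a small Node script instead of an inline one-liner. This sketch assumes the Lighthouse JSON layout used in the workflow (categories.accessibility.score, a number from 0 to 1, or null when Lighthouse could not compute it). It rounds after scaling because the shell's -lt comparison only accepts integers. The function name is hypothetical:

```javascript
// Extract the accessibility score from a parsed Lighthouse report and compare it to a threshold
function checkAccessibilityScore(report, threshold = 90) {
  const raw = report?.categories?.accessibility?.score;
  if (typeof raw !== 'number') {
    // Missing or null score: treat as a failure rather than silently passing
    return { score: null, pass: false };
  }
  // Round after scaling: 0.57 * 100 is 56.99999999999999 in floating point
  const score = Math.round(raw * 100);
  return { score, pass: score >= threshold };
}

console.log(checkAccessibilityScore({ categories: { accessibility: { score: 0.57 } } }));
// { score: 57, pass: false }
```

In CI, a script could read lighthouse-results.json, call this function, and call process.exit(1) when pass is false.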

Failing Builds on Violations

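Pa11y exits with a non-zero code when it finds issues, and pa11y-ci does the same when any URL reports more issues than its threshold allows, so the CI step fails automatically. @axe-core/cli exits with a non-zero code only when you pass the --exit flag. In pa11y-ci, the threshold option sets how many issues each URL may report before it fails: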

.pa11yci.json
{
  "defaults": {
    "standard": "WCAG2AA",
    "runners": ["axe"],
    "threshold": 0
  },
  "urls": [
    "http://localhost:3000/",
    "http://localhost:3000/contact",
    "http://localhost:3000/about"
  ]
}
Start with warnings, graduate to errors
If you are adding automated accessibility testing to an existing project, start with a high threshold or treat violations as warnings rather than build failures. Fix existing issues incrementally, then lower the threshold to zero. This prevents the CI gate from blocking all development while you work through the backlog.
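Another way to adopt a gate on an existing project is to record today's violations as a baseline and fail only on new ones. A minimal sketch, assuming axe-core's result shape (each violation has an id and a nodes array whose entries carry a target selector array); the fingerprinting scheme and the baseline file name are hypothetical:

```javascript
// Fingerprint = axe rule ID + CSS selector of the affected element (node.target).
// Selectors can shift when markup changes, so treat fingerprints as approximate.
function fingerprints(violations) {
  const keys = [];
  for (const violation of violations) {
    for (const node of violation.nodes) {
      keys.push(`${violation.id}::${node.target.join(' ')}`);
    }
  }
  return keys;
}

// Return fingerprints present in `current` but not in `baseline`
function newViolations(current, baseline) {
  const known = new Set(fingerprints(baseline));
  return fingerprints(current).filter((key) => !known.has(key));
}

// Hypothetical data: the baseline would be saved from an earlier scan,
// e.g. in a file such as a11y-baseline.json
const baseline = [{ id: 'image-alt', nodes: [{ target: ['#logo'] }] }];
const current = [
  { id: 'image-alt', nodes: [{ target: ['#logo'] }] },
  { id: 'color-contrast', nodes: [{ target: ['.footer a'] }] }
];
console.log(newViolations(current, baseline)); // [ 'color-contrast::.footer a' ]
```

As baseline issues are fixed, regenerate the baseline so it only shrinks.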

// 06 · lighthouse accessibility audits

Lighthouse Accessibility Audits

Lighthouse is built into Chrome DevTools and provides a combined audit covering performance, accessibility, best practices, and SEO. The accessibility category uses a subset of axe-core rules plus additional custom checks.

Running from Chrome DevTools

Open Chrome DevTools, navigate to the "Lighthouse" tab, check only "Accessibility" under categories, and click "Analyze page load." The report shows a score from 0-100, with each failing audit linked to a detailed explanation and the affected elements.

Running from the CLI

Terminal
# Install Lighthouse globally
npm install -g lighthouse

# Run an accessibility-only audit
lighthouse https://example.com \
  --only-categories=accessibility \
  --output=html \
  --output-path=./a11y-report.html \
  --chrome-flags="--headless"

# Output as JSON for programmatic processing
lighthouse https://example.com \
  --only-categories=accessibility \
  --output=json \
  --output-path=./a11y-report.json \
  --chrome-flags="--headless"

Understanding the Score

The Lighthouse accessibility score is a weighted average of its individual audits. Each audit either passes or fails, with no partial credit, and audits that affect more users or cause more severe barriers carry more weight. A score of 100 means every audit passed, but it does not mean your site is fully accessible: Lighthouse only tests what automated tools can detect.

  • 90-100: Good. Few or no automated violations detected; fix any that remain. Manual testing still required.
  • 50-89: Needs work. Automated tools found violations that should be fixed.
  • 0-49: Poor. Significant automated violations present, which likely indicates systemic accessibility issues.
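To see why a single failure can move the score a lot, here is an illustrative model of a weighted pass/fail average. The audit IDs are real axe rule names, but the weights are made up for the example; Lighthouse assigns its own weight to each audit:

```javascript
// Weighted pass/fail score: sum of weights of passing audits / sum of all weights
function weightedScore(audits) {
  let earned = 0;
  let total = 0;
  for (const { weight, passed } of audits) {
    total += weight;
    if (passed) earned += weight; // no partial credit: an audit passes or fails
  }
  return total === 0 ? null : Math.round((earned / total) * 100);
}

// Map a score to the bands listed above
function scoreBand(score) {
  if (score >= 90) return 'Good';
  if (score >= 50) return 'Needs work';
  return 'Poor';
}

// Illustrative weights only
const audits = [
  { id: 'image-alt', weight: 10, passed: true },
  { id: 'color-contrast', weight: 7, passed: false },
  { id: 'html-has-lang', weight: 7, passed: true },
  { id: 'heading-order', weight: 3, passed: true }
];
const score = weightedScore(audits);
console.log(score, scoreBand(score)); // 74 Needs work
```

With these weights, failing only the low-weight heading-order audit instead would give 24/27, about 89: one failure in each case, but very different scores.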
A score of 100 does not mean accessible
Lighthouse tests a subset of WCAG criteria using automated checks. It cannot evaluate keyboard operability, screen reader experience, or content quality. A perfect Lighthouse score is a starting point, not a finish line.

// 07 · building a testing strategy

Building a Testing Strategy

A complete automated accessibility testing strategy covers every stage of development. The following workflow integrates accessibility checks from the moment you write code to the moment it reaches production.

1. Linter: eslint-plugin-jsx-a11y

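Catch accessibility issues as you write code. With ESLint integrated into your editor, problems in JSX are flagged as you type, which is the earliest possible feedback loop. Because the plugin analyzes JSX source statically, it cannot check rendered output such as color contrast. The example below uses the legacy .eslintrc format; ESLint 9 uses flat config (eslint.config.js) by default.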

.eslintrc.json
{
  "plugins": ["jsx-a11y"],
  "extends": ["plugin:jsx-a11y/recommended"],
  "rules": {
    "jsx-a11y/alt-text": "error",
    "jsx-a11y/anchor-has-content": "error",
    "jsx-a11y/click-events-have-key-events": "error",
    "jsx-a11y/no-static-element-interactions": "error",
    "jsx-a11y/label-has-associated-control": "error"
  }
}

2. Unit Tests: jest-axe

Test individual components for accessibility violations every time your test suite runs:

JavaScript (Jest)
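const { render } = require('@testing-library/react');
const { axe, toHaveNoViolations } = require('jest-axe');
const { Button } = require('./Button'); // hypothetical path to your component

expect.extend(toHaveNoViolations);

const handleClick = jest.fn(); // stub click handler for rendering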

describe('Button component', () => {
  test('renders without accessibility violations', async () => {
    const { container } = render(
      <Button onClick={handleClick}>Submit</Button>
    );
    expect(await axe(container)).toHaveNoViolations();
  });

  test('disabled state is accessible', async () => {
    const { container } = render(
      <Button disabled>Submit</Button>
    );
    expect(await axe(container)).toHaveNoViolations();
  });
});

3. E2E Tests: cypress-axe

Test full pages and user flows in a real browser, including dynamic content and state changes:

JavaScript (Cypress)
describe('Checkout flow accessibility', () => {
  beforeEach(() => {
    cy.visit('/checkout');
    cy.injectAxe();
  });

  it('cart page is accessible', () => {
    cy.checkA11y();
  });

  it('form validation errors are accessible', () => {
    cy.get('[data-testid="submit"]').click();
    cy.checkA11y();
  });

  it('success page is accessible', () => {
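    cy.fillCheckoutForm(); // custom command you define with Cypress.Commands.add
    cy.get('[data-testid="submit"]').click();
    cy.url().should('include', '/success');
    cy.injectAxe(); // re-inject: a full page load discards the injected axe-core script
    cy.checkA11y();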
  });
});

4. CI Gate: Pa11y / Lighthouse

Run full-page scans against your deployed preview or build output in CI. Block merges when violations exceed your threshold. See the CI/CD Integration section for complete workflow examples.

5. Manual Review: Keyboard + Screen Reader

No automated tool replaces manual testing. For every feature or significant UI change, perform a keyboard walkthrough and screen reader check before release. See the Keyboard Testing Guide and Screen Reader Testing Guide for step-by-step procedures.

Automate what you can, then test what you must
The first four layers run automatically and catch the low-hanging fruit. The fifth layer, manual testing, catches the issues that matter most. Invest time in both. Automated testing without manual review gives you false confidence; manual review without automated testing lets trivial bugs slip through repeatedly.

// 08 · configuration and custom rules

Configuration and Custom Rules

As your project matures, you will need to configure your tools to reduce false positives, add project-specific rules, and set appropriate thresholds. axe-core takes two inputs that most axe-based tools pass through: a context (which part of the page to scan) and an options object (which rules to run).

axe-core Configuration Object

The options object lets you enable or disable specific rules and choose which WCAG tags to test against. Excluding elements from the scan belongs in the context, not in the options:

JavaScript
// Options object: controls which rules run
const axeOptions = {
  // WCAG 2.1 Level AA conformance needs the A and AA rules from both 2.0 and 2.1
  runOnly: {
    type: 'tag',
    values: ['wcag2a', 'wcag2aa', 'wcag21a', 'wcag21aa']
  },

  rules: {
    // Disable a specific rule (use sparingly)
    'color-contrast': { enabled: false },

    // Enable a rule that is off by default (experimental in current axe-core releases)
    'label-content-name-mismatch': { enabled: true }
  }
};

// Context object: controls which parts of the page are scanned
const axeContext = {
  exclude: [
    // Third-party widgets you cannot control
    ['#third-party-chat-widget'],
    // Known issues tracked in backlog
    ['.legacy-component']
  ]
};

// Usage with axe-core directly (context first, then options)
const results = await axe.run(axeContext, axeOptions);

// Usage with jest-axe (the container is the context; options are passed through)
expect(await axe(container, axeOptions)).toHaveNoViolations();

// Usage with cypress-axe: cy.checkA11y(context, options)
cy.checkA11y(axeContext, axeOptions);

Disabling False Positives

Sometimes a rule flags an element that is accessible but uses a technique the tool does not recognize. When you disable a rule, always document why and create a tracking issue to revisit it. Disable at the narrowest scope possible — exclude the specific element rather than turning off the rule globally.

Setting Thresholds

In pa11y-ci, the threshold option sets how many issues a URL may report before that URL fails, and a threshold set on an individual URL entry overrides the default. A threshold of 0 means any violation fails the build. For a legacy project, you might start with a higher threshold and reduce it over time:

.pa11yci.json
{
  "defaults": {
    "standard": "WCAG2AA",
    "runners": ["axe"],
    "threshold": 5
  },
  "urls": [
    {
      "url": "http://localhost:3000/",
      "threshold": 0
    },
    {
      "url": "http://localhost:3000/legacy-page",
      "threshold": 10
    }
  ]
}
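The per-URL overrides above resolve in a simple way. This sketch models that resolution under stated assumptions: a URL entry's own threshold overrides the defaults, a URL fails when it reports more issues than its threshold, and the default is taken as 0 when none is set. The plain-string URL entry and the issue counts are hypothetical:

```javascript
// The config above, as a JS object, plus a hypothetical plain-string entry
const config = {
  defaults: { standard: 'WCAG2AA', runners: ['axe'], threshold: 5 },
  urls: [
    { url: 'http://localhost:3000/', threshold: 0 },
    { url: 'http://localhost:3000/legacy-page', threshold: 10 },
    'http://localhost:3000/about' // plain string: falls back to the default threshold
  ]
};

// Assumed rule: URL-level threshold wins, then defaults.threshold, then 0
function resolveThreshold(cfg, entry) {
  if (entry && typeof entry === 'object' && typeof entry.threshold === 'number') {
    return entry.threshold;
  }
  return cfg.defaults?.threshold ?? 0;
}

// A URL fails when it reports more issues than its threshold allows
function urlFails(issueCount, threshold) {
  return issueCount > threshold;
}

for (const entry of config.urls) {
  const url = typeof entry === 'string' ? entry : entry.url;
  const threshold = resolveThreshold(config, entry);
  // Hypothetical: 3 issues reported on every page
  console.log(url, threshold, urlFails(3, threshold) ? 'fail' : 'pass');
}
```

With three issues per page, only the home page (threshold 0) would fail; the same count passes on the legacy page and on any URL using the default.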
Track your threshold over time
Record your violation count at regular intervals. A decreasing trend means your accessibility debt is shrinking. If the number creeps up, your CI threshold is too permissive and new violations are being introduced faster than old ones are being fixed.

Frequently asked questions

What is Pa11y and how does it work?

Pa11y is an open-source accessibility testing tool that runs from the command line. Under the hood it drives headless Chrome through Puppeteer to load a page, then evaluates it against either the HTML_CodeSniffer ruleset or axe-core's rules, your choice. Results can be printed to the terminal or written in machine-readable formats such as JSON or CSV, listing each issue with its selector and an explanation. Pa11y's value over a browser extension is automation: you can run it across hundreds of URLs (via the pa11y-ci wrapper and a sitemap), gate pull requests in CI, and produce reports for compliance audits. It's free and well-maintained, and when configured to use the axe runner it applies axe-core's own rules.

How much of WCAG do automated accessibility tests actually catch?

The most commonly cited estimates put it at roughly 30-40% of WCAG issues; the exact figure depends on how issues are counted (by success criterion or by volume of real-world defects). Automated tools are reliable at catching things with deterministic signals: missing alt attributes, low contrast ratios, broken ARIA references, duplicate IDs, missing labels, and certain markup errors. They cannot reliably catch whether alt text is accurate, whether focus order makes sense, whether ARIA labels match the visual UI, whether a keyboard trap exists in dynamic content, or whether a screen reader announces the right thing at the right time. Manual testing covers the rest.

What's the difference between Pa11y, axe, and Lighthouse?

axe-core is the rule engine — a JavaScript library that evaluates a DOM against WCAG. It's used inside the axe DevTools browser extension, the jest-axe test helper, Playwright/Cypress integrations, and Pa11y (optionally). Pa11y is a CLI wrapper that runs axe-core (or HTML_CodeSniffer) against URLs — its strength is batch testing and CI integration. Lighthouse is Chrome's built-in audit tool, which includes a subset of axe-core's rules alongside performance, SEO, and best-practices audits — useful for quick single-page health checks. Most teams use axe DevTools for in-browser debugging, Pa11y or axe-core in CI, and Lighthouse for performance + a11y overview reports.

Can automated accessibility testing replace manual testing?

No. Automated tools catch roughly 30-40% of WCAG issues. The remaining 60-70% (focus order, meaningful screen reader announcements, keyboard logic in dynamic widgets, time limits, drag-and-drop alternatives, plain language, cognitive load) require a human with a keyboard, a screen reader, and a stopwatch. The right mental model: automated tests are like spell-check (catches the obvious), manual tests are like copy-editing (catches the meaningful).

How do I add accessibility testing to GitHub Actions or CI?

Three common approaches: (1) Pa11y CI: install pa11y-ci as a dev dependency, point it at your sitemap or list of URLs, and run it as a step in your GitHub Action. (2) axe-core via Playwright or Cypress: use @axe-core/playwright or cypress-axe inside your existing end-to-end test suite. (3) jest-axe for component tests: run axe against rendered components in unit tests, catching regressions at the component level. Most teams run (2) and/or (3) on every pull request and add (1) as a periodic full-site sweep, for example before each release. Whichever you pick, the rule is: fail the build on new violations, but treat existing baseline violations as a separate backlog so you can adopt the gate without a flood of failures on day one.

Which automated accessibility tool should I start with?

For most teams: axe DevTools browser extension for day-to-day debugging while building, plus axe-core or Pa11y in CI for regression prevention. If you're solo or just learning, install the axe DevTools extension and run it on every page you touch — that alone catches the majority of automatable issues. Lighthouse is also fine as a zero-install option (it's already in Chrome DevTools). Only branch out to specialty tools (ARC Toolkit for visual ARIA inspection, WAVE for landmark overlays) once you have the basics running consistently.