Proposal: E2E Testing Workflow Changes

Goal

Catch e2e test failures before the code that causes them reaches develop.


Context

Current Challenges

Our current e2e testing workflow runs tests nightly against the develop branch. This means failures are discovered after code has already been merged, creating a reactive rather than proactive approach to quality assurance.

Why Change Now?

Our team structure and release cadence are evolving:

  • Multiple teams - more than one team now contributes to the codebase
  • Expanded QA involvement - additional QA engineers are working directly with the codebase
  • Strict release schedule - we are moving to a fixed two-week release cycle

The Problem Scenario

With nightly e2e tests running only against develop:

  1. A developer merges code late in the day (or evening)
  2. The nightly e2e run executes overnight
  3. Tests fail due to the merged changes
  4. The failure is discovered the next morning
  5. If this happens close to release day, the team faces:
    • Rushed debugging under pressure
    • Potential release delays
    • Difficult decisions about reverting vs. fixing forward
    • Chaos and stress across teams

Worst case: Release day arrives, last night's e2e tests failed due to something merged the evening before, and the team scrambles to diagnose and fix issues while the release deadline looms.


Proposed Solutions

Option 1: Quarantine Branch

A staging branch sits between feature branches and develop. PRs merge into quarantine first, e2e tests run, then quarantine merges into develop.

Pros:

  • Develop stays protected from breaking changes
  • Clear separation between "pending validation" and "validated" code

Cons:

  • Moving target - quarantine constantly changes as PRs merge in
  • Overhead of managing merges from quarantine → develop
  • Merge conflicts multiply (feature → quarantine, then quarantine → develop)
  • Who owns the quarantine → develop merge? When does it happen?

Option 2: E2E Tests on PR (Label-Triggered)

Developers label their PR (e.g., e2e-required) to trigger e2e tests before merge is allowed.

Flow:

  1. Developer raises PR
  2. Developer adds label to trigger e2e suite
  3. E2E tests run against the PR
  4. PR can only merge once e2e passes
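The flow above can be sketched as a GitHub Actions workflow, assuming GitHub Actions and an npm-based test command (the workflow name, label name, and script are placeholders):

```yaml
# Sketch: run e2e only when the PR carries the e2e-required label.
name: e2e-on-pr
on:
  pull_request:
    types: [opened, synchronize, labeled]

jobs:
  e2e:
    # Skip entirely unless the trigger label is present
    if: contains(github.event.pull_request.labels.*.name, 'e2e-required')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run test:e2e   # placeholder for the real suite command
```

Marking the `e2e` check as required in branch protection is what enforces "PR can only merge once e2e passes".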

Pros:

  • Shift-left: failures caught before code reaches develop
  • Developer ownership - they see their own failures
  • No quarantine branch overhead

Cons:

  • Slow feedback loop (e2e suites take time)
  • Resource cost - running full e2e on every labelled PR
  • Tests run against PR branch, not the merged result (could still break on merge)

Variation - Quarantine overnight: After PR e2e passes, the PR moves to a merge queue that batches overnight. If the nightly run passes, the batch merges to develop.


Option 3: Merge Queue with E2E Gate

Use GitHub's merge queue feature (or similar). PRs enter a queue, get rebased/merged together, e2e runs against the combined result, then merges to develop.

Flow:

  1. PR approved and ready to merge
  2. Developer adds to merge queue
  3. Queue batches PRs together
  4. E2E runs against the batch
  5. If pass → all PRs in batch merge to develop
  6. If fail → batch is broken down to isolate the culprit
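If we go this route, the e2e workflow needs to listen for GitHub's `merge_group` event so it runs against the batched merge commit rather than the PR branch. A sketch (the test command is a placeholder):

```yaml
# Sketch: run e2e against the merge-queue batch.
name: e2e-merge-queue
on:
  merge_group:

jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4   # checks out the batch's candidate merge commit
      - run: npm ci
      - run: npm run test:e2e       # placeholder for the real suite command
```

The same check name must be listed as required in develop's branch protection so the queue waits for it.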

Pros:

  • Tests the actual merged state, not just the PR in isolation
  • Batching reduces total e2e runs needed
  • Develop is always in a passing state
  • Built-in GitHub feature (if using merge queue)

Cons:

  • Slower path to develop (waiting for queue + e2e)
  • Batch failures affect multiple PRs - isolation debugging needed
  • Requires workflow/tooling changes

Option 4: Tiered Approach

Not all PRs need full e2e. Use a risk-based approach:

Change Type | Gate
--- | ---
Docs, config, tests only | No e2e required
Low-risk (styling, copy) | Smoke e2e only
Feature/bugfix touching core flows | Full e2e required
High-risk (auth, payments, job lifecycle) | Full e2e + manual QA sign-off

Pros:

  • Faster for low-risk changes
  • Focused e2e resources on what matters
  • QA can prioritise attention

Cons:

  • Requires accurate labelling/classification
  • Risk of miscategorisation
  • More complex workflow rules

Given our constraints:

  • Full e2e suite: ~2 hours (too slow for PR-level)
  • PR volume: ~10/day
  • Capability: Partial e2e suites available

Proposed Flow

┌─────────────────┐
│   PR Created    │
└────────┬────────┘
         ▼
┌─────────────────┐
│  Bot suggests   │  "Consider adding: e2e:auth, e2e:jobs"
│  e2e labels     │
└────────┬────────┘
         ▼
┌─────────────────┐
│  Code review    │
│  + Dev adds     │
│  e2e labels     │
└────────┬────────┘
         ▼
┌─────────────────┐
│ Partial e2e     │  Target: <30 mins
│ runs on PR      │  Must pass to merge
└────────┬────────┘
         ▼ (pass)
┌─────────────────┐
│ Merge to        │
│ develop         │
└────────┬────────┘
         ▼
┌─────────────────┐
│ Nightly full    │  Full 2hr suite
│ e2e suite       │  Safety net for edge cases
└─────────────────┘

What This Gives You

  • Most issues caught before merge - Partial e2e runs on every PR
  • Fast feedback - Devs know within 30 mins, not next morning
  • Full coverage nightly - Edge cases caught overnight
  • Simple flow - No quarantine branch, no manual gates

Layer Breakdown

Layer | When | What Runs | Duration | Purpose
--- | --- | --- | --- | ---
PR | On label added | Partial suite (affected areas) | <30 mins | Catch issues before merge
Nightly | Overnight | Full suite | ~2 hrs | Safety net for edge cases
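The nightly layer is just a scheduled workflow. A sketch, assuming GitHub Actions (the cron time and script name are placeholders):

```yaml
# Sketch: full suite every night at 02:00 UTC.
name: nightly-e2e
on:
  schedule:
    - cron: '0 2 * * *'

jobs:
  e2e-full:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: develop              # run against develop explicitly
      - run: npm ci
      - run: npm run test:e2e:full  # placeholder for the full-suite command
```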

E2E Suite Segmentation

We need to tag/organise e2e tests by area:

Label | Test Scope | Estimated Duration
--- | --- | ---
e2e:smoke | Critical happy paths only | ~15 mins
e2e:auth | Login, logout, token refresh, onboarding | ~10 mins
e2e:jobs | Job lifecycle, state transitions | ~20 mins
e2e:installation | Installation flows by product type | ~25 mins
e2e:repair | Repair/callout/service flows | ~20 mins
e2e:payments | Invoices, payments | ~10 mins
e2e:full | Everything (rarely used on PR) | ~2 hrs
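One way to wire labels to partial suites is one CI job per area label, each running only its tagged subset. A sketch assuming Playwright with `@area` tags in test titles (only two areas spelled out; the rest follow the same shape):

```yaml
# Sketch: each job runs only when its label is on the PR.
jobs:
  e2e-auth:
    if: contains(github.event.pull_request.labels.*.name, 'e2e:auth')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx playwright test --grep "@auth"   # assumes @auth tags in test titles
  e2e-jobs:
    if: contains(github.event.pull_request.labels.*.name, 'e2e:jobs')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx playwright test --grep "@jobs"
```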

Aviator Configuration

Aviator needs to be configured to:

  • Require partial e2e CI check to pass before queuing (when label present)

[TODO: Add specific Aviator config changes needed]

Label Suggestion Bot

A GitHub Action will comment on new PRs with suggested e2e labels based on changed files.

Example mapping:

File Path Pattern | Suggested Label
--- | ---
src/services/api/auth* | e2e:auth
src/app/(app)/(tabs)/jobs/* | e2e:jobs
src/components/installation/* | e2e:installation
src/components/repair/* | e2e:repair
src/services/api/invoice* | e2e:payments
docs/* only | No e2e required
Any other code change | e2e:smoke
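The mapping logic itself is small. A sketch in Python mirroring the table above (the pattern list is illustrative, not exhaustive):

```python
from fnmatch import fnmatch

# Hypothetical path → label mapping, mirroring the table above.
PATH_LABELS = [
    ("src/services/api/auth*", "e2e:auth"),
    ("src/app/(app)/(tabs)/jobs/*", "e2e:jobs"),
    ("src/components/installation/*", "e2e:installation"),
    ("src/components/repair/*", "e2e:repair"),
    ("src/services/api/invoice*", "e2e:payments"),
]

def suggest_labels(changed_files):
    """Return the set of e2e labels suggested for a PR's changed files."""
    labels = set()
    code_change = False
    for path in changed_files:
        if fnmatch(path, "docs/*"):
            continue  # docs-only changes need no e2e
        code_change = True
        for pattern, label in PATH_LABELS:
            if fnmatch(path, pattern):
                labels.add(label)
    if code_change and not labels:
        labels.add("e2e:smoke")  # fallback: any other code change
    return labels
```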

Example bot comment:

E2E Label Suggestions

Based on the files changed in this PR, consider adding the following labels:

  • e2e:auth - changes detected in authentication code
  • e2e:jobs - changes detected in job-related components

Add labels manually via the sidebar, or ask a reviewer to add them.
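The bot can be a single `actions/github-script` step. A sketch (in practice the comment body would be built from the changed-file mapping; it is hard-coded here for brevity):

```yaml
name: suggest-e2e-labels
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  suggest:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write    # needed to comment on the PR
    steps:
      - uses: actions/github-script@v7
        with:
          script: |
            // The real bot would derive this body from the file mapping.
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: '### E2E Label Suggestions\nConsider adding: `e2e:smoke`',
            });
```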

Implementation Steps

Phase 1: Segment the E2E Suite

  • [ ] Audit existing e2e tests and categorise by area
  • [ ] Add tags/labels to test files or create separate test configs
  • [ ] Create CI workflows for each partial suite
  • [ ] Validate partial suites run correctly in isolation

Phase 2: Label Suggestion Bot

  • [ ] Create GitHub Action that runs on PR open/update
  • [ ] Define file path → label mapping
  • [ ] Bot posts comment with suggested labels
  • [ ] Test on a few PRs to validate suggestions

Phase 3: PR-Level E2E

  • [ ] Create GitHub labels (e2e:auth, e2e:jobs, etc.)
  • [ ] Update CI to trigger partial e2e based on label
  • [ ] Configure Aviator to require e2e pass before merge
  • [ ] Document labelling guidelines for developers

Phase 4: Rollout

  • [ ] Communicate changes to all teams
  • [ ] Run in advisory mode first (warn but don't block)
  • [ ] Switch to enforced mode after 1-2 sprints

Expected Benefits

  • Faster feedback - Devs know within 30 mins if their PR breaks e2e
  • Most issues caught before merge - Not discovered the next morning
  • Reduced release-day risk - Fewer surprises close to release
  • Clear ownership - Bot suggests labels, dev adds them, dev owns the outcome
  • Scalable - Works with multiple teams and higher PR volume

Risks and Mitigations

Risk | Mitigation
--- | ---
Devs forget to add labels | Bot comments with suggestions on every PR
Partial suites miss issues | Nightly full suite catches them; fix next morning
Test flakiness blocks PRs | Quarantine flaky tests; track flake rate
Nightly fails on develop | Prioritise fix in the morning; doesn't block PRs

Open Questions

  • [ ] Who owns the e2e suite segmentation work?
  • [ ] What's the current test flake rate? Do we need to address flakiness first?
  • [ ] How do we handle PRs that touch multiple areas? (run multiple partial suites?)
  • [ ] Who gets notified if nightly fails?