Methodology
How StackConflict measures and grades Shopify app performance
The Problem We're Solving
Every Shopify app you install injects JavaScript into your storefront. That JavaScript runs on your customers' devices — not your MacBook. A 200 ms CPU cost on a desktop machine becomes 800 ms on a mid-range Android phone. StackConflict makes that cost visible before you install, not after your conversion rate drops.
Clean-Room Test Environment
Each app is measured in isolation. We use a Playwright-controlled browser connected to a Shopify development store that we own, with no other third-party apps installed. This eliminates interference: the number you see is the cost of that app alone.
Every scan runs three times. To reduce measurement noise, we discard the run with the highest CPU time and record the median of the remaining two runs, which for two values is simply their average.
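The aggregation step can be sketched in a few lines. This is a minimal illustration, not StackConflict's actual code; the function name `aggregate_cpu_runs` and the sample timings are hypothetical.

```python
from statistics import median


def aggregate_cpu_runs(cpu_ms_runs: list[float]) -> float:
    """Drop the single highest run, then return the median of the rest."""
    if len(cpu_ms_runs) < 2:
        raise ValueError("need at least two runs to drop an outlier")
    kept = sorted(cpu_ms_runs)[:-1]  # discard the highest CPU reading
    return median(kept)              # for two values, this is their mean


# Three hypothetical runs: the 240 ms reading is dropped.
print(aggregate_cpu_runs([112.0, 98.0, 240.0]))  # 105.0
```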
What We Measure
JS Payload (KB)
The total compressed size of the app's primary JavaScript bundle as delivered over the network, read from the scan's HAR (HTTP Archive) file. Smaller is faster: every extra kilobyte must be downloaded by every visitor whose browser has not already cached the script.
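One way to read a compressed payload figure out of a HAR file is sketched below. It relies on the HAR 1.2 `response.bodySize` field, which holds the size of the body as received (so compressed when compression is used), while `content.size` holds the decoded length. Several things here are assumptions rather than StackConflict's documented behaviour: the function name, matching scripts by a single host, summing every script from that host rather than isolating a "primary" bundle, and treating 1 KB as 1024 bytes.

```python
from urllib.parse import urlparse


def js_payload_kb(har: dict, app_host: str) -> float:
    """Sum the on-the-wire size of JavaScript responses served by app_host."""
    total_bytes = 0
    for entry in har["log"]["entries"]:
        response = entry["response"]
        mime = response.get("content", {}).get("mimeType", "")
        if urlparse(entry["request"]["url"]).hostname != app_host:
            continue  # not served by the app's CDN host
        if "javascript" not in mime:
            continue  # images, CSS, JSON, and so on
        body_size = response.get("bodySize", -1)
        if body_size > 0:  # -1 means unknown, 0 means served from cache
            total_bytes += body_size
    return total_bytes / 1024  # assumption: 1 KB = 1024 bytes
```

Some browsers report `bodySize` as -1 for certain responses, so a real pipeline would likely need a fallback for those entries.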
V8 CPU Execution Cost (ms)
We measure Chrome's ScriptDuration metric, which the Chrome DevTools Protocol exposes through Performance.getMetrics: the V8 time spent parsing, compiling, and executing the app's JavaScript. We report the delta between a baseline scan (no app) and an app scan (app injected), so baseline overhead cancels out. The result approximates how much main-thread time the app consumes.
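The delta itself is simple arithmetic. The sketch below assumes the two ScriptDuration readings are in seconds (the unit the Chrome DevTools Protocol uses for this metric) and converts the difference to milliseconds. The function name is hypothetical, and clamping negative deltas to zero is our assumption about how noise would be handled, not something the methodology states.

```python
def script_cost_ms(baseline_script_s: float, app_script_s: float) -> float:
    """Return the app's added script time in milliseconds."""
    delta_ms = (app_script_s - baseline_script_s) * 1000.0
    # Assumption: a negative delta is measurement noise, so report zero.
    return max(delta_ms, 0.0)
```

For example, a baseline reading of 0.180 s and an app reading of 0.262 s would give a cost of roughly 82 ms.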
Cumulative Layout Shift (CLS)
Measured using the Layout Instability API. CLS captures how much the page visually shifts after initial render — a score above 0.1 is perceptible to users and penalised by Google's Core Web Vitals ranking.
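For readers unfamiliar with how a CLS score is built from individual layout-shift entries, here is a sketch of the session-window definition Google uses for Core Web Vitals: shifts that follow recent user input are ignored, the rest are grouped into windows with less than 1 second between shifts and a span under 5 seconds, and the score is the largest window total. Whether StackConflict applies exactly this windowing is an assumption. The `LayoutShift` class is a hypothetical stand-in whose fields mirror `startTime`, `value`, and `hadRecentInput` on the API's entries.

```python
from dataclasses import dataclass


@dataclass
class LayoutShift:
    start_time_ms: float
    value: float
    had_recent_input: bool = False


def cumulative_layout_shift(shifts: list[LayoutShift]) -> float:
    """Return the largest session-window sum of layout-shift values."""
    best = 0.0
    window_score = 0.0
    window_start = None
    prev_time = None
    for s in sorted(shifts, key=lambda s: s.start_time_ms):
        if s.had_recent_input:
            continue  # shifts caused by user input do not count
        starts_new_window = (
            window_start is None
            or s.start_time_ms - prev_time >= 1000     # gap of 1 s or more
            or s.start_time_ms - window_start >= 5000  # window is 5 s long
        )
        if starts_new_window:
            window_start = s.start_time_ms
            window_score = 0.0
        window_score += s.value
        prev_time = s.start_time_ms
        best = max(best, window_score)
    return best
```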
Mobile CPU Estimate
We apply Google Lighthouse's 4× slowdown factor to approximate performance on a mid-range Android device (Moto G Power class). Desktop CPUs are roughly 4× faster than the median mobile device. The mobile estimate is shown as a secondary figure — it is not a direct measurement.
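As a worked example of the scaling, the sketch below multiplies a desktop CPU figure by the 4× factor. The constant and function names are ours, chosen for illustration.

```python
CPU_SLOWDOWN_FACTOR = 4  # the 4x slowdown described above


def mobile_cpu_estimate_ms(desktop_cpu_ms: float) -> float:
    """Estimate mid-range mobile CPU cost from a desktop measurement."""
    return desktop_cpu_ms * CPU_SLOWDOWN_FACTOR


print(mobile_cpu_estimate_ms(200))  # 800
```

This matches the earlier example of a 200 ms desktop cost becoming 800 ms on a mid-range phone.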
Grading Scale
Each app is graded independently across three dimensions. The composite grade is the worst of the three — a lightweight payload cannot compensate for high CPU cost.
| Grade | JS Payload | CPU (desktop) | CLS | Risk level |
|---|---|---|---|---|
| A | ≤ 30 KB | ≤ 50 ms | < 0.1 | Lightweight — zero friction |
| B | ≤ 150 KB | ≤ 150 ms | < 0.25 | Acceptable — monitor on mobile |
| C | ≤ 500 KB | ≤ 500 ms | < 0.5 | Heavy — high conversion risk |
| D | > 500 KB | > 500 ms | ≥ 0.5 | Critical — avoid on mobile stores |
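The table and the worst-of-three rule can be expressed as a short sketch. The thresholds come straight from the table above; the function names are hypothetical and this is not presented as StackConflict's actual implementation.

```python
def grade_upper_inclusive(value: float, a: float, b: float, c: float) -> str:
    """Grade a metric whose limits are inclusive (payload KB, CPU ms)."""
    if value <= a:
        return "A"
    if value <= b:
        return "B"
    if value <= c:
        return "C"
    return "D"


def grade_cls(cls: float) -> str:
    """Grade CLS, whose limits in the table are exclusive (< 0.1, etc.)."""
    if cls < 0.1:
        return "A"
    if cls < 0.25:
        return "B"
    if cls < 0.5:
        return "C"
    return "D"


def composite_grade(payload_kb: float, cpu_ms: float, cls: float) -> str:
    """Return the worst of the three per-dimension grades."""
    grades = [
        grade_upper_inclusive(payload_kb, 30, 150, 500),
        grade_upper_inclusive(cpu_ms, 50, 150, 500),
        grade_cls(cls),
    ]
    # "D" sorts after "A", so the maximum letter is the worst grade.
    return max(grades)


print(composite_grade(payload_kb=25, cpu_ms=180, cls=0.05))  # C
```

In the usage line, the payload and CLS both earn an A, but the 180 ms CPU cost falls in the C band, so the composite grade is C.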
Benchmarks Behind the Thresholds
Grade thresholds are derived from industry research:
- Google/Deloitte (2019): A 100 ms improvement in mobile load time correlates with a 1% increase in retail conversion rates.
- Google Core Web Vitals: CLS < 0.1 is "Good", 0.1–0.25 is "Needs Improvement", > 0.25 is "Poor". We use stricter thresholds for A/B because apps stack.
- HTTP Archive data: The median Shopify storefront has 300–500 KB of third-party JS. A Grade D app alone can exceed that budget.
Known Limitations
- Script injection, not real install. We inject the app's production CDN script directly. Some apps behave differently when fully configured (e.g., personalisation widgets that require a real customer session). CPU cost may be slightly understated for apps with server-side personalisation.
- Checkout Extensibility not measured. Apps that run exclusively in Shopify's checkout sandbox (via Web Pixels or Checkout Extensions) operate in an isolated sandbox (an iframe or a web worker, depending on the extension type) with different performance characteristics. We flag these apps with a 🛡️ badge but do not claim to measure their checkout-phase cost.
- Single-store baseline. All tests run against the same development store. Themes with unusually heavy or light baseline JavaScript may affect the delta measurement marginally.
- Snapshot in time. Scores reflect the app version deployed at the time of the last scan. Apps ship updates frequently — check the "Last tested" date on each app page.
Disputing a Score
If you are an app developer and believe our measurement is inaccurate, please use the dispute form below. Include the app slug, a link to the specific version of the script you believe we tested, and any relevant technical context. We re-run disputed apps within 5 business days.
Submit a dispute →
Data Freshness
Apps are re-scanned periodically as part of our continuous monitoring pipeline. The "Last tested" date shown on each app page reflects when the current metrics were captured. Significant version changes to an app's JS bundle trigger a rescan automatically.