Methodology

How StackConflict measures and grades Shopify app performance

The Problem We're Solving

Every Shopify app you install injects JavaScript into your storefront. That JavaScript runs on your customers' devices — not your MacBook. A 200 ms CPU cost on a desktop machine becomes 800 ms on a mid-range Android phone. StackConflict makes that cost visible before you install, not after your conversion rate drops.

Clean-Room Test Environment

Each app is measured in isolation. We use a Playwright-controlled browser connected to an owned Shopify development store with no other third-party apps installed. This eliminates interference: the number you see is the cost of that app alone.

Every scan runs three times. We discard the run with the highest CPU time and record the median of the remaining two runs (with two values, the median is their average) to reduce measurement noise.
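As a minimal sketch of that aggregation step, assuming each scan yields one CPU figure in milliseconds (the function name `robust_cpu_ms` is illustrative and not part of StackConflict's pipeline):

```python
from statistics import median


def robust_cpu_ms(runs_ms):
    """Drop the single highest run, then take the median of what is left."""
    if len(runs_ms) != 3:
        raise ValueError("expected exactly three runs")
    kept = sorted(runs_ms)[:-1]  # discard the highest CPU reading
    return median(kept)  # with two values, the median equals their mean


print(robust_cpu_ms([112.0, 98.0, 240.0]))  # 105.0
```

Here the 240 ms run is treated as the outlier and the reported value is the midpoint of the two remaining runs.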

Scans are conducted on a clean Chromium instance with a fixed desktop profile. No browser extensions, no ad-blockers, no cached credentials. The same page is loaded before and after script injection so the delta is purely the app's contribution.

What We Measure

JS Payload (KB)

The total compressed size of the app's primary JavaScript bundle as delivered over the network, measured from the HAR archive. Smaller is faster — every extra kilobyte must be downloaded on every page load by every visitor.
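The sketch below shows one way such a figure could be read from a HAR file. It assumes the standard HAR 1.2 layout, where `response.bodySize` holds the received (compressed) body size in bytes and `-1` means unknown. The function name, the URL filter, and the choice of 1 KB = 1024 bytes are assumptions for illustration, and it sums every matching script rather than picking out a single primary bundle.

```python
def js_payload_kb(har, url_substring):
    """Sum compressed body bytes of JavaScript responses matching url_substring."""
    total_bytes = 0
    for entry in har["log"]["entries"]:
        response = entry["response"]
        mime = response.get("content", {}).get("mimeType", "")
        if "javascript" not in mime:
            continue  # skip CSS, images, fonts, etc.
        if url_substring not in entry["request"]["url"]:
            continue  # skip scripts that do not belong to the app
        body_size = response.get("bodySize", -1)
        if body_size > 0:  # HAR records -1 when the size is unknown
            total_bytes += body_size
    return total_bytes / 1024  # assumption: 1 KB = 1024 bytes


# Hypothetical HAR fragment; the host name is made up for the example.
example_har = {"log": {"entries": [
    {"request": {"url": "https://cdn.example-app.test/widget.js"},
     "response": {"bodySize": 20480,
                  "content": {"mimeType": "application/javascript"}}},
    {"request": {"url": "https://cdn.example-app.test/style.css"},
     "response": {"bodySize": 4096,
                  "content": {"mimeType": "text/css"}}},
]}}

print(js_payload_kb(example_har, "example-app.test"))  # 20.0
```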

V8 CPU Execution Cost (ms)

We measure Chrome's ScriptDuration via the Performance Timeline API: the V8 parse, compile, and execute time for the app's JavaScript. The reported figure is the delta between a baseline scan (no app) and an app scan (app injected), so baseline overhead cancels out. The result is how much additional main-thread script time the app consumes on the test machine.
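The subtraction can be sketched as follows. The function name is hypothetical, and clamping negative deltas to zero is an assumption made here to absorb measurement noise, not something the methodology states.

```python
def cpu_delta_ms(baseline_ms, with_app_ms):
    """App cost = script time with the app minus script time of the bare page."""
    # Assumption: a negative delta is treated as noise and reported as 0.
    return max(0.0, with_app_ms - baseline_ms)


print(cpu_delta_ms(baseline_ms=140.0, with_app_ms=215.0))  # 75.0
```

In this example the bare storefront spends 140 ms in script and the same page with the app spends 215 ms, so 75 ms is attributed to the app.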

Cumulative Layout Shift (CLS)

Measured using the Layout Instability API. CLS captures how much the page visually shifts after initial render — a score above 0.1 is perceptible to users and penalised by Google's Core Web Vitals ranking.
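For context, Google's definition of CLS aggregates individual layout-shift entries into session windows (shifts less than 1 second apart, each window capped at 5 seconds) and reports the largest window total, ignoring shifts that follow recent user input. The sketch below implements that definition on plain tuples. Whether StackConflict aggregates in exactly this way is an assumption, and the function name and tuple layout are illustrative.

```python
def cls_score(shifts):
    """Largest session-window sum of layout-shift values.

    shifts: list of (start_time_ms, value, had_recent_input), sorted by time.
    """
    best = 0.0
    window_sum = 0.0
    window_start = None
    prev_time = None
    for start, value, had_recent_input in shifts:
        if had_recent_input:
            continue  # shifts right after user input do not count
        new_window = (
            window_start is None
            or start - prev_time >= 1000     # gap of 1 s or more
            or start - window_start >= 5000  # window would exceed 5 s
        )
        if new_window:
            window_start = start
            window_sum = 0.0
        window_sum += value
        prev_time = start
        best = max(best, window_sum)
    return best


example = [(100, 0.05, False), (600, 0.04, False),
           (3000, 0.20, True), (4000, 0.03, False)]
print(round(cls_score(example), 3))  # 0.09
```

The first two shifts fall in one window (0.09 in total), the third is ignored because it follows user input, and the fourth starts a new window, so the score stays just under the 0.1 threshold.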

Mobile CPU Estimate

We apply Google Lighthouse's 4× slowdown factor to approximate performance on a mid-range Android device (Moto G Power class). Desktop CPUs are roughly 4× faster than the median mobile device. The mobile estimate is shown as a secondary figure — it is not a direct measurement.
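The estimate is a single multiplication. A minimal sketch, with the constant and function names chosen only for illustration:

```python
MOBILE_SLOWDOWN = 4.0  # the 4x slowdown factor described above


def mobile_cpu_estimate_ms(desktop_ms, slowdown=MOBILE_SLOWDOWN):
    """Scale a desktop CPU measurement to an approximate mid-range mobile cost."""
    return desktop_ms * slowdown


print(mobile_cpu_estimate_ms(200.0))  # 800.0
```

A 200 ms desktop cost therefore maps to an estimated 800 ms on mobile, which is an extrapolation rather than a measured value.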


Grading Scale

Each app is graded independently across three dimensions. The composite grade is the worst of the three — a lightweight payload cannot compensate for high CPU cost.

Grade | JS Payload | CPU (desktop) | CLS    | Risk level
A     | ≤ 30 KB    | ≤ 50 ms       | < 0.1  | Lightweight: zero friction
B     | ≤ 150 KB   | ≤ 150 ms      | < 0.25 | Acceptable: monitor on mobile
C     | ≤ 500 KB   | ≤ 500 ms      | < 0.5  | Heavy: high conversion risk
D     | > 500 KB   | > 500 ms      | ≥ 0.5  | Critical: avoid on mobile stores
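The grading rule can be sketched as below, using the thresholds from the table. The function names are hypothetical, and the code is an illustration of the "worst of three" rule rather than StackConflict's actual implementation.

```python
def _grade(value, limits, inclusive):
    """Return A, B, or C for the first limit the value satisfies, else D."""
    for letter, limit in zip("ABC", limits):
        within = (value <= limit) if inclusive else (value < limit)
        if within:
            return letter
    return "D"


def composite_grade(payload_kb, cpu_ms, cls):
    """Grade each dimension, then take the worst letter as the composite."""
    grades = [
        _grade(payload_kb, (30, 150, 500), inclusive=True),   # JS payload, KB
        _grade(cpu_ms, (50, 150, 500), inclusive=True),       # desktop CPU, ms
        _grade(cls, (0.1, 0.25, 0.5), inclusive=False),       # CLS uses strict <
    ]
    return max(grades)  # "D" > "C" > "B" > "A" in string ordering


print(composite_grade(payload_kb=25, cpu_ms=180, cls=0.05))  # C
```

In the example the payload and CLS both earn an A, but 180 ms of desktop CPU falls in the C band, so the composite grade is C.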

Benchmarks Behind the Thresholds

Grade thresholds are derived from industry research:


Known Limitations


Disputing a Score

If you are an app developer and believe our measurement is inaccurate, please use the dispute form below. Include the app slug, a link to the specific version of the script you believe we tested, and any relevant technical context. We re-run disputed apps within 5 business days.

Submit a dispute →

Data Freshness

Apps are re-scanned periodically as part of our continuous monitoring pipeline. The "Last tested" date shown on each app page reflects when the current metrics were captured. Significant version changes to an app's JS bundle trigger a rescan automatically.