Observability in e-commerce. How to survive Black Friday in 2026?

TL;DR - Quick summary
- No guesswork: basic monitoring says something broke - observability explains why checkout failed, often down to the exact code path.
- Microservices: distributed tracing ties a storefront click to a PostgreSQL query.
- Peaks: GMI headless stacks (MedusaJS v2, NestJS) ship with telemetry so Black Friday does not silently burn revenue.
- Proof: SFD has passed 100,000 downloads and holds a 4.9★ rating through major traffic spikes, without ops panic.
The problem: CPU at 20%, checkout still returns 500
Friday, 7 pm: the flagship Black Friday launch. Meta and Google ads are burning through tens of thousands in budget. Traffic jumps 800%, yet conversion collapses. The AWS console shows low CPU, Redis with capacity to spare and a “healthy” database, yet buyers are tweeting that they cannot pay.
That is an expensive incident for a commerce lead. Grepping gigabytes of flat logs for a payment-gateway timeout burns hours, and abandoned carts rarely come back.
GMI Software (Gdańsk, 16+ years) lives by one rule: if you cannot see sub-second behaviour under the hood, you do not control the platform. Bigger boxes are not the cure - observability is.
Monitoring vs observability in 2026
Two terms executives often conflate:
- Classic monitoring (Zabbix, Pingdom): “is the host up?”, “is the disk full?” - fine for monoliths (Magento). Outcome: you know something failed, not why.
- Observability (Datadog, AWS X-Ray, Sentry): inspects logs, metrics and traces from inside the app. Outcome: you see the request path UI → Medusa → Stripe, and then 3.4 s stuck on an ERP call that fails because of a bad API key.
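To make the "why, not just what" point concrete, here is a minimal sketch of how a trace exposes the slow hop. The `Span` shape, the service and operation names, and every duration except the 3.4 s ERP call are assumptions made for illustration. Real APM tools use richer formats and do this analysis for you.

```typescript
// Hypothetical span shape; real tracing backends define their own formats.
interface Span {
  id: string;
  parentId: string | null;
  service: string;
  operation: string;
  durationMs: number;
  error?: string;
}

// Self time = a span's duration minus the time spent in its direct children.
// Simplification: assumes children run one after another and do not overlap.
function selfTimeMs(span: Span, all: Span[]): number {
  const childTotal = all
    .filter((s) => s.parentId === span.id)
    .reduce((sum, s) => sum + s.durationMs, 0);
  return Math.max(0, span.durationMs - childTotal);
}

// The span with the largest self time is where the request actually waited.
function slowestSpan(trace: Span[]): Span {
  if (trace.length === 0) throw new Error("empty trace");
  return trace.reduce((worst, s) =>
    selfTimeMs(s, trace) > selfTimeMs(worst, trace) ? s : worst
  );
}

// Illustrative checkout trace (made-up numbers apart from the 3.4 s ERP call).
const checkoutTrace: Span[] = [
  { id: "1", parentId: null, service: "storefront-ui", operation: "POST /checkout", durationMs: 3900 },
  { id: "2", parentId: "1", service: "medusa", operation: "completeCart", durationMs: 3800 },
  { id: "3", parentId: "2", service: "stripe", operation: "createPaymentIntent", durationMs: 300 },
  { id: "4", parentId: "2", service: "erp", operation: "syncOrder", durationMs: 3400, error: "401 invalid API key" },
];

const culprit = slowestSpan(checkoutTrace);
console.log(`${culprit.service} ${culprit.operation}: ${culprit.durationMs} ms (${culprit.error ?? "no error"})`);
```

CPU and memory graphs would look healthy throughout this request. Only the per-hop timings show that the ERP call, not the storefront or the payment provider, is holding checkout up.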
How we trace checkout failures (GMI stack)
Requests hop across order, loyalty and payment services. Without a shared identifier threading those hops together, debugging can take days:
1. Distributed tracing
Mobile clients (React Native) attach a correlation ID that follows every downstream hop.
2. NestJS on Node.js
NestJS services instrument cleanly - millisecond-level spans without polluting domain code.
3. MedusaJS v2 plus APM
The commerce core shows where the funnel slows. APM graphs can reveal that the bottleneck is a coupon microservice rather than the cart service.
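The correlation ID from step 1 can be sketched in a few lines. This is a simplified, in-process illustration, not the GMI implementation. The header name `x-correlation-id`, the three service functions and the ID generator are all hypothetical. In production a tracing SDK or the W3C `traceparent` header would normally carry this context.

```typescript
// Assumed header name; many teams use the W3C `traceparent` header instead.
const CORRELATION_HEADER = "x-correlation-id";

type HeaderMap = { [name: string]: string | undefined };

interface LogLine {
  service: string;
  correlationId: string;
  message: string;
}

const logs: LogLine[] = [];

// Illustrative generator only; real code would use a UUID or a tracing SDK.
function newCorrelationId(): string {
  return Date.now().toString(36) + "-" + Math.random().toString(36).slice(2, 10);
}

// Reuse the caller's ID when present, otherwise start a new one at the edge.
function ensureCorrelationId(incoming: HeaderMap): string {
  return incoming[CORRELATION_HEADER] ?? newCorrelationId();
}

// Hypothetical downstream services: each one logs with the ID it received.
function paymentService(headers: HeaderMap): void {
  const id = ensureCorrelationId(headers);
  logs.push({ service: "payment", correlationId: id, message: "charge requested" });
}

function loyaltyService(headers: HeaderMap): void {
  const id = ensureCorrelationId(headers);
  logs.push({ service: "loyalty", correlationId: id, message: "points calculated" });
}

// The order service forwards the same header on every outgoing call.
function orderService(headers: HeaderMap): void {
  const id = ensureCorrelationId(headers);
  logs.push({ service: "order", correlationId: id, message: "order received" });
  const outgoing: HeaderMap = { [CORRELATION_HEADER]: id };
  loyaltyService(outgoing);
  paymentService(outgoing);
}

// The mobile client attaches the ID once, at the start of the request.
const clientId = newCorrelationId();
orderService({ [CORRELATION_HEADER]: clientId });

// Every log line for this checkout can now be pulled with a single filter.
const sameRequest = logs.filter((l) => l.correlationId === clientId);
console.log(sameRequest.map((l) => l.service).join(" -> "));
```

The design point is that no service invents its own ID once one exists. That lets a single filter reassemble one buyer's checkout from logs emitted by separate services.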
What does peak-ready commerce cost?
Observability is not a bolt-on plugin - we embed it during design.
- Budget: a headless MedusaJS, NestJS and Next.js build with full telemetry on AWS typically costs PLN 160,000-240,000; large B2B ecosystems often exceed PLN 250,000.
- React Native (Expo): plugging into the same observable API usually saves 30-40% versus dual native apps.
We tame budget fear with DDT (Discovery, Design & Technology). GMI Software is the only Polish software house pairing that workshop with a fixed-price guarantee on resilient architecture - no hidden fees, no lock-in.
Frequently asked questions
- How does monitoring differ from observability?
- Monitoring watches external signals (CPU, RAM, HTTP codes) to answer “is the system up?”. Observability uses logs, metrics and distributed traces to answer “why and where did it break?”.
- Why are plain logs not enough for microservices debugging?
- Monolith logs read chronologically. Ten services can emit thousands of lines in the same second. Without distributed tracing and correlation IDs, tying a payment error to a cart error is nearly impossible.
- What is APM (Application Performance Monitoring)?
- Software such as Datadog, New Relic or AWS X-Ray that instruments a running app. It surfaces live latency spans, PostgreSQL query timings and infrastructure bottlenecks.
- How does GMI Software protect systems during Black Friday?
- We run MedusaJS v2 and NestJS on AWS ECS clusters. Event-driven side effects mean a secondary failure (e.g. SMTP) should not kill checkout payments.
- How long does the DDT process take at GMI Software?
- Discovery, Design & Technology usually lasts three to five weeks. We review current pain points, UX/UI mockups and architecture for traffic spikes. The output is a binding quote with a fixed-price guarantee.
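Returning to the Black Friday answer above: the idea that a secondary failure such as SMTP should not kill checkout can be sketched as follows. This is a deliberately small, in-process illustration under stated assumptions. The `EventBus` class, the `order.paid` event name and the `checkout` function are hypothetical and are not MedusaJS or NestJS APIs. A production system would usually put a durable queue between the publisher and its subscribers.

```typescript
type OrderPaid = { orderId: string };
type Handler = (payload: OrderPaid) => void;

// Hypothetical minimal event bus; not a MedusaJS or NestJS API.
class EventBus {
  private handlers: { [event: string]: Handler[] | undefined } = {};
  readonly failures: string[] = [];

  on(event: string, handler: Handler): void {
    const list = this.handlers[event] ?? [];
    list.push(handler);
    this.handlers[event] = list;
  }

  // Each subscriber runs inside its own try/catch, so one failing side effect
  // cannot break the other subscribers or the code that published the event.
  emit(event: string, payload: OrderPaid): void {
    for (const handler of this.handlers[event] ?? []) {
      try {
        handler(payload);
      } catch (err) {
        const reason = err instanceof Error ? err.message : String(err);
        this.failures.push(`${event}: ${reason}`);
      }
    }
  }
}

const bus = new EventBus();
const loyaltyPoints: string[] = [];

// Side effect 1: the confirmation email, simulated here as a broken SMTP server.
bus.on("order.paid", () => {
  throw new Error("SMTP connection refused");
});

// Side effect 2: loyalty points, which should still be awarded.
bus.on("order.paid", ({ orderId }) => {
  loyaltyPoints.push(orderId);
});

function checkout(orderId: string): { orderId: string; status: "paid" } {
  // Payment capture would happen here (omitted); it is the only step the
  // buyer's checkout result depends on in this sketch.
  bus.emit("order.paid", { orderId });
  return { orderId, status: "paid" };
}

const result = checkout("order_123");
console.log(result.status, bus.failures);
```

The email failure is recorded in `failures` for later retry or alerting, while the buyer still gets a paid order and the loyalty handler still runs.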
Content updated: March 31, 2026