Gateways disconnected from The Things Stack Cloud in the nam1 cluster

Incident Report for The Things Industries

Postmortem

Summary

On March 23, 2026, during a scheduled maintenance window, the Gateway Server (GS) component in the nam1 region was accidentally restarted. Gateways disconnected in a pattern similar to the March 3 incident.

The development team took the opportunity to roll out a fix that had been planned for the next maintenance window. The fix resolved the reconnection issue, and gateways now reconnect within several minutes.

Impact

Gateways belonging to some tenants in the nam1 region were disconnected following an accidental Gateway Server restart during a maintenance window.

Root Cause

The incident was triggered by an accidental restart of the Gateway Server component in nam1 during a scheduled maintenance window. Gateways disconnected in the same pattern observed during the March 3 incident: all affected gateways attempted to reconnect simultaneously while the server was under high load, and insufficient timeout and cache configurations caused connections to be dropped prematurely.
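The interaction between a simultaneous reconnect wave and a short timeout can be illustrated with a toy model. This is a sketch for illustration only, not The Things Stack code: the function name, the fixed handshake rate, and all numbers below are hypothetical assumptions, and the cache-related part of the root cause is not modeled.

```python
def simulate_reconnect_wave(num_gateways, handshakes_per_second, timeout_seconds):
    """Return (connected, dropped) for one simultaneous reconnect wave.

    Toy model (assumption): every gateway starts reconnecting at t = 0, the
    server completes handshakes one after another at a fixed rate, and any
    gateway whose handshake is not completed within timeout_seconds is dropped.
    """
    # Handshake number k (1-based) completes at k / handshakes_per_second,
    # so at most rate * timeout handshakes finish before the timeout expires.
    served_in_time = handshakes_per_second * timeout_seconds
    connected = min(num_gateways, served_in_time)
    dropped = num_gateways - connected
    return connected, dropped


if __name__ == "__main__":
    # Hypothetical numbers: 10,000 gateways, 100 handshakes per second.
    for timeout in (10, 120):
        connected, dropped = simulate_reconnect_wave(10_000, 100, timeout)
        print(f"timeout={timeout}s connected={connected} dropped={dropped}")
```

In this model, a timeout shorter than the time the server needs to work through the queue drops most of the wave, and the dropped gateways must reconnect again. A timeout sized for the full wave lets every gateway connect on the first attempt.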

Resolution

The development team used the opportunity to roll out a fix ahead of its planned release date. The deployed fix resolved the reconnection bottleneck, and affected gateways are now reconnecting within several minutes. No customer action is required.

Prevention / Action items

Process improvements

Procedures around component restarts during maintenance windows will be reviewed to prevent accidental restarts of production-critical components such as the Gateway Server.

Infrastructure improvements — already applied

The fix rolled out during this incident addresses the reconnection performance issues identified in the March 3 post-mortem. Gateway Server instances in nam1 now handle mass reconnect scenarios significantly more efficiently, and gateways reconnect within several minutes.

Posted Mar 23, 2026 - 13:54 CET

Resolved

This incident has been resolved.
Posted Mar 23, 2026 - 12:40 CET

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Mar 23, 2026 - 12:10 CET

Identified

An update intended for other regions was inadvertently applied to the nam1 region outside of its scheduled maintenance window. As a result, some gateways in nam1 are experiencing connectivity issues.

We sincerely apologize for the disruption. Our team is actively investigating and closely monitoring the situation. We will provide further updates as the investigation progresses.
Posted Mar 23, 2026 - 11:44 CET
This incident affected: The Things Stack Cloud (North America 1 (nam1)).