On Mar 3, 2026, it was reported that some gateways had lost their connection to the LNS for some tenants in the NAM1 region. This was triggered by an AWS update procedure that affected the Gateway Server component. Although the affected gateways eventually reconnected, recovery took longer than expected (up to 8 hours for some tenants).
The incident was triggered by an AWS infrastructure event (task retirement), which caused several Gateway Server instances in NAM1 to undergo a rolling restart. As instances restarted one by one, gateways began disconnecting gradually. Since the restart was rolling rather than simultaneous, some gateways maintained their connection to instances that remained active throughout the event.
The root cause of the prolonged recovery, however, was a short connection timeout configured on some gateways. With a large number of gateways attempting to reconnect simultaneously, the Gateway Server was operating under unusually high load, and the short timeout was insufficient under these conditions: connections were closed prematurely, before they could be fully established. This cycle repeated until the restarted instances completed their post-restart operations. At that point, server load normalised and the Gateway Server caches became available, which significantly sped up the connection process for the remaining disconnected gateways until service was fully restored.
In short: the AWS infrastructure event triggered the gateway disconnects, but the timeout misconfiguration is what made the recovery take up to 8 hours.
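To illustrate why a short timeout stretches recovery under load, here is a deliberately simplified toy model. It is not a reconstruction of the actual Gateway Server behaviour: it assumes that all disconnected gateways retry in rounds, that the server completes handshakes at a fixed rate, and that any handshake not finished before the client timeout is dropped and retried in the next round. The function name and all numbers are hypothetical.

```python
import math


def rounds_to_recover(gateways: int, handshakes_per_second: float, timeout_s: float) -> int:
    """Return the number of retry rounds needed in the toy model.

    Assumption: in each round the server finishes at most
    handshakes_per_second * timeout_s handshakes before the clients give up.
    """
    # Handshakes that can complete before the client-side timeout fires.
    per_round = math.floor(handshakes_per_second * timeout_s)
    if per_round <= 0:
        raise ValueError("timeout too short for any handshake to complete")
    # Every gateway that timed out has to try again in a later round.
    return math.ceil(gateways / per_round)


if __name__ == "__main__":
    # Hypothetical figures, chosen only to show the trend.
    for timeout in (5, 60):
        print(timeout, rounds_to_recover(6000, 10, timeout))
```

In this model the number of rounds scales inversely with the timeout, which matches the qualitative behaviour described above: the shorter the timeout, the more reconnect attempts are wasted while the server is busy.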
There was no manual intervention to resolve this incident. The affected gateways reconnected automatically after the downtime.
A timeout of at least 60 seconds is necessary for reliable connection establishment under high server load. We will reach out to the owners of affected tenants and recommend that the TC_TIMEOUT setting in their Basic Station configuration be set to at least the default value of 60s. This change will help prevent premature connection drops during periods of elevated reconnect activity.
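As a rough aid for checking a gateway against this recommendation, the following sketch reads a Basic Station configuration and reports whether TC_TIMEOUT is at least 60 seconds. It rests on several assumptions that should be verified against the gateway's actual setup: that the configuration is plain JSON without comments, that TC_TIMEOUT sits under a "station_conf" object, that its value is a string such as "60s" with an ms, s, m, or h suffix, and that an unset value falls back to the 60s default. The helper names are hypothetical.

```python
import json
import re

MIN_TIMEOUT_S = 60.0  # minimum recommended in this report
_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def parse_duration(value: str) -> float:
    """Convert a duration string such as "60s" or "2m" to seconds.

    Assumption: the value is a number followed by one of ms, s, m, h.
    """
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(ms|s|m|h)\s*", value)
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def tc_timeout_ok(conf_text: str) -> bool:
    """Return True if TC_TIMEOUT is unset or at least MIN_TIMEOUT_S."""
    conf = json.loads(conf_text)
    # Assumption: the setting lives under the "station_conf" object.
    raw = conf.get("station_conf", {}).get("TC_TIMEOUT")
    if raw is None:
        return True  # unset: assumed to use the 60s default
    return parse_duration(raw) >= MIN_TIMEOUT_S


if __name__ == "__main__":
    sample = '{"station_conf": {"TC_TIMEOUT": "10s"}}'
    print("OK" if tc_timeout_ok(sample) else "TC_TIMEOUT is below 60s")
```

Running the check across a fleet before the next infrastructure event would show which gateways still use a shortened timeout.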
Existing documentation will be improved to explicitly recommend a longer TC_TIMEOUT setting in the Basic Station configuration.
Our Cloud infrastructure configuration will be improved to reduce instance load after a restart and to better accommodate the higher load when it occurs.