Bold Checkout not loading / loading slowly
Updates

Write-up published


Resolved

Incident Root Cause Analysis
Incident: Bold Checkout Outage / Degradation

Date and time: Tuesday, April 25, 10:50 AM CT - 1:05 PM CT (135 min)

Summary: Customers using Bold Checkout during the outage window experienced slow or intermittent service during a routine maintenance event on cloud-hosted infrastructure. Once the issue was detected, the maintenance was reverted to the original configuration, but this did not fully resolve the issue. Further investigation was performed and the issue was remediated internally.

Impact: During the incident, customers experienced an error or a failure to load Bold Checkout. Additionally, some customers may have entered Bold Checkout successfully but experienced slow or intermittent response times when placing an order. These issues led to a lower volume of orders than normal.

Root Cause: Initial investigation discovered an unexpected change in internal routing behavior, which caused a large portion of network traffic within the Checkout network to be routed incorrectly to internal downstream resources. Upon rollback, auto scaling technology was enabled to allow services to scale back up. However, the services did not recover correctly enough to allow continuity of service, and this caused the isolation of computing resources within the network.

The post-incident investigation into this routing anomaly uncovered a legacy configuration that was unique to this area of Bold’s environment and was not compatible with the updated configurations applied during the maintenance event. The reversion was completed, but it in turn caused anomalous behavior in the auto scaling technology, which further disrupted services as it tried to return Checkout to normal operation.

Detection: This issue was detected by Bold employees in real time while routine maintenance was being performed. We also received multiple alerts from our automated alerting and monitoring systems.

Resolution: Upon discovering the network flow breakdown from Checkout to internal resources, we halted and reverted all maintenance work being performed. In this case, the reversion failed to fully restore service. Further investigation found that a portion of the traffic was still not being routed correctly, and it was later found that auto scaling of our services was behaving incorrectly due to the reversion. Once this was discovered, a configuration change was made to quickly correct the services, and Checkout resumed normal operation.

Tue, Apr 25, 2023, 09:04 PM

Resolved

After monitoring, we have observed no further issues with loading Bold Checkout, and traffic has recovered to normal levels.

We are considering this issue resolved.

Tue, Apr 25, 2023, 06:54 PM

Monitoring

An additional fix has been implemented and we are now seeing recovery on Bold Checkout.

We are continuing to actively monitor to ensure we are seeing a full restoration of services.

Tue, Apr 25, 2023, 06:11 PM

Identified

We are continuing to observe errors on attempts to load Bold Checkout.

We are actively troubleshooting this issue and will provide further updates as they become available.

Tue, Apr 25, 2023, 04:57 PM

Identified

A fix has been implemented and we are seeing Bold Checkout traffic being restored slowly.

We are continuing to investigate the cause and actively monitoring the traffic increase to ensure services are restored to full operation.

Tue, Apr 25, 2023, 04:40 PM

Investigating

We are currently experiencing an issue with Bold Checkout not loading, or loading very slowly / intermittently.

We are currently investigating and will provide updates as they become available.

Tue, Apr 25, 2023, 04:09 PM