# ory-network
f
@channel On Saturday, July 30, we experienced a multi-hour service outage for our Identity Management and Authentication services in Ory Cloud. The outage was caused by a misconfiguration that prevented automatic restarts of failing service instances. A separate issue with our alerting and paging system delayed the response of our SRE team. As a result, the Ory Cloud service was unavailable for close to 8 hours. We recognize that a service disruption of this duration is unacceptable for a production system. To ensure this does not happen again, we’re working on multiple improvements to our monitoring and operations procedures for Ory Cloud. We sincerely apologize for any inconvenience caused by this incident and will publish a comprehensive post-mortem in the coming days. The post-mortem will include the organizational and technical action items Ory SRE will take to prevent such failure scenarios in the future.
❤️ 8
🙏 9
m
Appreciate the transparency
🙌 2