Incident summary
All times in this document are recorded in UTC+2 (CEST). On the 12th of June, SURFconext updated their metadata, which invalidated the Single Sign-On login URLs that were cached in Ans. Ans caches these URLs to avoid making unnecessary requests to SURFconext, which would otherwise increase the time it takes for a user to log in.
However, after SURFconext updated their metadata, Ans continued to use the cached URLs. This mismatch prevented users from logging in via SURFconext until their cache had been cleared or until a hotfix that clears the user's cookies was deployed.
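The trade-off described above, caching a login URL to save a request at the risk of serving a stale value, can be illustrated with a minimal sketch. This is not Ans's actual implementation: the class name `LoginUrlCache`, the `fetch_url` callback, and the time-to-live value are all hypothetical and chosen only for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class LoginUrlCache:
    """Hypothetical cache holding one SSO login URL per identity provider.

    Entries expire after `ttl_seconds`, so a stale URL is served for at most
    that long after the provider changes its metadata.
    """

    def __init__(
        self,
        fetch_url: Callable[[str], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_url = fetch_url  # assumed lookup against the provider
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, idp_id: str) -> str:
        """Return the cached URL, refetching it once the entry has expired."""
        now = self._clock()
        entry = self._entries.get(idp_id)
        if entry is not None and now < entry[1]:
            return entry[0]
        url = self._fetch_url(idp_id)
        self._entries[idp_id] = (url, now + self._ttl)
        return url

    def invalidate(self, idp_id: str) -> None:
        """Drop an entry, for example after a login attempt with it failed."""
        self._entries.pop(idp_id, None)
```

With an expiry in place, a metadata change on the provider side stops affecting logins once the time-to-live has passed, and `invalidate` gives a way to recover sooner when a failed login is detected.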
Leadup
In the later hours of June 12th, SURFconext introduced a change that updated the SURFconext metadata. More information about this change can be found here. We were aware of the change but did not expect it to have any impact, as the announcement mentioned that it would only add a new key and that the existing key would be supported until the 13th of December 2023. However, the addition of this key changed the entirety of the metadata, so the cached metadata that Ans uses was no longer valid.
Fault
The update by SURFconext unexpectedly changed values used for the login. As a result, users were unable to log in via SSO with SURFconext.
Impact
It was not possible for some users to log in to Ans via SURFconext on June 13th between 00:00 and 12:01, unless the user's cache had been cleared or they selected the option “Login differently” and reselected their school. Schools using their own SAML Single Sign-On configuration were not affected.
Detection
The incident was detected when the support team received several tickets stating that logging in via Single Sign-On was not possible.
Response
The support team informed the technical team about the issue on June 13th at 08:40. At 08:45 the status page on https://status.ans.app was updated to report the incident.
The technical team opened an investigation at 08:45. A hotfix that fixed the issue was deployed at 12:01.
Between 08:45 and 12:01 the technical team investigated whether the planned hotfix for the SURFconext login would affect a previous issue involving a DNS change. They confirmed that the two were unrelated.
Recovery
- The technical team began investigating the issue by locating what caused the login to fail when using Single Sign-On with SURFconext.
- Once it had been identified that the issue persisted due to a change in SURFconext, the technical team looked into creating a fix that would allow users to log in without having to clear their cache.
Timeline
All times are in CEST
13th of June 2023
00:00–05:00 - SURFconext updated their metadata.
08:26 - Several tickets were sent to Ans support regarding this issue.
08:40 - The support team informed the technical team of the issue.
08:45 - The status page was updated to let users know there is an issue when logging in via SURFconext.
08:45 - Investigation started by the technical team.
09:44 - A potential hotfix was created.
09:50 - Impact investigation of the potential hotfix.
12:01 - Hotfix to allow users to log in was deployed.
12:02 - A new high priority task was created to ensure the same issue does not happen again.
Reflection
The update performed by SURFconext was deemed safe and was not expected to change any major components of the metadata. Preparations are being made to better verify the impact of changes made by SURFconext.
In addition to the hotfix that mitigated the problem, a new high priority task was created to ensure the issue does not happen again.
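One way to verify the impact of a metadata change ahead of time is to compare the old and new metadata documents and list what actually differs. The sketch below is an assumption about how such a check could look, not a description of the preparations Ans is making. It uses the standard SAML 2.0 metadata and XML Signature namespaces; the function names `summarize_metadata` and `diff_metadata` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Set

# Standard namespaces for SAML 2.0 metadata and XML Signature.
NS = {
    "md": "urn:oasis:names:tc:SAML:2.0:metadata",
    "ds": "http://www.w3.org/2000/09/xmldsig#",
}


def summarize_metadata(xml_text: str) -> Dict[str, Set[str]]:
    """Collect the SSO endpoint URLs and certificates from a metadata document."""
    root = ET.fromstring(xml_text)
    sso_urls = {
        el.get("Location", "")
        for el in root.iterfind(".//md:SingleSignOnService", NS)
    }
    # Strip whitespace so line wrapping inside a certificate is not a "change".
    certificates = {
        "".join((el.text or "").split())
        for el in root.iterfind(".//ds:X509Certificate", NS)
    }
    return {"sso_urls": sso_urls, "certificates": certificates}


def diff_metadata(old_xml: str, new_xml: str) -> Dict[str, Dict[str, Set[str]]]:
    """Return, per field, the values that were added or removed."""
    old = summarize_metadata(old_xml)
    new = summarize_metadata(new_xml)
    changes: Dict[str, Dict[str, Set[str]]] = {}
    for field in old:
        added = new[field] - old[field]
        removed = old[field] - new[field]
        if added or removed:
            changes[field] = {"added": added, "removed": removed}
    return changes
```

If an announced change really only adds a key, the result should contain a single added certificate and nothing else. Any entry under `sso_urls`, or any removed value, would signal that cached login values may no longer match.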