Incident Summary
All times in this document are recorded in CET (UTC+1).
On the 13th of February 2024, at 09:26, the support team received a ticket about differing grades for students with a similar number of points on two similar assignments. The ticket concerned an assignment taken in December 2023 and a more recent assignment taken in February 2024. The issue initially seemed to be only visual, but further investigation showed that the discrepancy was caused by the older, deprecated version of the guess score correction being applied instead of the current standard guess score correction. The main difference between the two is that the older guess score correction only takes single response multiple choice questions into account, while the current guess score correction also takes other closed question types into account.
On the 22nd of February 2024, at 15:25, a hotfix was released to prevent the issue from occurring again. In addition, an investigation was started to determine the impact.
Lead-up
The support team received a ticket on the 13th of February 2024, at 09:26. In this ticket, the sender reported that the scoring differed between two nearly identical assignments, one taken in December 2023 and the other taken in February 2024. Participants with a similar number of points on the two assignments received a higher grade for the assignment taken in February 2024.
This did not immediately lead to the assumption that there was a problem, as the guess score correction can inherently lead to different grades for an identical number of points.
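The following minimal sketch illustrates why an identical number of points can result in different grades. The linear formula, the 1 to 10 grade scale, and the name `grade_with_guess_correction` are assumptions made for illustration only; this report does not state the formula the platform actually uses.

```python
def grade_with_guess_correction(points, max_points, guess_score,
                                min_grade=1.0, max_grade=10.0):
    """Map points to a grade, treating the guess score as the zero point.

    Assumed linear model, not taken from the incident report.
    """
    if not 0 <= guess_score < max_points:
        raise ValueError("guess_score must be in [0, max_points)")
    # Points at or below the guess score earn the minimum grade.
    corrected = max(points - guess_score, 0.0)
    fraction = min(corrected / (max_points - guess_score), 1.0)
    return round(min_grade + (max_grade - min_grade) * fraction, 1)


# Same points (30 out of 40), different guess scores.
grade_low_guess = grade_with_guess_correction(30, 40, guess_score=5)
grade_high_guess = grade_with_guess_correction(30, 40, guess_score=10)
print(grade_low_guess, grade_high_guess)  # 7.4 7.0
```

Under this assumed model, a lower guess score yields a higher grade for the same number of points.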
Fault
During the investigation, it was discovered that the issue was introduced on the 7th of January 2024. On this date, a change was released that reworked the internal codebase to improve the readability and usability of the points calculation. This change contained an oversight that caused the older version of the guess score correction to be applied to assignments that contained varying types of closed questions.
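The difference between the two versions of the correction can be sketched as follows. The `Question` data model, the `chance_fraction` field, the question type labels, and the expected-guess formula are hypothetical. They only illustrate which question types each version takes into account and are not the platform's code.

```python
from dataclasses import dataclass

# Hypothetical labels for the closed question types named in this report.
SINGLE_RESPONSE_MC = "multiple_choice_single"
CLOSED_TYPES = {SINGLE_RESPONSE_MC, "multiple_choice_multiple",
                "fill_in", "match", "order"}


@dataclass(frozen=True)
class Question:
    kind: str
    max_points: float
    chance_fraction: float  # assumed share of points obtainable by guessing


def guess_score(questions, included_kinds):
    """Sum the expected points from guessing over the included question types."""
    return sum(q.max_points * q.chance_fraction
               for q in questions if q.kind in included_kinds)


def deprecated_guess_score(questions):
    # Older version: only single response multiple choice questions count.
    return guess_score(questions, {SINGLE_RESPONSE_MC})


def current_guess_score(questions):
    # Current version: the other closed question types count as well.
    return guess_score(questions, CLOSED_TYPES)


# Invented sample assignment with mixed closed question types.
assignment = [
    Question(SINGLE_RESPONSE_MC, 4, 0.25),
    Question("match", 4, 0.5),
    Question("order", 2, 0.5),
]
old = deprecated_guess_score(assignment)
new = current_guess_score(assignment)
print(old, new)  # 1.0 4.0
```

For an assignment with mixed question types, the deprecated version produces a lower guess score than the current version, which is the situation this incident describes.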
Impact
An impact analysis was performed by the technical team to find the results that were affected. This analysis was scoped to assignments whose results were updated between the 7th of January 2024 and the 22nd of February 2024, as this is the period from the release of the change until the release of the hotfix, when the impact investigation started. The results of this analysis are as follows:
- A total of 117 assignments were found to be affected across 16 institutions.
The impacted results had their grades calculated with a lower guess score correction than intended, which gave them a higher grade than they should have received.
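The date scoping of the impact analysis can be expressed as a simple window filter. The `Result` record, its field names, and the sample data below are hypothetical. Treating both bounds as whole calendar days is also an assumption, since this report does not state exact cut-off times. A full analysis would additionally need to check which assignments contain the affected question types.

```python
from dataclasses import dataclass
from datetime import date, datetime

WINDOW_START = date(2024, 1, 7)   # change released
WINDOW_END = date(2024, 2, 22)    # hotfix deployed


@dataclass(frozen=True)
class Result:
    assignment_id: int
    institution_id: int
    updated_at: datetime


def in_scope(result):
    """True if the result was updated inside the analysis window (inclusive)."""
    return WINDOW_START <= result.updated_at.date() <= WINDOW_END


def summarize(results):
    """Count distinct assignments and institutions among in-scope results."""
    scoped = [r for r in results if in_scope(r)]
    return {
        "assignments": len({r.assignment_id for r in scoped}),
        "institutions": len({r.institution_id for r in scoped}),
    }


# Invented sample data, not taken from the real analysis.
sample = [
    Result(1, 10, datetime(2024, 1, 6, 23, 59)),  # before the window
    Result(2, 10, datetime(2024, 1, 7, 8, 0)),
    Result(3, 11, datetime(2024, 2, 22, 15, 0)),
    Result(3, 11, datetime(2024, 2, 23, 9, 0)),   # after the window
]
summary = summarize(sample)
print(summary)  # {'assignments': 2, 'institutions': 2}
```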
Detection
The findings of the user who sent the ticket initially required more clarification. The technical team started an investigation as soon as it received more details about the issue. The technical team initially suspected that the points with guess score correction were displayed incorrectly. However, on the 20th of February 2024 the technical team discovered that the number of points with guess score correction was displayed correctly, but that an incorrect number of points was used when the grade was calculated. This was only the case for assignments that contained other types of closed questions, specifically: fill-in, match, multiple choice (with multiple alternatives) and order questions.
Response
The support team forwarded the ticket to the technical team on the 14th of February 2024, at 13:31. It was initially unclear what the issue was. The technical team asked for more clarification on the 14th of February 2024, at 14:13. At that point, the issue seemed to be that the number of points with guess score correction was displayed incorrectly.
After the discovery of the actual issue on the 20th of February 2024, the technical team immediately started developing a hotfix and preparing an impact analysis.
Recovery
A hotfix was deployed to the production environment on the 22nd of February 2024, at 15:25, to prevent the issue from occurring again. The decision was made not to recalculate the impacted grades automatically, but to leave this choice to the impacted customers.
Timeline
13th of February 2024
- 09:26 - Support team receives a ticket from a customer regarding the difference in grade calculation between two assignments.
- 11:49 - Support team responds to the ticket, stating that the technical team will have a look at the problem.
14th of February 2024
- 13:31 - Support team asks the technical team to take a look at the question asked by the customer.
- 13:38 - The technical team indicates that the question is not completely clear to them. They share the guess scores of two results to show why the same number of points might still lead to a different grade.
16th of February 2024
- 09:42 - Support team contacts the customer again asking for clarification and providing the information shared by the technical team.
- 10:37 - The customer responds and attempts to clarify the confusion.
20th of February 2024
- 09:51 - Support team informs the customer that the technical team is looking into the issue.
- 10:52 - The technical team informs the support team that they suspect the issue to be only visual.
- 16:00 - The technical team looks into the issue once again and notices that the problem goes beyond the visual issue they initially suspected.
- 16:36 - Support team informs the customer that the issue is being looked into with higher priority and that updates will be shared once they are available.
- 21:24 - The technical team finds out what the problem is and notifies the support team.
21st of February 2024
- 10:36 - Support team informs the customer that there is indeed a problem with the guess score correction.
- 16:32 - Technical team has developed an initial fix for the problem.
22nd of February 2024
- 09:32 - Findings are shared with the CSM team.
- 15:25 - Hotfix is deployed to the production environment.
Reflection
The steps taken in handling this incident followed the standard procedure. The initial assumption that the inquiry stemmed from the complexity of the guess score correction, rather than from an actual issue, caused the report to be picked up with less urgency than was required. In addition, the initial finding led the technical team to believe that they had found the cause while something else was at play, which made the investigation phase take longer than necessary. We have included several internal tests alongside the hotfix. These run in our continuous integration process to ensure that regressions introduced by future changes to the points calculation are detected immediately.
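As an illustration of the kind of check such a test could perform, the sketch below compares the guess score used for display with the one used for grading, for each closed question type. All function names, the tuple-based question format, and the type labels are hypothetical; the actual internal tests are not described in this report.

```python
CLOSED_TYPES = ("multiple_choice_single", "multiple_choice_multiple",
                "fill_in", "match", "order")


def guess_score_for_display(questions):
    """Hypothetical correct code path: all closed question types count."""
    return sum(points * chance for kind, points, chance in questions
               if kind in CLOSED_TYPES)


def regressed_guess_score_for_grade(questions):
    """Hypothetical faulty code path: only single response questions count."""
    return sum(points * chance for kind, points, chance in questions
               if kind == "multiple_choice_single")


def mismatched_kinds(grade_fn, display_fn):
    """Return the question types for which the two code paths disagree."""
    failures = []
    for kind in CLOSED_TYPES:
        # One single response question plus one question of the type under test.
        questions = [("multiple_choice_single", 4, 0.25), (kind, 4, 0.5)]
        if grade_fn(questions) != display_fn(questions):
            failures.append(kind)
    return failures


# A regression of the kind described in this report is flagged per type.
flagged = mismatched_kinds(regressed_guess_score_for_grade, guess_score_for_display)
# When grading reuses the display calculation, nothing is flagged.
clean = mismatched_kinds(guess_score_for_display, guess_score_for_display)
print(flagged)
print(clean)
```

A continuous integration test could assert that the list of mismatched types is empty, so that a divergence between the displayed points and the points used for grading fails the build.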