Incident Summary
All times in the document are recorded in UTC+2 (CEST).
On the 12th of October, 2023, at 10:27, the support team received a ticket asking about updating the script that generates variable values while participants were taking a test. In the same ticket, the sender reported that a number of participants' results had received incorrect values for the variables. The issue was forwarded to the technical team on October 18th, at 10:10. The technical team found several variable values that did not match the output of the code editor script defined in the respective exercises of that assignment. The technical team released a hotfix on October 19th, at 13:50, to prevent the issue from occurring again and started investigating the impact.
Lead-up
The support team received a ticket on October 12th, at 10:27, in which the sender reported a discrepancy between the expected and actual values generated by the code editor script in an assignment. The same ticket also asked about modifying the code editor script of an exercise: the sender wanted to know whether new values would be generated for the variables of participants who were already taking the test.
Fault
During the investigation, the technical team confirmed a discrepancy between the variable values generated by the code editor script in the exercises shown to participants who started their test before the modification and those shown to participants who started after it.
When the first participant starts the test, the exercises selected for the assignment are stored in the cache. This is done for performance reasons, as otherwise Ans would have to reload each of the exercises for every participant. When the script of the exercise was changed, the change was not reflected in the cached versions of the exercises. Already-participating students had their variables updated based on the new version of the script, while the variables of the students who started after the change were generated based on the old version of the script. This could have affected the question contents and grading of the latter group.
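The failure mode described above can be illustrated with a minimal Python sketch. It is not Ans's actual implementation: the `Exercise` class, the two cache classes, the `DATABASE` dictionary, and the `load_exercise` function are all hypothetical names invented for this illustration. The first cache keys its entries on the exercise id alone, so an edited script is never picked up. The second includes a script version in the key, which is one common way to avoid serving stale entries and not necessarily what the hotfix does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exercise:
    """Hypothetical stand-in for an exercise with a variable-generating script."""
    exercise_id: int
    script_version: int
    script: str


# Hypothetical "database": the authoritative copy of each exercise.
DATABASE = {1: Exercise(1, script_version=1, script="x = 2")}


def load_exercise(exercise_id: int) -> Exercise:
    """Simulates the expensive load that the cache is meant to avoid."""
    return DATABASE[exercise_id]


class StaleExerciseCache:
    """Keys on exercise id only, so edits made after the first load are missed."""

    def __init__(self) -> None:
        self._entries: dict[int, Exercise] = {}

    def get(self, exercise_id: int) -> Exercise:
        if exercise_id not in self._entries:
            self._entries[exercise_id] = load_exercise(exercise_id)
        return self._entries[exercise_id]


class VersionedExerciseCache:
    """Keys on (exercise id, script version), so an edit triggers a fresh load."""

    def __init__(self) -> None:
        self._entries: dict[tuple[int, int], Exercise] = {}

    def get(self, exercise_id: int, current_version: int) -> Exercise:
        key = (exercise_id, current_version)
        if key not in self._entries:
            self._entries[key] = load_exercise(exercise_id)
        return self._entries[key]


stale = StaleExerciseCache()
versioned = VersionedExerciseCache()

# The first participant starts the test: both caches load version 1.
stale.get(1)
versioned.get(1, current_version=1)

# The exercise script is edited while the test is running.
DATABASE[1] = Exercise(1, script_version=2, script="x = 3")

# A participant who starts after the edit asks each cache for the exercise.
late_stale = stale.get(1)
late_versioned = versioned.get(1, current_version=DATABASE[1].script_version)

print(late_stale.script_version, late_versioned.script_version)  # prints "1 2"
```

The sketch assumes that the current script version can be read cheaply without loading the whole exercise. Where that does not hold, explicitly evicting the cache entry whenever a script is saved is an alternative.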
Impact
The technical team performed an impact analysis to find the results affected by the incorrectly generated variable values. The analysis covered results from the 1st of September, 2023, onwards, as this is the most common start date of the academic year. The results of the analysis, as of November 24th, 2023, 14:30, are as follows:
- A total of ten assignments were found to be affected across six institutions.
- The total number of results affected:
  - Submitted results: 342
  - Results that have not yet been submitted: 150
The analysis took longer than initially forecast, as the technical team had to develop a custom script to carefully inspect the log data for each test containing variables. Both the script and the findings required several layers of validation.
Detection
First, the variable values for the assignment mentioned in the ticket were investigated and confirmed to be affected by the issue.
Subsequently, in follow-up analysis with a broadened scope, the technical team discovered more instances in which the issue occurred.
Response
The support team forwarded the ticket to the technical team on the 18th of October, 2023, at 10:10. The team began investigating immediately. After confirming the presence of an issue, resources were allocated to finding the cause and to the impact analysis, both starting on the same day.
Recovery
On the 2nd of November, 2023, the technical team found the cause and established that a hotfix deployed on October 24th, 2023, had already solved the issue. The hotfix prevents the issue from happening again. Ans will provide a list of impacted assignments, along with the affected results from these assignments.
Due to the nature of variable value generation, it is impossible to determine whether the marks of these results were negatively or positively affected: participants may have received values based on the previous script and still have provided the correct answer to the question.
An email has been sent to the administrators belonging to the affected institutions. If you did not receive an email, your institution and results are not affected.
Timeline
12th of October
- 10:27 - Support team receives ticket containing issue
13th of October
- 08:43 - Support requests more technical information from user
18th of October
- 07:52 - Support receives response from user
- 09:10 - Ticket forwarded to technical team
- 14:31 - Development of hotfix has started
24th of October
- 15:36 - Deployment of hotfix
1st of November
- 20:13 - Priority of issue is set to ‘high,’ as the scores in the submissions are potentially calculated incorrectly.
2nd of November
- 17:15 - Technical team discovers the issue is fixed (indirectly). Support team is notified.
- Technical team initiates impact analysis
15th of November
- 16:16 - Script for impact analysis complete
- Analysis findings shared amongst technical team for validation
1st of December
- Technical team validation of findings completed
- Findings shared with Support and CSM teams
Reflection
The steps taken in handling this incident followed the established procedure. However, the total time the incident remained open was extended by two factors: the impact analysis had to be thorough, and the issue could not be reproduced at the start, as it had already been fixed indirectly.