All times in this document are recorded in UTC+1 (CET).
Summary
On January 23rd, 2025, the support team received a ticket requesting clarification on new written assignment instructions for multiple choice questions with a single alternative. The sender noted that these changes were not included in the release notes of January 5th, 2025, and reported an issue where crossed-out corrections were still being treated as selected answers, contrary to the new instructions.
On January 28th, 2025, the support team escalated the issue to the technical team. The technical team responded on January 28th, confirming that the new model does not account for crossed-out corrections and began investigating a fix.
On February 3rd, the sender requested further clarification. The support team responded that the issue was still under investigation. A hotfix was released on February 7th, which prevents automatic grading if more than one answer option has been detected on a multiple choice question with a single alternative, while improvements to the recognition model are being investigated further.
Lead-up
On January 23rd, 2025, the support team received a ticket reporting that the new written assignment instructions were not included in the release notes and that crossed-out corrections were still being considered as selected answers for multiple choice questions with one correct alternative.
Fault
The release of January 5th introduced a new recognition model, improving the automatic recognition of multiple-choice answers. This model identifies checkboxes in the following states:
- 0: Empty checkbox
- 1: Checkbox with a cross inside it
- 2: Fully coloured checkbox
- 3: Corrected checkbox (a cross outside a fully coloured checkbox)
Under the intended behaviour, if an answer only had a checkbox classified as state 1, that option was selected. If an answer had checkboxes in both state 1 and state 2, only the one in state 2 was selected. Checkboxes in state 3 were ignored, as they were considered crossed out. However, the recognition model incorrectly identifies state 3 as state 2.
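The selection rules can be sketched as follows; the constants, function name, and list-based representation are illustrative, not the actual implementation:

```python
# Checkbox states as classified by the recognition model (illustrative names).
EMPTY, CROSS, FILLED, CORRECTED = 0, 1, 2, 3

def selected_options(states):
    """Apply the intended selection rules to one question's per-option states.

    A fully coloured checkbox (state 2) takes precedence over a plain cross
    (state 1); corrected checkboxes (state 3) are ignored as crossed out.
    """
    if FILLED in states:
        return [i for i, s in enumerate(states) if s == FILLED]
    return [i for i, s in enumerate(states) if s == CROSS]
```

The fault meant that a state-3 checkbox arrived at this step already labelled as state 2, so the crossed-out correction was treated as the selected answer.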
Impact
Affected results did not always have their answers correctly recognised when a correction had been crossed out.
An impact analysis was performed to identify the affected results.
The analysis covered scans uploaded between January 5th and February 7th, 2025, via the new scan page, and was scoped to answers where multiple states were recognised. The results were then manually filtered to include only those where responses were incorrectly identified and had not been altered by a reviewer.
The investigation resulted in 58 affected assignments, with 208 impacted submissions. This represents ~0.02% of total multiple choice answers over that time period.
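The automated scoping step of the analysis can be sketched as below; the field names (`uploaded_at`, `via_new_scan_page`) are hypothetical, and the real analysis likely ran as a query against production data:

```python
from datetime import date

# Analysis window: scans uploaded between the release and the hotfix.
WINDOW_START = date(2025, 1, 5)
WINDOW_END = date(2025, 2, 7)

def in_analysis_scope(scan, recognised_states):
    """Return True if a scanned answer falls inside the automated scope.

    Results passing this filter were still manually checked to keep only
    those that were incorrectly identified and untouched by a reviewer.
    """
    in_window = WINDOW_START <= scan["uploaded_at"] <= WINDOW_END
    multiple_states = len(set(recognised_states)) > 1
    return scan["via_new_scan_page"] and in_window and multiple_states
```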
Detection
The findings of the user who sent the ticket were forwarded to the technical team for further investigation.
Response
On January 23rd, 2025, the support team responded to the ticket that they would verify the issue internally. The user requested an update on January 28th. The support team identified why this change was excluded from the release notes and updated the release notes on the same day. The support team forwarded the recognition issue to the technical team.
On January 29th, the technical team confirmed that crossed-out corrections were incorrectly identified as selected answers and would investigate further. The support team informed the user that crossed-out corrections were indeed incorrectly recognised, but that further investigation was required. The user requested an update on February 3rd, and on February 5th, the support team informed the user that the technical team was still investigating.
The incident was escalated to priority-1 on February 6th, after the impact on student results was recognised.
Recovery
On February 7th, a hotfix was released, which prevents automatic grading if more than one answer option has been detected on a multiple choice question with a single alternative. This change is a safety precaution to prevent multiple choice questions from being graded incorrectly, while the technical team is continuing to investigate further improvements to the model.
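The hotfix's safety check amounts to a simple guard; this sketch uses hypothetical names and question-type labels, not the production code:

```python
def can_auto_grade(question_type, detected_options):
    """Guard added by the hotfix (sketch): withhold automatic grading when a
    single-alternative multiple choice question has more than one detection.
    """
    if question_type == "multiple_choice_single" and len(detected_options) > 1:
        return False  # route the question to manual grading instead
    return True
```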
Timeline
23rd of January, 2025
- 12:31 - Support receives a ticket indicating that new changes were not communicated in the release notes and that the newly introduced handling of crossed-out corrections is not working correctly.
- 14:04 - User responds to support with requested information.
- 14:40 - Support asks the user for links to the affected results.
- 17:18 - Support notifies sender that they will investigate this further.
24th of January, 2025
- 11:33 - Support investigates why the change was not communicated in the release notes.
28th of January, 2025
- 11:05 - Support updates the release notes with the newly introduced instructions for written assignments.
- 15:03 - Support forwards the issue to the technical team for further investigation.
- 16:28 - Technical team discovers that, in the user's example, the crossed-out correction is recognised as a selected answer instead.
- 17:07 - User requests an update on the issue.
29th of January, 2025
- 11:44 - Support updates the user that the release notes have been updated with the new written assignment instructions, while the technical team confirms the issue and continues investigating.
- 13:43 - User provides feedback about the updated release notes.
- 14:49 - Support confirms that changes are necessary and will apply the feedback to the release notes.
31st of January, 2025
- 15:03 - Support receives more tickets regarding incorrect recognition of crossed-out corrections.
3rd of February, 2025
- 14:02 - User asks whether there is more information about the incorrect recognition of crossed-out corrections.
5th of February, 2025
- 09:07 - Support notifies the user that the technical team is still investigating possible solutions.
- 16:55 - Support notifies the user that the technical team is still looking into it and will provide more information the following day.
6th of February, 2025
- 11:50 - Support notifies the user that in the upcoming release a change will be implemented which will require manual grading for crossed-out detections.
- 12:22 - User confirms that this change is well received and asks for any potential impact.
- 17:00 - An engineer reviews the ticket, recognises its impact on student grades, and escalates it to a priority-1 issue. They begin developing a hotfix and assessing the extent of the impact.
7th of February, 2025
- 09:29 - Support notifies the user that the issue will be hotfixed on the 7th of February by requiring manual grading if more than one option is selected in a question, and that an impact analysis has been started for any impacted assignments between the 5th of January and the 7th of February.
- 09:45 - Hotfix is deployed to the production environment, which prevents automatic grading if more than one answer option has been detected on a multiple choice question with a single alternative.
- 11:19 - Support notifies the user that the hotfix has been deployed to the production environment and that a separate e-mail will be sent to all administrators.
- 17:03 - Support sends an email to all administrators informing them of the issue.
Reflection
Model Release and Testing
In the January 5th release, we introduced two new models: one for multiple response questions, which was thoroughly tested, and one for multiple choice questions, which reused the same underlying model but was validated with different test data. While the multiple response model performed as expected, the multiple choice model was not tested sufficiently. Specifically, we failed to identify the miscategorisation of state 3 checkboxes, where crossed-out corrections were incorrectly recognised as selected answers. This oversight led to incorrect grading in some cases.
This incident highlights a gap in our testing approach. While we ensured the new model functioned correctly for multiple response questions, we did not give the same level of scrutiny to multiple choice questions, despite the fact that they were using the same model. This resulted in a critical failure in recognising state 3 answers properly.
Incident Identification and Response Time
Another key issue was the delay in identifying the problem as an actual incident. We received multiple user reports questioning why certain questions were not graded with the new model. The report of the misclassification was overlooked amid the volume of feedback on the new model, which led to a significant delay in escalating the matter. This was due to human error rather than a failure to follow procedure.
By the time we formally recognised the issue on February 6th, it had already been affecting grading for almost five weeks. This prolonged response time meant that students and instructors experienced incorrect grading for longer than necessary.
Lessons Learned and Future Improvements
To prevent similar issues in the future, we will implement the following improvement:
- Enhanced Testing Protocols
Before releasing any new recognition model, we must ensure it is rigorously tested across all question types it affects. This includes verifying that recognition works correctly for every state.
By implementing this change, we can ensure that our grading models are more reliable and that issues impacting students are prevented.
Version | Date | Information
v1.0 | 13-02-2025 | Initial version