Pragmatic Usability Rating by Experts (PURE) is a usability evaluation method that quantifies how easy or difficult a product is to use in a way that is quick, cheap, reliable, and valid. The UX Design Team within the Research and Design department used it to conduct an ease-of-use assessment comparing Minitab 19 and JMP 16. During my internship with Minitab, I had the opportunity to lead the first PURE assessment run by the team.
Our target user was a healthcare research analyst with a master’s degree and a fair knowledge of statistical methodology, someone who has been exposed to a variety of data analysis tools and applications but is not necessarily proficient in any particular one.
My team and I convened to produce a list of tasks to execute in both applications and wrote them up as use cases.
Once the tasks were compiled, the next step was to determine how to complete each task and find the desired path, formally known as the happy path, to execute in both applications. Some steps were easier to determine than others. Next, I recorded myself in Microsoft Teams running through the use cases and happy paths so the team could review and rate them. This step was simple, as I was just following the task list while being mindful of how my audio would sound. Once I finished the recordings, we were ready to begin rating the tasks.
As we progressed through the study, it became easier to determine what should be considered a step (e.g., an input and its response). Once we finished rating the tasks, I summed the task scores for each application to get its total PURE score, then immediately started creating a report in UXPin to visualize the final results.
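To make the roll-up arithmetic concrete, here is a minimal sketch in Python using hypothetical ratings rather than our actual data. It assumes, per the PURE method as we applied it, that each step is rated 1 to 3, a task score is the sum of its step ratings, and the total PURE score is the sum of the task scores; the task names and functions below are purely illustrative.

```python
from typing import Dict, List

def task_score(step_ratings: List[int]) -> int:
    """Sum a task's 1-3 step ratings into a task score."""
    assert all(r in (1, 2, 3) for r in step_ratings), "ratings must be 1, 2, or 3"
    return sum(step_ratings)

def total_pure_score(tasks: Dict[str, List[int]]) -> int:
    """Sum the task scores to get the product's total PURE score."""
    return sum(task_score(steps) for steps in tasks.values())

# Hypothetical example (not our actual data): two tasks with a few rated steps each.
example = {
    "Open a worksheet": [1, 1],       # task score 2
    "Fit a distribution": [1, 2, 2],  # task score 5
}
print(total_pure_score(example))      # 7
```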
I faced a few challenges while preparing for and carrying out this assessment. One was incorrectly identifying the happy paths for a few of the tasks in JMP and then having to re-record them. This came down to discoverability, wording that differed from Minitab’s, and needing to spend more time in the software to understand how tasks are performed.
Another oversight on our part was that we did not initially create a persona for this study. We addressed this by creating our persona after rating the first two tasks in Minitab.
Minitab received a total score of 42, and JMP received a total score of 48, with the lower total indicating greater ease of use. Each step was rated on a scale of 1 to 3. A rating of 1 indicates the step is easy for the target user to accomplish; a 2 indicates it can be completed, albeit with some effort and a notable degree of cognitive load; and a 3 indicates it is exceedingly difficult, to the point that the user could fail or give up at that step.
As shown in the report, the color of the total PURE score is determined by the worst rating it contains. Since we were testing ease of use, our rationale for rating each step was based on how easy it would be for our target user to accomplish. Steps rated a 1 did not require much discussion, while those rated a 2 or 3 prompted some back-and-forth about our rationale before we reached consensus.
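As a rough illustration of that color rule, the sketch below colors the total by the worst (highest) step rating found anywhere in the assessment. The green/yellow/red mapping to worst ratings of 1, 2, and 3 is my assumption for illustration, not something prescribed by our report.

```python
from typing import Dict, List

# Assumed mapping for illustration: green/yellow/red for worst ratings of 1/2/3.
COLOR_BY_WORST_RATING = {1: "green", 2: "yellow", 3: "red"}

def score_color(tasks: Dict[str, List[int]]) -> str:
    """Color the total PURE score by the worst step rating present."""
    worst = max(rating for steps in tasks.values() for rating in steps)
    return COLOR_BY_WORST_RATING[worst]

# Hypothetical example: a single step rated 3 turns the whole total red.
example = {
    "Open a worksheet": [1, 1],
    "Brush points on a plot": [1, 2, 3],
}
print(score_color(example))  # "red"
```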
I found the scores for the 3rd and 7th tasks notable, as they were respectively the most different and the most similar between the two applications. For the 3rd task, Minitab took only three steps to check the distribution, while JMP took four, two of which were rated a 2. Though discoverability was not necessarily a metric for this study, hiding the options for viewing the fitted distribution inside context menus seems to have affected JMP’s overall score. The 7th task had the highest scores in the study, indicating steps where the user can face difficulties: the third step in Minitab was rated a 3 because the user has to scroll if multiple points are brushed, while the fourth step in JMP received the same rating because of the cognitive load required to find the context menu that lets a user redo an analysis.
The overall scores were close. A few top tasks that would seem straightforward, such as opening or exporting a file, had low scores, while some of the use cases involving the analytical tools had higher scores. Though Minitab had a better score than JMP overall, most of the task scores were in yellow, which makes a case that some of the software’s top tasks require some cognitive load even for an experienced statistician. Thanks to the success of this research study, the Research and Design department will continue to use this methodology for UX benchmarking. The team is currently planning additional PURE assessments on other use cases and features of Minitab to compare it with a few other competitors.