The evaluation of the simulation was split into two parts:
The null hypothesis, H0, is "a way of casting the question" that is to be evaluated and is usually "a statement of the status quo". The null hypothesis can be rejected by exhibiting a counter-example or "statistically significant evidence against". The aim of this evaluation was to test the effect of the simulated visual impairments on the user's ability to complete tasks using a computer. The null hypothesis was: 'The simulated visual impairments have no effect on the use of a computer'.
An approach based on "Cooperative Evaluation" was used to evaluate the simulation.
"Cooperative Evaluation is a procedure for obtaining data about problems experienced when working with a prototype for a software product, so that changes can be made to improve it". It is an iterative process "designed to be used in the early stages of design when the specification is changing rapidly" and involves a small group of target users completing tasks set by the designer.
Tasks for Cooperative Evaluation should be "designed to exercise the relevant part of the prototype" and must be "representative of the work that the product will support".
Users recruited to take part in the evaluation should be "as similar to the target user population as is practical". The group should represent the whole range of the target user population. Users are observed completing the tasks and are allowed to make mistakes. Any unexpected behaviour is recorded. Users are encouraged to think out loud and to explain any difficulties that they are having.
The Cooperative Evaluation method was adapted to evaluate the effect of the simulated visual impairments on using a computer instead of evaluating the user interface. The evaluation took place towards the end of the development lifecycle. It was carried out in the normal manner but involved a larger group of users, who were observed completing a set of tasks and asked to provide feedback.
The simulation was intended to be an educational tool to allow fully sighted people to gain a better understanding of visual impairments. Consequently, intended users included anyone with 'normal vision' who had access to a computer. Recruiting users to complete the evaluation was a time-consuming process. Due to the time constraints of the project, the group of users consisted mostly of students and did not include a complete range of ages and backgrounds.
The control for an evaluation is "the experiment in the original circumstances, or those with no change", which in this case involved a group of users carrying out the evaluation tasks with 'normal vision' and no impairments applied to the display.
The tasks made use of the web browser described in Section 8.16 and the simple drawing program described in Section 8.11 to evaluate computer use with the simulated impairments. The tasks were designed to explore common actions when using these programs such as clicking hyperlinks, buttons and menus, viewing pictures, reading and writing text and drawing pictures. A small pilot group was used to check that the instructions and task sheet were understandable before starting the evaluation sessions. Screenshots of the web pages used in these tasks are shown in Appendix C. The final list of tasks is shown below:
1.1. Locate the "View Photo" hyperlink at the top of the page, and single-click it using the left mouse button.
2.1. At the right-hand side of the screen is a vertical scroll bar.
Click the arrow at the bottom to scroll the page down.
3.1. Position the cursor over the photo and click the middle mouse button.
3.2. Using the left mouse button, select 'Set as Background' from this menu.
3.3. Scroll back to the top of the page and click the link to return to the main page.
4.1. Read the text in the white box, and then answer the following question.
Who said "I am just going outside and may be some time."?
5.1. Read the text in the black box, and then answer the following question.
At what time did Hillary and Tenzing reach the summit?
6.1. Click the "Large Text Page" hyperlink.
6.2. Read the text on the page, and then answer the following question.
What was the name of Edmund Hillary's first wife?
6.3. Click the link to return to the main page.
7.1. The map of Great Britain is divided into six coloured regions. Using the left mouse button, click the red region.
7.2. Click the link to return to the main page.
8.1. Click the "Text Editor" hyperlink.
8.2. Left-click in the large text box on the page.
8.3. Using the keyboard, enter your name or a short sentence such as "Hello World!".
8.4. Click the link to return to the main page.
9.1. Click the "Spreadsheet" hyperlink.
9.2. One of the cells contains the text "She Sells Sea Shells". Select the cell by single-clicking it using the left mouse button.
9.3. Click the Copy button on the toolbar.
9.4. Select another cell by clicking it and then click the Paste button.
10.1. The spreadsheet contains the following column of numbers:
10.2. Select the cell below the last number.
10.3. Click the AutoSum button on the toolbar.
10.4. Press the Enter key.
11.1. Using the left mouse button, select the triangle tool by clicking the button on the toolbar at the top of the screen.
12.1. Draw a triangle by single-clicking in the main area of the screen where the three corners of the triangle are to be located.
13.1. Click the red button on the toolbar to select red as the new drawing colour.
13.2. Click the 'Fill' button on the toolbar.
13.3. Select the rectangle tool by clicking the button on the toolbar.
13.4. Draw a rectangle by single-clicking in the main area of the screen where two diagonal corners of the rectangle are to be located.
14.1. Select the text tool by clicking the 'ABC' button on the toolbar.
14.2. Select where the text is to be drawn by single-clicking in the main area of the screen.
14.3. Using the keyboard, write your name or a short sentence such as "Hello World!".
15.1. Using the shapes available and a selection of colours, draw a house or another picture of your choice.
As users progressed through the tasks they may have developed techniques to deal with any difficulties imposed by the impairment. To reduce this learning effect, each user was asked to complete the second exercise with a different impairment from the one used in the first exercise.
Due to the time constraints imposed on the project and the large number of users that would need to have been recruited to evaluate all of the impairments that were simulated, the evaluation focused on glaucoma, macular degeneration and the cloudy vision variation of cataracts. In order to "detect a difference" between the control group and the simulated impairments, all three impairments were evaluated at the highest severity. Glaucoma affects the peripheral vision, while macular degeneration affects the central vision. To evaluate the movement of the centre of vision with these impairments, both were simulated with the 'track to cursor' mode enabled. The cloudy vision simulation of cataracts at the maximum severity covers the whole of the visual field but is more intense at the centre of vision. For this simulation, the 'track to cursor' mode was disabled to keep the centre of vision at the centre of the screen.
The results of each evaluation session could have been recorded in one or more of the following ways:
The aim of the evaluation was to measure the effect of the simulated impairments. Timing users while they completed the tasks and recording how many errors they made would have been a good way to assess this; however, it would also have made users feel that they were being tested rather than acting as co-evaluators. Under these conditions, users may have been reluctant to 'waste' time giving feedback and comments when completing the tasks. Recording users' actions with a system log or a video camera would have avoided this 'test' environment, but would have required extra software or hardware.
Instead, users were asked to evaluate the difficulty of completing each task by filling in a matrix during the evaluation. Comments were recorded in writing either by the user or the evaluator.
The difficulty of completing each task was evaluated on a scale of 0-9, with 0 = easy and 9 = impossible. The following "box-and-whisker" graphs show the minimum score, maximum score, the lower and upper quartiles and the median for each of the tasks completed for the control group and the simulations of glaucoma, macular degeneration and cataracts:
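As a minimal illustration of the five values that each box-and-whisker plot summarises, the sketch below computes them for one task. The scores are hypothetical (not taken from the evaluation), the function name is illustrative, and the "inclusive" quartile convention is an assumption; other conventions give slightly different quartiles.

```python
from statistics import quantiles


def five_number_summary(scores):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    ordered = sorted(scores)
    # Assumption: the "inclusive" method, which treats the ratings as the
    # complete set of data rather than a sample from a larger population.
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]


# Hypothetical difficulty ratings for one task on the 0-9 scale.
example_scores = [0, 1, 1, 2, 3, 3, 4, 6, 9]
print(five_number_summary(example_scores))
```

The box spans the lower and upper quartiles, the line inside it marks the median, and the whiskers extend to the minimum and maximum scores.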
Users were observed while completing the tasks. They were also asked to provide comments about the tasks and why they found them easy or difficult. These observations and comments were recorded anonymously and are summarised below:
The tasks for Exercise 1 (the web browser tasks) are listed in Section 10.2.6.
Task 1 was more difficult with the simulation of glaucoma than with the other impairments. This was possibly due to the difficulty of finding the hyperlink on the web page. Task 2 was slightly more difficult with the simulation of macular degeneration than with glaucoma, possibly because the button was harder to click when the area below the cursor was obscured. The mean scores for tasks 1 and 2 with the simulation of cataracts were considerably lower than those for glaucoma and macular degeneration. This was possibly because both the hyperlink and the scroll bar button that the user was asked to click with the mouse were located in the corners of the screen, where the vision was less impaired.
Task 3 was again perceived to be most difficult with the simulation of macular degeneration, which can probably be explained in the same way as for task 2. The simulation of cataracts made this task more difficult than with glaucoma. This was possibly because the menu was displayed in the centre of the screen where the vision was more impaired. Selecting an item from the menu also required reading the text on the menu, which may have been more difficult with the simulation of cataracts than with the other impairments.
The aim of tasks 4 to 6 was to explore the relative difficulty of reading text on a web page with each of the simulated impairments. Reading normal-sized text (tasks 4 and 5) with the simulations of glaucoma and cataracts was perceived to be equally difficult, while for the simulation of macular degeneration it was only slightly harder than with 'normal vision'. This can be explained by the fact that users were able to move the black patch to one corner of the screen so that the text they were reading was no longer obscured. Increasing the font size (task 6) was found to make the text more difficult to read with the simulations of glaucoma and macular degeneration, but easier with the simulation of cataracts. This was possibly because for glaucoma and macular degeneration, less text was visible in the unobscured areas of the screen. For the simulation of cataracts the effect of the larger font corresponded to that of the 'normal vision' control group.
The mean scores for task 7 (clicking a region of the map) with the simulations of glaucoma and macular degeneration were lower than for tasks 1 and 2, which may be because less accuracy with the mouse was required to click the region of the map compared to clicking the hyperlink and scroll bar buttons. The mean score for the simulation of cataracts was higher for this task than for tasks 1 and 2, which may have been because the map was in the centre of the screen where the vision was more impaired. The one 'impossible' score given for this task with the simulation of cataracts was given by someone who "couldn't tell the difference between red and pinky-purple" when trying to decide which area of the map to click.
Writing text (task 8) was found to be most difficult with the simulation of glaucoma and easiest with the simulation of cataracts. For these two simulations the mean scores for this task were lower than those for reading text (tasks 4 to 6). This may have been because the user knew what they had typed and this made it easier to read what had been written, or because the task only required a short sentence to be typed compared to the much longer passage that the user was asked to read. With the simulation of macular degeneration, this task was perceived to be more difficult than the reading tasks. This may have been because users did not move the black patch out of the way as they did for tasks 4 and 5.
The high scores for tasks 9 and 10 reflect the fact that they were also found to be more difficult by the control group, as they required some familiarity with the spreadsheet program MS Excel. Both tasks were perceived to be hardest with the simulation of macular degeneration, which can be explained in the same way as for task 2. In contrast to tasks 1 and 2, the mean scores for completing the tasks with glaucoma and cataracts were almost equal. This was possibly because in order to decide which buttons to click in tasks 9 and 10, the user had to correctly interpret the more detailed icons on the buttons.
The tasks for Exercise 2 (the drawing program tasks) are listed in Section 10.2.7.
The mean scores for the drawing tasks (tasks 11 to 13) with the simulations of glaucoma and cataracts were much lower than those for the same tasks with macular degeneration. The drawing tasks required precise control over the position of the mouse cursor. A possible explanation is that with the simulation of glaucoma, the area of the display below the cursor was completely visible, making it easy to position the cursor when clicking buttons and drawing shapes. However, this area was partially obscured in the simulation of cataracts and completely obscured in the simulation of macular degeneration, which made the precise positioning of the cursor more difficult with cataracts and almost impossible with macular degeneration.
Writing text in the drawing program (task 14) was perceived to be easiest with the simulation of glaucoma and hardest with the simulation of macular degeneration. This result was different from task 8 (writing text in the web browser), where glaucoma had the highest mean score. The mean scores for macular degeneration and cataracts were higher for this task than for task 8, which reflects the fact that this task was also found to be harder by the control group. This may be because users were generally more familiar with entering text into a web browser than into the drawing program. In the drawing program, extra actions were required to first select the text tool and then click a location on the screen before any text could be typed.
Drawing a house (task 15) was perceived to be easiest with the simulation of glaucoma and most difficult with the simulation of macular degeneration. With glaucoma, users drew a house that could fit within the bounds of the tunnel. This made the task easier and could explain the lower score. The mean score for this task with the simulation of cataracts was higher than that for tasks 11-14 and the mean score with the simulation of macular degeneration was the highest of all the tasks. This was because this task required shapes to be more precisely positioned, which was harder when the area below the cursor was either partially or completely obscured. Mistakes were cancelled by clicking the 'undo' button; however, this extra action added to the difficulty of completing the task. The median score for macular degeneration was higher still and was close to 'impossible'. This was because the mean score was affected by one particularly low score, which was given by someone who found all of the drawing tasks equally easy.
The pattern of scores for the simulations of glaucoma and macular degeneration was different from that of the 'normal vision' control group, which suggests these simulations had a greater effect on certain tasks than others, particularly the reading and writing tasks (Tasks 4, 5, 6, 8 and 14). Possible explanations were discussed in Sections 10.5.1 and 10.5.2. The scores for the simulation of cataracts followed a similar pattern to the 'normal vision' control group, suggesting that this simulation affected each of these tasks in a similar manner.
The mean scores for the drawing tasks with the simulations of glaucoma and cataracts were generally no greater than the scores for the web browser tasks, which was the same pattern as for the 'normal vision' control group.
The mean scores for the drawing tasks with macular degeneration were generally higher than the scores for the web browser tasks, but were comparable to tasks 3, 9 and 10. A possible explanation is that when clicking hyperlinks and larger buttons, as required by most of the web browser tasks, users developed a technique where they would move the black patch to one corner of the screen to enable them to find the hyperlink or button and remember its position. They would then click the link or button by moving the mouse cursor and guessing when it was in the correct position. This technique did not work as well when more precise mouse clicks were required, either to complete the drawing tasks or to click small buttons that were close together, such as those in MS Excel. In these cases several attempts were often required to click the desired location.
Table 10.1 shows the sample size (n), sample mean (m), and standard deviation (s) for each exercise:
|Exercise 1||Exercise 2||Overall|
The impairment that had the highest mean score in Exercise 1 was glaucoma. The impairment that had the highest mean score in Exercise 2 was macular degeneration, which was also the impairment that had the highest mean score overall.
However, this was not enough to prove that the simulation of these impairments reduced the ability to perform tasks using a computer. In order to either reject or accept the null hypothesis, the statistical significance and importance of the evaluation results were calculated from the t-distribution of the data in Table 10.1.
Samples taken from a standard normal distribution have means mi with a 't distribution'. The "area under a region of the curve is the probability that m lies in that region". The region that excludes the tails of the curve is bounded by -t0 and t0. At the 99.9% confidence level only a = 0.1% of the area under the t-distribution lies in the tails, which means there is only a 0.1% chance that the mean lies outside these bounds. The value of t0 is calculated in MS-Excel using the inverse function TINV(a, n-1). The population mean (µ) can be calculated from the sample mean (m), standard deviation (s) and t0:
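In standard notation, this interval is usually written as follows. This is a reconstruction from the definitions above, assuming the usual two-sided confidence interval for a sample of size n (the n of Table 10.1) and not necessarily the exact form of the original equation:

```latex
\mu = m \pm t_0 \, \frac{s}{\sqrt{n}}
```

The term after the ± sign is the half-width of the interval, which corresponds to the ± values reported for each group.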
Table 10.2 shows the population means at the 99.9% confidence level:
|Group||Exercise 1||Exercise 2||Overall|
|Control Group||1.1 ± 0.6||0.9 ± 0.8||1.0 ± 0.4|
|Glaucoma||3.8 ± 0.5||2.5 ± 0.7||3.3 ± 0.4|
|Macular Degeneration||3.1 ± 0.7||5.3 ± 0.8||3.9 ± 0.5|
|Cataracts||3.0 ± 0.8||3.0 ± 1.5||3.0 ± 0.7|
The (Student's) t-test is used "to calculate the probability p that two sample means m1 and m2 are the same". The t statistic t0 is calculated using Equation 10.2:
"[T]he probability that the means are the same…is the area under the t distribution between [-t0, t0]". It is calculated in MS-Excel using: p = TDIST(t0, n1 + n2 - 2, 2).
Cohen's d is a "measure of importance". It is calculated using Equation 10.3:
The value of d is "the effect size index". A value of d = 0.2 indicates "a small effect size", where "the data's dispersion is bigger than the difference in the means". A value of d = 0.5 indicates "a medium effect size", while a value of d = 0.8 is "a large effect size" that "may be worth getting excited about".
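The two measures described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the calculation actually used: it assumes the pooled-variance form of the t statistic (consistent with the n1 + n2 - 2 degrees of freedom passed to TDIST) and the pooled standard deviation for Cohen's d. The two-tailed probability is obtained by numerically integrating the t density rather than by calling Excel. All function names and input values are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coeff = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coeff = math.exp(log_coeff) / math.sqrt(df * math.pi)
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t0, df, steps=10_000):
    """Area in both tails beyond |t0| (the quantity TDIST(t0, df, 2) returns)."""
    t0 = abs(t0)
    h = 2 * t0 / steps  # Simpson's rule over [-t0, t0]; steps must be even
    total = t_pdf(-t0, df) + t_pdf(t0, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(-t0 + i * h, df)
    central_area = total * h / 3
    return max(0.0, 1.0 - central_area)


def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation of two samples (an assumed convention)."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def t_statistic(m1, s1, n1, m2, s2, n2):
    """Two-sample t statistic, assuming equal variances."""
    return (m1 - m2) / (pooled_sd(s1, n1, s2, n2) * math.sqrt(1 / n1 + 1 / n2))


def cohens_d(m1, s1, n1, m2, s2, n2):
    """Difference in means expressed in units of the pooled standard deviation."""
    return (m1 - m2) / pooled_sd(s1, n1, s2, n2)


def effect_size_label(d):
    """Map d onto Cohen's bands, treating 0.2, 0.5 and 0.8 as lower bounds."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


# Hypothetical summary statistics for an impairment group and a control group.
t0 = t_statistic(3.0, 1.0, 10, 1.0, 1.0, 10)
p = two_tailed_p(t0, 10 + 10 - 2)
d = cohens_d(3.0, 1.0, 10, 1.0, 1.0, 10)
print(f"t0 = {t0:.2f}, p = {p:.5f}, d = {d:.2f} ({effect_size_label(d)})")
```

Because d divides by the pooled standard deviation alone, it does not grow with sample size in the way t0 does, which is why it serves as a measure of importance rather than of significance.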
Table 10.3 shows the statistical significance and importance of the simulations of glaucoma, macular degeneration and cataracts by comparing the mean scores for each of the simulated impairments to those for the 'normal vision' control group. In order to perform the calculations, the following assumptions were made:
|T Statistic (t0)||Significance||Importance|
|Exercise 2||6 0||0.000%||2.3|
The results rejected the null hypothesis from Section 10.1 at the 99.9% confidence level by showing that the significance criterion for each of the simulated impairments was < 0.1%. In other words, the probability of "mistakenly rejecting the null hypothesis" was < 0.1%.
Each result was also shown to be statistically important. The effect size index values from Cohen's d test were all within the range d = 0.9 to d = 2.3, which is described by Cohen as "a large effect size".
The simulation was intended to be an educational tool to allow fully sighted people to gain a better understanding of the symptoms of visual impairments and how someone with these impairments might experience using a computer. In order to evaluate how useful the simulation was as an educational tool, participating users were asked to rate the following questions against the criteria: Strongly Agree / Agree / Not Sure / Disagree / Strongly Disagree:
1. I was aware of the symptoms of the impairment before using the simulation.
2. The simulation helped me to understand the symptoms of the impairment.
3. The simulation helped me to understand how someone with this impairment might experience using a computer.
The responses are shown in Table 10.4:
|Strongly Agree||Agree||Not Sure||Disagree||Strongly Disagree|
These criteria were assigned the values: Strongly Agree = +2, Agree = +1, Not Sure = 0, Disagree = -1 and Strongly Disagree = -2. An average value was calculated for each question:
Among the people who participated in the evaluation, awareness was highest for the symptoms of glaucoma (74% either agreed or strongly agreed with question 1) and lowest for the symptoms of macular degeneration (17% either agreed or strongly agreed with question 1).
All three simulations were perceived to be useful in understanding the symptoms of the impairment and how someone with a visual impairment might experience using a computer. The simulation of cataracts was found to be the most useful (100% either agreed or strongly agreed with question 2 and 91% either agreed or strongly agreed with question 3).
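The scoring scheme described above (Strongly Agree = +2 down to Strongly Disagree = -2) and the "agreed or strongly agreed" percentages can be sketched as follows. The response counts are hypothetical and the function names are illustrative; they are not the figures from Table 10.4.

```python
# Values assigned to each response, as stated in the text.
LIKERT_VALUES = {
    "Strongly Agree": 2,
    "Agree": 1,
    "Not Sure": 0,
    "Disagree": -1,
    "Strongly Disagree": -2,
}


def average_rating(counts):
    """Mean Likert value for one question, given a count per response."""
    total = sum(counts.values())
    weighted = sum(LIKERT_VALUES[resp] * n for resp, n in counts.items())
    return weighted / total


def percent_agreeing(counts):
    """Percentage of respondents who chose Agree or Strongly Agree."""
    total = sum(counts.values())
    agreeing = counts.get("Strongly Agree", 0) + counts.get("Agree", 0)
    return 100 * agreeing / total


# Hypothetical responses to one question for one simulated impairment.
example_counts = {
    "Strongly Agree": 5,
    "Agree": 10,
    "Not Sure": 3,
    "Disagree": 2,
    "Strongly Disagree": 0,
}
print(average_rating(example_counts), percent_agreeing(example_counts))
```

An average above zero indicates net agreement with the statement, while the percentage ignores how strongly respondents agreed.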
Cooperative evaluation was used to evaluate the simulations of glaucoma, macular degeneration and cataracts. A group of target users were asked to complete a set of tasks using a web browser and a simple drawing program with the simulated impairments applied to the displayed output. The simulation was found to be useful in understanding the symptoms of these impairments. When applied at a high severity, all three impairments were found to have a significant and important effect on the ability to complete tasks using a computer.
© Stephen Ratcliffe. 2005
Department of Computer Science
University of York