Is Anthropomorphic Design a Viable Way of Enhancing Interface Usability?

 

Chapter Four: ‘Computer as Tool’ vs. ‘Computer as Friend’ - an Empirical Study into the Use of a ‘Celebrity’ Interface.

After considering the arguments concerning anthropomorphism in interface design, along with its general lack of commercial success, and questioning users on their habits and views on actual and fictional interface models, it was decided that the best way to investigate this issue further would be to conduct user tests. Previous studies into the effects of humanlike design have tended to use computer-based games to test users. In this research a word processor was used, because the focus of interest was the everyday application of anthropomorphism rather than establishing the existence of interaction. A word processor was deemed a more appropriate representation of a ‘real world’ context.

 

4.1  Method

 

4.1.1 Design and Manipulation

A simple word processing application called ‘Lush Writer’ was created using Visual Basic. This program was used as the basis for two different versions, Lush Writer A (humanlike) and Lush Writer B (machinelike), which were used in the user tests. The two versions were identical in the way that they executed tasks, but differed in the way that they interacted with users. Both versions contained inbuilt functionality which recorded each user’s name, which interface elements they had clicked on and when, and how long they had taken to perform the test.

The applications contained the simple range of functions that would be expected in a normal word processor. There was nothing ‘unusual’ about the graphical design: standard sets of icons were used and layout conventions were observed. Each program contained an integrated help system which was written in HTML.

4.1.1.1 Lush Writer Version B – Computer as Tool

Lush Writer B employed a ‘computer as tool’ paradigm. Whilst this version contained no anthropomorphic characteristics, great care was taken not to make it seem hostile, or to overplay the ‘machinelike’ condition.

In a similar test using humanlike and machinelike conditions (De Laere, Lundgren & Howe, 1998), the authors wrote the message text of the machinelike condition in capitals. This may have placed too much emphasis on the ‘machine’ and affected the results, as the use of capitals in text-based interactions is generally considered to be ‘shouting’ and could be regarded as hostile by users. Steps were taken during the design of Version B to avoid potentially negative influences such as these, which could unintentionally distort the results.

The message text was written in a fluent style (Brennan & Ohaeri, 1994) and addressed users as ‘you’ (see Fig. 4.1). Version B was essentially designed as a ‘control’ version, to be studied in comparison with Version A.

Figure 4.1 Version B Uses a Fluent Message Style.

4.1.1.2 Lush Writer Version A – Computer as Friend

Lush Writer A used an anthropomorphic design which applied a ‘computer as friend’ paradigm with a ‘celebrity personality’.

After examining data from the questionnaire and other literature, it was determined that the best way to apply anthropomorphism in this instance would be through text-based manipulation. This was based on the observation that users tend to interact with their computer holistically, as a ‘person.’ When interacting with a character on the screen, such as Clippy, they might feel as though they are dealing with a ‘friend within the friend’ (which could be considered an unnatural form of social interaction), rather than the ‘computer as friend’ (natural interaction).

This variation between natural and unnatural interaction may explain why some research (Nass, Moon, Fogg, Reeves & Dryer, 1995; Nass, 1998; Isbister & Nass, 2000) has shown that users apply different social rules to on-screen characters than they do to characters which are implied by text-based manipulations.

Text was considered sufficient to stimulate the desired level of interaction: as mentioned in Chapter One, it has been well documented that only minimal cues are needed in order to create a computer ‘personality’ (Nass, Moon, Fogg, Reeves & Dryer, 1995). There was no need for artificial intelligence or natural language capabilities. The design also observed the principle of consistency (Isbister & Nass, 2000).

Results from the web-based questionnaire (Chapter 3) were used to select the ‘celebrity personality’ which would be exhibited by Version A. Although Bob Marley came top in the popularity scoring system, it was decided that his personality would be harder to implement, as the message text would have to be written in a Jamaican ‘accent’, which was considered more likely to irritate or possibly even offend users.

Ultimately, Elvis Presley was used instead, as he was considered easier to characterise and more widely recognised by users.

Elvis’ personality was represented through the style of language used in the message text, the tool text and the help system.

Figure 4.2 – Version A Uses an Anthropomorphic Message Style.

The messages used first person pronouns (‘I’ and ‘my’), defined as anthropomorphic by Brennan & Ohaeri (1994) (see Fig. 4.2). They were intended to be ‘chatty’; this was expressed through the language style of the Lush Writer system messages and the use of three seemingly random messages, timed to appear during the course of the interaction (see Fig. 4.3).

Figure 4.3 – A ‘random’ message and the response when the user clicks ‘yes’

 

Acknowledging that other humanlike interfaces may have failed because they were too prominent, attempts were made in this prototype to ensure that the methods of interaction were more subtle and potentially less distracting or irritating to users. To balance concerns that the system might consequently be too subtle, and that users might ‘miss the point’, a ‘label’ (Nass, Reeves & Leshner, 1996) was used: the system introduced itself as ‘Elvis’ at the beginning of the session.

4.1.2 Participants

Twenty-four participants (13 female, 11 male) took part in the experiment. Tests took place both in the PC labs at UWE and on personal computers in the homes of several of the participants. The test subjects were recruited through the web-based questionnaire, which had a tick box for those wishing to participate in further research. Friends and family were also recruited. The majority of the test subjects were students or young professionals. The average age of participants was 29.

4.1.3 Procedure

The participants were divided into two groups with evenly matched computer abilities.

The user test comprised two sections. The first section was intended to compare the usability of the two systems; the second section was designed to gauge user reaction to each version. Participants followed a set of paper-based instructions and were not informed of the intention of the experiment, so that they would not be prejudiced in their judgment.

The first section of the test involved both groups completing the same two tasks (editing a letter and typing in and editing a passage of text). One group used Version A and one group used Version B. Both versions recorded the user’s actions in a text file (stats.txt), so that they could be compared in order to measure whether one version of Lush Writer was more usable than the other.
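The kind of recording described above can be illustrated with a minimal sketch. It is written in Python rather than the Visual Basic used for Lush Writer, and the class name, field names and line layout are all assumptions made for illustration; the actual format of stats.txt is not documented in this chapter.

```python
import time
from dataclasses import dataclass, field


@dataclass
class SessionLog:
    """Collects what one participant clicked and when (hypothetical structure)."""
    user_name: str
    version: str  # "A" or "B"
    started_at: float = field(default_factory=time.monotonic)
    events: list = field(default_factory=list)

    def record_click(self, element: str) -> None:
        # Store the element name with the seconds elapsed since the session began.
        elapsed = time.monotonic() - self.started_at
        self.events.append((element, elapsed))

    def to_text(self) -> str:
        # One header line, one line per click, and a closing summary line.
        # The time of the last click stands in for the total test duration.
        total = self.events[-1][1] if self.events else 0.0
        lines = [f"user={self.user_name} version={self.version}"]
        lines += [f"{elapsed:.1f}s\t{element}" for element, elapsed in self.events]
        lines.append(f"clicks={len(self.events)} total_seconds={total:.1f}")
        return "\n".join(lines)


# Hypothetical session: the participant name and element names are invented.
log = SessionLog(user_name="participant01", version="A")
for element in ["open", "bold", "save"]:
    log.record_click(element)
print(log.to_text())
```

A log of this shape would yield two of the three usability measures used in the results (time taken and number of element selections) directly from the file; typing errors would still need to be checked separately.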

The second section of the test required users to switch versions (i.e. those who had been using Version A now used Version B, and vice versa) in order to complete a further two tasks. The tasks were different to the ones used in section one, but were matched in type and complexity. A SMOG readability formula was applied to ensure that both passages of text were of the same difficulty level.
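For reference, the standard SMOG grade can be computed as in the sketch below. The formula is the published one (McLaughlin's SMOG grading); the function name and the counts used in the example are assumptions for illustration and are not taken from the test passages. Counting polysyllabic words (three or more syllables) is assumed to have been done beforehand.

```python
import math


def smog_grade(polysyllable_count: int, sentence_count: int) -> float:
    """SMOG grade from the number of words with three or more syllables."""
    if sentence_count <= 0:
        raise ValueError("sentence_count must be positive")
    # Normalise the polysyllable count to a 30-sentence sample, as SMOG does.
    normalised = polysyllable_count * (30 / sentence_count)
    return 1.0430 * math.sqrt(normalised) + 3.1291


# Hypothetical counts for two task passages (not taken from the study).
passage_one = smog_grade(polysyllable_count=18, sentence_count=12)
passage_two = smog_grade(polysyllable_count=20, sentence_count=13)
print(round(passage_one, 1), round(passage_two, 1))
```

Two passages would be treated as matched when their grades fall close together; how close is a judgment call that the formula itself does not settle.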

The tests were designed to be reasonably difficult. The passages of text contained Latin and Japanese phrases written in italics, in order to exercise the user’s dexterity. Both sections together were intended to take an average user about half an hour to complete.

4.1.3.1 Measures

In addition to having their actions recorded by the system, users were asked to fill in a paper-based evaluation on completion of each section. Likert scales were used to establish which system users found easier to use and how they felt about using the programs. Users were also invited to write comments on each system.

 

4.2 Results

4.2.1 User Test Section One – Usability and Productivity

The productivity and usability measures were based solely on data gathered from the text files (stats.txt), which were produced by the system during Section One of the test. The twelve files generated by users of Version A were directly compared to those of Version B. Three factors were applied in order to measure usability: speed (how long it took for users to complete Section One, measured in seconds), proficiency (how many interface elements were clicked on during the test) and accuracy (how many errors were made when users typed in the passage of text from Question 2).

4.2.1.1 Speed


The average time taken to complete Section One of the test was 1065 seconds (approx. 18 minutes). The longest time that any participant took to complete the section was 1974 seconds (approx. 33 minutes); the shortest was 762 seconds (approx. 13 minutes).

When compared, it was found that most participants completed the test more quickly using Version B (the non-anthropomorphic interface) than Version A.

The average time taken to complete Section One using Version A was 1125 seconds (approx. 19 minutes); using Version B it was 1005 seconds (approx. 17 minutes).
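The reported averages can be cross-checked with a few lines of arithmetic. The sketch below assumes equal group sizes (twelve users per version, as implied by the twelve log files per version); the variable names are illustrative only.

```python
mean_a = 1125  # seconds, Version A (reported average)
mean_b = 1005  # seconds, Version B (reported average)

# With equal group sizes, the overall mean is the midpoint of the group means.
overall = (mean_a + mean_b) / 2

# Absolute and relative difference between the two versions.
difference = mean_a - mean_b
relative = difference / mean_b

print(overall, difference, round(relative * 100, 1))
```

The midpoint agrees with the overall average of 1065 seconds reported above, and the gap between the versions is two minutes.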

4.2.1.2 Proficiency


In addition to measuring the users’ speed in seconds, the program also recorded which interface elements (such as toolbar icons) they had clicked on and when.

When progressing through Section One, users were required to perform a set number of actions, which required a minimum of 12 elements to be selected in sequence. A user who was having difficulties with the program might click on an item more than once, or select several irrelevant elements whilst looking for the right item. In order to indicate a user’s proficiency at using the system, the number of element selections made by each user in Section One was counted and evaluated in relation to the minimum number of clicks required (the greater the number of clicks, the less proficient the user). The statistics gathered from users of Version A were then compared with those of Version B to see if users were more proficient at using one system than the other.

Fourteen participants completed Section One using only 12 clicks, the minimum number required to finish the task. These users could be considered very proficient at using the system. The greatest number of clicks made was 21.

The average number of clicks made by users of Version A was 13. The average number of clicks made by users of Version B was 15. So, in this test, users of Version A were more proficient than those of Version B.
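The proficiency measure described above can be sketched as a count of selections beyond the 12-click minimum. The per-user click counts below are invented for illustration (chosen only so that their averages match the reported 13 and 15), and the function names are assumptions.

```python
MIN_CLICKS = 12  # minimum selections needed to finish Section One (from the text)


def excess_clicks(clicks: int) -> int:
    """Selections beyond the minimum; 0 means the shortest possible route."""
    return max(clicks - MIN_CLICKS, 0)


def mean(values):
    return sum(values) / len(values)


# Hypothetical per-user click counts, invented for illustration only.
version_a = [12, 12, 13, 15]
version_b = [12, 14, 16, 18]

print(mean([excess_clicks(c) for c in version_a]))
print(mean([excess_clicks(c) for c in version_b]))
```

Expressed this way, an average of 13 clicks corresponds to one surplus selection per user and an average of 15 to three.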

4.2.1.3 Accuracy


Question 2 of Section One required users to enter a passage of text and save it. These saved versions were later checked in Microsoft Word for spelling and punctuation errors. The number of errors made by the users of each version was then compared to see if one version prompted more accurate responses than the other.

The greatest number of errors made by any participant in Section One was 13; four users made no errors.

The average number of errors made by users of Version A was 6. The average number of errors made by users of Version B was 4. This indicates that Version B may induce more accurate input than Version A.

4.2.2 User Test Section Two - User Preference

User preference was recorded on two paper-based evaluation forms: one was filled in on completion of Section One, the other on completion of Section Two. A five-point Likert scale was employed to compare participants’ views on ease of use, task difficulty and enjoyment.

4.2.2.1 Ease of Use 

71% of participants stated that they found Version A (the anthropomorphic version) either easy or very easy to use, compared to 58% with Version B. The remainder of the users stated that they found the system neither easy nor difficult to use; no users found either system difficult to use.
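Percentages of this kind can be tallied from five-point Likert responses as in the sketch below. The split between ‘easy’ and ‘very easy’ is invented; only the total of 17 positive responses out of 24 is chosen to be consistent with the reported 71%, on the assumption that all 24 participants answered the question.

```python
from collections import Counter


def percent_positive(responses, positive=("easy", "very easy")) -> float:
    """Percentage of responses falling in the positive categories."""
    counts = Counter(responses)
    return 100 * sum(counts[label] for label in positive) / len(responses)


# Hypothetical split of 24 responses (illustrative, not the study data).
responses = ["very easy"] * 5 + ["easy"] * 12 + ["neither"] * 7
print(round(percent_positive(responses)))
```

With only 24 participants, each response moves the percentage by roughly four points, which is worth bearing in mind when comparing the two versions.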

4.2.2.2 Task Difficulty

54% of users of Version A found the task easy or very easy to complete, compared to 33% of users of Version B.

4.2.2.3 Enjoyment of Program

63% of participants reported that they either enjoyed, or really enjoyed, using Lush Writer Version A, compared to 17% of users of Version B. However, Version A also elicited a greater negative response, with 25% of users stating that they either disliked or really disliked using the program.

It was noted that participants who used Version A after using Version B regarded it more highly than those who had used it first.

Figure 4.4 shows a breakdown of user enjoyment of Version A:

Figure 4.4 – User Enjoyment of Version A (Anthropomorphic)

 

Users of Version B were more indifferent in their perception of the system, with 70% stating that they neither enjoyed nor disliked it (see Fig. 4.5).

 
 

Figure 4.5 – User Enjoyment of Version B (Non-Anthropomorphic)

 
 

4.2.2.4 Favourite System

When asked to state which was their favourite version of Lush Writer, 75% of users chose Version A.

4.2.2.5 The Help System


A question in each section was specifically designed to make participants consult the help system. For example, in Section One, Question 1, users were asked to correct any spelling errors in the text, but there was no spell-checking function; it was assumed that they would consult the help file in order to find this out. Despite this, most users chose to ‘muddle through’, and only four people confirmed that they had used the help system.

4.2.2.6 Is it Wrong to Humanise?


When asked, 88% of users said that they did not think that it was wrong to give human characteristics to a computer.

 
 

4.3 Discussion

 
 

4.3.1 Methodological Concerns

As with the questionnaire, the scale of this research was very small and it was not intended to be a definitive study, merely an exploration of the issues raised in the previous three chapters. The participants were mainly students or young professionals and are not considered representative of a cross section of society.

One concern outlined in Chapter Three is that anthropomorphic characters may seem cute at first, then silly and ultimately distracting (Shneiderman & Plaisant, 2004: 487). Although the test was designed to be relatively long and monotonous, a more effective way to study the ultimate effects of humanlike characters would be to investigate how they are received in the long term. Unfortunately, it was not feasible to do so for the purposes of this report.

4.3.2 Was Version B More Usable?


The results of the usability analysis showed that Version B was marginally more usable than Version A on speed and accuracy, although not on proficiency. However, this cannot be taken as an indication that humanlike interfaces are less usable than machinelike ones. The difference between the results was so small that it could not be considered conclusive evidence.

Where Version B was found to be faster to use than Version A, this may have been because ‘Elvis’ presented more messages, which would have taken longer to read and respond to and could also have distracted participants from the task.

These findings correspond with those of previous studies. To date, little research into the effects of anthropomorphism has been able to show that it enhances usability. If anything, it has been found to detract from productivity rather than increase it.

4.3.3 Why Did Users Prefer Version A?


75% of participants favoured the anthropomorphic Version A.

Of the participants who had used Version B first, 83% expressed a preference for Version A. This may suggest that after enduring the monotony of the first section, they were pleased to have a little light relief in the form of ‘Elvis.’

The fact that Version B was a ‘vanilla’ interface with no frills may also have influenced this decision, as Version A was more memorable.

It should also be noted that Version A was more maligned, with 25% of users stating that they did not like it. It tended to provoke a stronger response from users than Version B, whose lack of significant characteristics caused 70% of users to assert that they neither liked nor disliked it.

4.3.4 The Computer As Friend Paradigm and the Celebrity Interface


This research explored the use of the ‘whole computer’ as the anthropomorphic character, rather than a character on the screen. Although the usability results are fairly inconclusive, Version A gave a strong indication that, on first impression, users enjoyed using this system of interaction.

Positive comments made by users included that they would like to be able to talk back to the system, they would like a range of characters to choose from and that they would like to be able to turn it off easily if it became annoying.

Although the idea of the celebrity interface seemed obvious, it was rather hard to implement. (I am not an Elvis fan and would have preferred to ‘do’ Mr T or Ozzy Osbourne). Trying to think of relevant ‘catchphrases’ and writing a help system in a southern twang proved quite difficult.

The current trends in customisation or personalisation of items such as computers, cars and mobile phones imply that if the ‘celebrity interface’ was implemented appropriately it could be successful. Formal methods would need to be developed in order to model the personality of a celebrity. If a living celebrity was used then it would probably be better to involve them during the design stage to create an authentic range of responses. Legal rights to the use of their name and likeness would also need to be established. A huge assortment of phrases and words would have to be incorporated so that the user did not receive the same responses over and over again. A selection of celebrities (including ‘none’, for the 25% who did not want to hang out with anybody) would be necessary in order to cater for the diversity of users’ personal tastes.

Is it harder for users to dislike a character they already know? Microsoft’s endeavour to create a ‘new’ character which would be successful in its own right (like the California Raisins mentioned in Chapter Two) was probably a poor judgment. If they had used Bambi or Mickey Mouse they would have had to pay some royalties, but they might have been more successful. If users saw someone they ‘knew’ on the screen it might make them more confident in their interaction.

If the celebrity interface was implemented effectively, with considered methods of interaction and accurate displays of personality, could it really be an effective method for calming users’ nerves and mitigating frustration?

 
 


   
 
 

© Alison Flind 2006