Identification by voice (1)

In the first of a two-part series on the challenges of voice ID, Dr Jeremy Robson and Dr Harriet Smith advise practitioners to treat evidence from a witness claiming to recognise someone by their voice with extreme caution

When Charles I was beheaded, his executioner wore a visor and false beard. A Captain Hulet was subsequently tried for regicide. The key witness against him was a soldier called Gittens who, asked how he was certain of Hulet’s identity, replied: ‘By your voice.’ Hulet had a compelling alibi for the offence, being in prison when it occurred. After retiring for ‘longer than the normal time’ the jury convicted. The judge who presided had sufficient doubt about the verdict to commute the death sentence to one of imprisonment.

Voice identification has long been recognised as having the potential to be the determinative evidence in a criminal trial. With it comes the risk that it might appear persuasive in circumstances where in fact it is not. While the risks of eyewitness evidence are well known to criminal practitioners, earwitness evidence is encountered much less frequently. As a result, the challenges of dealing with earwitness evidence are not as fully recognised by either investigators or advocates. Case law has reminded courts of the need to be as cautious with earwitness evidence as with eyewitness evidence but, unlike eyewitness evidence, there is a lack of clear guidance in the Codes of Practice to the Police and Criminal Evidence Act 1984 as to how such evidence should be recorded and tested.

Important considerations for practitioners

The circumstances in which a witness might hear rather than see a perpetrator are numerous: the perpetrator may be wearing a mask, they may be on the end of a telephone or they may be covertly recorded. There may be a recording of the voice which can be subjected to expert analysis; in other cases it will be necessary to assess the accuracy of the witness identification. The identity verification task an earwitness is required to perform could be matching the person they heard during a crime with someone they know, and who is retained in their memory (a recognition case), or with someone they had not encountered prior to the crime event and are subsequently asked to identify (an identification case). This may appear similar to the exercise eyewitnesses perform. However, recent psychological research highlights that although face and voice processing are in some ways similar, there are also significant differences. Memories for faces and voices are processed in distinct specialised areas of the brain. Further, it has been shown that memory for voices is more error-prone than memory for faces, and that auditory memories are more likely to be subject to interference and corruption from supervening events. Overall, evidence from the psychological literature points to faces being more reliable indicators of identity than voices. This is the case regardless of whether the person in question is familiar to the witness or not. One feature that earwitness evidence does share with eyewitness evidence is that the confidence a witness has in their identification does not necessarily indicate accuracy.

One practical problem with voice identification is the difficulty witnesses have in articulating a description of a voice. Consider a description given by an eyewitness – ‘the suspect was male, quite tall with dark hair.’ It is unlikely anyone would consider such a description helpful or satisfactory, and a defence advocate would have little difficulty in establishing the distinctive features which were not mentioned. With a voice identification, however, a description which runs along the lines of ‘male, quite deep, a northern accent of some sort’ sounds more convincing. It is much more difficult to identify inconsistencies between the description originally given at the time of the offence and the voice of the speaker. The lower expectations of voice descriptions create an impression of consistency when in fact the points of similarity are few.

Another important consideration for practitioners dealing with voice cases is the need to focus on the duration of speech exposure rather than the extent of an observation. In one case we encountered, the witness had seen the perpetrator for 10 minutes without recognising him. It was only when the perpetrator spoke six words that the witness recognised him. The case was summed up on the basis of a ‘10 minute view’ rather than a two-second burst of speech.

Current guidance and authorities

The Court of Appeal has recognised the challenges of voice identification in a number of authorities. In Hersey it was recognised that a modified Turnbull direction was needed in cases involving voice identification (although the court did not specify what those modifications should be). Clear guidance has now been incorporated in the judicial directions in the Crown Court Compendium.

When an eyewitness has seen a crime, the procedures for testing the ability of the witness to repeat that task are well documented in Code of Practice D to PACE. The video identification procedure is something all criminal practitioners will encounter on an almost daily basis. Similar procedures have been used in voice identification cases, with a recording of a voice being placed in a ‘line up’ of other voices, and such procedures have been approved by the Court of Appeal in Hersey. Guidance on best practice in conducting such parades, prepared by Professor Francis Nolan and John McFarlane of the Metropolitan Police, has existed since 2003. Because voice parades must not draw undue attention to the suspect, they are often time-consuming and expensive to produce. Officers need a sufficient sample of speech (usually taken from the interview) in which the defendant is not discussing the crime. Similar ‘foil’ samples have to be extracted from other records of interviews which are of a similar audio quality to ensure the suspect sample does not receive undue prominence. The compilation needs to be overseen by a forensic phonetician to ensure consistency. As a result of these hurdles, voice parades are seldom used by the police to support an identification and some forces have made the policy decision never to conduct them. Our research showed that 74% of criminal lawyers surveyed had never encountered a voice identification parade.

Project findings

The ‘Improving Voice Identification Procedures’ project, funded by the Economic and Social Research Council, has been examining how to improve understanding of earwitness behaviour in order to modify parade procedures and maximise earwitness accuracy. We have addressed various aspects of the procedure which had not previously been tested. For example, the current 2003 Guidelines recommend that earwitnesses should listen to 60-second samples of each of the nine voices in a parade before making a decision. Constructing such long samples is time-consuming and makes it harder to find suitable foils, which risks delaying the identification. The sooner the witness’s memory is tested, the more accurate it is likely to be. Our findings reveal that the samples can be reduced to 15 seconds without a performance cost. We have also tested the effect of different types of pre-parade warnings. The 2003 Guidelines state that witnesses should be warned that the perpetrator may or may not be present. While the Guidelines are not prescriptive about the wording, our results show that strongly worded versions of the warning can negatively affect earwitness performance, and risk inhibiting correct identifications.

Hopefully these findings will result in a streamlined procedure which can be deployed more easily and at an earlier point in the investigation, enabling suspects to be identified or, more importantly, excluded more effectively. Developments in technology may enable the processes to be streamlined further. Until they do, practitioners faced with a witness who confidently claims to have recognised someone by their voice should treat the evidence with extreme caution.

The authors are co-investigators on the ‘Improving Voice Identification Procedures’ project funded by the ESRC (Ref. ES/S015965/1), which is a collaboration between the University of Cambridge, De Montfort University, Nottingham Trent University and the University of Oxford.

Identification by voice (2) by Dr Jeremy Robson and Dr Kirsty McDougall will deal with expert analysis of voice.

Dr Jeremy Robson is an Associate Professor of Law at De Montfort University and Academic Tenant at KCH Garden Square in Nottingham. He researches issues relating to language and law and criminal evidence.

Dr Harriet Smith is a Senior Lecturer in Psychology at Nottingham Trent University. Her research focuses on facial and vocal identity discrimination in forensic and security settings.