Peeking between the lines

Wednesday, 21 December 2016

A BBC drama & the birth of forensic linguistics

If you've been watching the recent BBC drama, Rillington Place, you'll know a bit about the case of Timothy Evans, who shared a house at 10 Rillington Place with serial murderer John "Reg" Christie. Evans was wrongly convicted and hanged for the murder of his wife and baby daughter in 1950, largely based on confessions made in a number of conflicting statements he gave to the police.

Evans' conviction was first questioned when Christie confessed to killing a string of women, including Evans' wife, and was found guilty at a trial in 1953. An inquiry was opened into Evans' case but upheld his conviction. Over the following years, arguments over the case continued in the media and in 1966, a second inquiry found that Evans was probably guilty of the murder of his wife, but not of his daughter. As part of this second inquiry, the linguist, Jan Svartvik, looked at the statements made by Evans to the police and carried out a linguistic analysis of the language used.

Svartvik found clear discrepancies in the style of the language, especially within the later statements. Evans had initially handed himself in and admitted to killing his wife, but not his daughter. He later changed his story to blame Christie, but by this time, the police were confident they'd caught the killer. Further statements were made under pressure from police after the bodies had been found and details of how they had been killed were known. Significantly, those details didn't match the accounts Evans had already given, presumably because he didn't actually know them. Svartvik noted that much of the language throughout the four statements was what you would expect of an illiterate, working-class young man, with lots of idiomatic language and nonstandard (or as Svartvik termed it, substandard) usage:

She never said no more about it ...

I said 'I thought you was going to Brighton.'

However, key parts of the later statements used much more formal language:

She was incurring one debt after another and I could not stand it any longer so I strangled her with a piece of rope ...

He handed me the money which I counted in his presence.

Looking at the four statements now, the inconsistencies really jump out at you, as does the 'policespeak' in the later statements. For example, in the third statement, you find a classic example of 'policespeak' with the repeated use of pronoun + then + verb:

I then got up, lit the gas and put the kettle on.

I then poured myself out a cup of tea I had already made.

In normal speech, you much more typically find then + pronoun + verb - Then I got up...
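For anyone curious how you might check this kind of word-order pattern by machine, here is a minimal sketch in Python. It is only an illustration, not a reconstruction of Svartvik's method: the short pronoun list and the pattern names are my own assumptions, and the sample sentences are the two quoted above plus the 'normal speech' variant.

```python
import re

# Sample sentences: two quoted from the third statement, plus the
# "normal speech" variant used above for comparison.
sentences = [
    "I then got up, lit the gas and put the kettle on.",
    "I then poured myself out a cup of tea I had already made.",
    "Then I got up.",
]

# Assumption: a small, illustrative set of subject pronouns.
PRONOUNS = r"(?:I|he|she|we|they)"

# pronoun + then + verb, e.g. "I then got"
pronoun_then = re.compile(rf"\b{PRONOUNS}\s+then\s+\w+", re.IGNORECASE)
# then + pronoun + verb, e.g. "Then I got"
then_pronoun = re.compile(rf"\bthen\s+{PRONOUNS}\s+\w+", re.IGNORECASE)


def count_orders(texts):
    """Count how often each of the two word orders occurs in the texts."""
    counts = {"pronoun_then": 0, "then_pronoun": 0}
    for text in texts:
        counts["pronoun_then"] += len(pronoun_then.findall(text))
        counts["then_pronoun"] += len(then_pronoun.findall(text))
    return counts


print(count_orders(sentences))
```

On real statements you would want proper part-of-speech tagging rather than a regular expression, since the word after 'then' here is only assumed to be a verb.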

Svartvik doesn't directly accuse the police of falsifying the statements, but the implication is clear. Perhaps more interestingly, from my perspective, he was the first to use the term 'forensic linguistics':

This sally into the relatively uncultivated field of "forensic linguistics" has been interesting for a number of reasons [...] it has provided the linguist with one of those rare opportunities of making a contribution that might be directly useful to society.

Couldn't agree more, Jan.

Svartvik, J. (1968) The Evans Statements: a Case for Forensic Linguistics

Tuesday, 29 November 2016

On research ...

Over the past few weeks, I’ve been doing a Research Foundations module as part of my MA course which looks at different aspects of doing academic research in the field of linguistics. At the same time, in my ‘other life’ as an ELT freelancer, I’ve been trying to juggle a work project which has involved carrying out corpus research for an ELT publisher.

In recent years, working both as a ‘commercial’ researcher and being involved loosely in academia through my work in EAP, the contrast between the two research realities has often struck me, but being involved in both so directly in parallel, the overlaps and contrasts have been especially obvious.

To be clear, I’m not saying that either approach is somehow ‘better’ or, to use an academic term, ‘more valid’ than the other. Of course, they’re different because they’re different contexts and have different purposes, but I thought it might just be interesting to reflect on those differences.

I obviously can’t go into detail about the commercial project I’ve been working on, for reasons of confidentiality, but very broadly, it’s using the publisher’s corpus of learner English to research the linguistic features typical of different groups of learners to feed into some teaching materials.

1 Getting started

Academic: The first 5 weeks of the research module were taken up with preliminaries – formulating research questions, lots and lots of reading and critiquing others’ work – and it wasn’t until week 6 that we even got on to talking about collecting data. This reflects, perhaps, the long process of planning and formulating a research question that goes on in academic research – there’s plenty of thinking time.

Commercial: The corpus already exists, so the data’s already collected and the research question had been formulated by the in-house project editors and passed to me in the form of a brief. So after a brief flurry of emails to clarify a few details, I got stuck straight into the analysis.

2 The timeframe

Academic: Certainly weeks, if not months or in many cases, years.

Commercial: 5 days!

3 The data

Academic: One of the things that never ceases to amaze, and often frustrate, me is how little data a lot of academic research is based on. So much of the research into L2 vocabulary acquisition, for example, seems to be based on short-term studies involving a handful of Japanese university students, who aren’t really representative of anything other than maybe ... those Japanese university students. Yet the findings get extrapolated to all language learners everywhere! Okay, so that’s a bit of an exaggeration, but it’s not that wide of the mark. To be fair, that’s because it can actually be very difficult to gather large amounts of data for academic research, either due to lack of time or resources or access.

Commercial: Large publishers have the resources (both financial and infrastructure) to gather huge amounts of data which they then have the wherewithal to process and store in massive computerized corpora. The corpus I’ve been using for this project is a corpus of learner language collected from students sitting international English exams. It contains over 50 million words produced by over 220,000 students from 173 countries. Which I think makes it pretty representative … at least of the language students use in written English exams, which was exactly what I was interested in here because my research was to feed into designing exam practice materials.

4 The theoretical background

Academic: Academic researchers have to ‘situate’ their work within the existing research and theoretical frameworks in their area. This means they really have to know their stuff. They have to know what everyone else has said or done before them otherwise they risk criticism … academics love to pick holes!

Commercial: When I get a brief for a job like this, I don’t really have the time to go out and do lots of reading, I have to rely on what I already know. That partly comes down to practical experience both as a language teacher and as a corpus researcher. I’ve been involved in ELT for some 25 years and I’ve been doing corpus research for almost 20. Over that time, I’ve absorbed quite a bit about the field, some of it of the pure academic kind and a lot that’s been ‘filtered’ as part of articles, blogs, conference talks and the like. I do try and keep up with my field(s), but it’s certainly not systematic or comprehensive.

5 The methodological approach

Academic: In the academic world, a lot of thought goes into developing the methodology for a particular study. Will the methods be quantitative or qualitative, will it use questionnaires or interviews, how will the sample be selected, will the study attempt to replicate previous research, what ethical considerations are there? Like the theoretical background, the methodology will be picked over and critiqued ruthlessly, so it has to be watertight and justified at every stage.

Commercial: The research that I’m working on won’t be published in itself. It will go straight to the editors and the materials writers, and they’ll be more interested in the results than how I got to them. That’s not to say I don’t approach the task systematically, but my methods can be much more mixed and pragmatic. In fact, for this particular project, I came at it from both a quantitative and a qualitative angle and then combined the two, drawing on my personal knowledge of language teaching, language learners and exam materials to come up with useable results. So from a quantitative perspective, the corpus tools can crunch the data and produce things like word frequency lists that indicate which particular vocabulary items the different groups of learners produce most often. But I also looked at a smaller number of sample texts in more detail to get a feel for discourse level features. And if I came across something potentially interesting in the sample (like, say, the use of a particular discourse marker), I then searched to see how it was used elsewhere in the data – was it a quirk of the individual student or was it a wider phenomenon?
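To give a flavour of the quantitative side, here is a rough sketch in Python of the two steps described above: a word frequency list per learner group, and a check on whether a particular item is spread across many texts or concentrated in one. The texts, group labels and function names are all invented for illustration; this is not the publisher's corpus or its tools.

```python
import re
from collections import Counter

# Hypothetical toy data: (learner group, text). Not from any real corpus.
texts = [
    ("group_a", "Moreover, I think the film was good. Moreover, the music was good."),
    ("group_a", "I think the city is big and the people are kind."),
    ("group_b", "In my opinion the film was boring. I think the end was bad."),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_by_group(data):
    """Build one word frequency list (a Counter) per learner group."""
    freq = {}
    for group, text in data:
        freq.setdefault(group, Counter()).update(tokenize(text))
    return freq


def texts_containing(data, word):
    """Return (number of texts using the word, total number of uses)."""
    hits = [tokenize(text).count(word) for _, text in data]
    return sum(1 for h in hits if h > 0), sum(hits)


freq = frequency_by_group(texts)
print(freq["group_a"].most_common(3))
print(texts_containing(texts, "moreover"))
```

In this toy data, 'moreover' turns up twice but in only one text, which is the sort of result that would suggest an individual quirk rather than a wider pattern.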

6 The write-up

Academic: An academic research project is written up in a very formalized style either as a dissertation or a journal article or maybe as a chapter of a book. It has to follow particular conventions with an introduction, literature review, methods, results, discussion, references, etc.

Commercial: Thankfully, my report can be written on a ‘need-to-know’ basis. The people who are going to make use of it don’t want to plough through a load of waffle, they just want a clear explanation of the results. So I present the findings numbered and explained under headings with lots of examples and a bit of commentary. In this case, my report came out to just over 6,500 words – not far off an academic journal article (typically 7,000-8,000 words) but much looser in style and typed up as I went along with only minimal rereading and editing.

It’s difficult to say which approach to research I prefer – and that’s probably a topic for a whole post of its own! – but it’s certainly going to be an interesting challenge over the next couple of years to flit between the two. Hopefully, the overspill in both directions will be productive.

Monday, 14 November 2016

#linguisticlandscapes

Last week, I handed in my first forensic linguistics assignment. It was a critical review of a paper* about signs – as in street signs – which have some kind of legal force: no entry, private property, etc. It’s a bit of a left-field sort of paper which didn’t seem to bear much relation to anything we’ve studied so far, nor to anything that I’ve yet found in the forensic linguistics literature. Its only connection seemed to be that it’s about language and the law.

When I first read through the paper, I was a bit flummoxed about where to start. The main field it seemed to draw on was something called linguistic landscapes. So I had a flick through some of the linguistic landscapes literature it cited … and was really none the wiser! To be honest, it was pretty impenetrable stuff, full of vague, abstract concepts described in dense academic language ('geosemiotics' anyone?!) and never really seemed to get to the point. In my applied linguist’s brain, I kept asking “But what are the implications of all this theorizing? What does it actually mean to real people in the real world?” ... and judging by some of the comments in the margins, I wasn't the only one ...

I felt like I needed to get a handle on the topic in terms I could relate to. Then I remembered that I’d seen the phrase linguistic landscape before … Every now and then, photos of signs pop up on my Facebook feed, posted by various ELT colleagues who are part of a group called MULL, Map of the Urban Linguistic Landscape. I tracked down the group and couldn’t see much more in it than a collection of occasionally amusing signs. Then I spotted that the group had been set up by Damian Williams, a fellow ELT writer who I happen to have been in contact with recently. So I dropped him an email to ask what MULL was all about.

Meanwhile, in that way that when something comes to your attention, you start seeing it everywhere, I spotted #linguisticlandscapes on Twitter in a tweet about an ELT conference talk. So I searched for the hashtag and lo and behold a flurry of tweets came up, including a number from a conference, LL8, in Liverpool earlier this year. I clicked through to the programme and browsed through the titles and outlines for some of the sessions. Some of them looked really fascinating, but also incredibly diverse – here’s a taste of a few:

Comic murals in Brussels’ linguistic landscape
Beijing in Africa: a new dimension in the LL of Addis Ababa
It won’t be quiet: the inner and outer linguistic landscape of Rēzekne’s most famous kebab restaurant

I was sort of starting to get a feel for what the area was about, then I heard back from Damian. He very enthusiastically passed on links to the website that accompanies MULL, an article he’d written and a reading list. His intro on the MULL website really helped to bring together what I’d been reading and by that point, a lot of the same ideas (and the same names) were starting to recur.

This took me back to some of the academic references I’d started off with, but this time I was starting to get more of a handle on where they were coming from. I can’t say I’ve completely ‘got’ the whole linguistic landscapes thing yet – it seems to be a new area of research that’s currently saying “Oh look, this is interesting!” but doesn’t quite know where to go with it next. Thanks to a bit of social media rummaging though, I did at least manage to finish my essay and to include what are, hopefully, a few relevant points and references. I've also joined the MULL Facebook group, just out of curiosity ...

* The paper I had to review:
Mautner, G. (2012) Language, Space and the Law: A Study of Directive Signs. The International Journal of Speech, Language and the Law 19 (2)

Wednesday, 5 October 2016

So, what is Forensic Linguistics?

For the past 18 years, when someone's asked me "What do you do?", I've generally replied that I'm a lexicographer and waited for the baffled look. To be honest, it's not completely true that I've been a full-time lexicographer for all of that time, but it's easier to explain than 'ELT materials writer' or 'corpus researcher'. "I write dictionaries" has generally more-or-less worked as a definition. Last week though, I made my first tentative steps along a potential new career path and I suspect it's going to be even more tricky to explain ...

I've just embarked on a two-year, part-time MA course in Forensic Linguistics at Cardiff University. It didn't get off to a terribly auspicious start as I sniffled my way through the first week with a rotten cold and a head full of cotton wool, but I think I picked up enough through the fug to attempt to explain in general terms what forensic linguistics is.

It's a relatively new field of study and so, to an extent, isn't fully defined yet, but I'll borrow a break-down from one of my first lectures to explain the key areas of study and research:

Legal Language: looking at issues around the incomprehensibility of written legal texts for a general audience. Why is it so difficult for ordinary people to understand the legal documents we all come across - contracts, wills, pension documents, etc. and what can be done to improve this?

Language in the Legal Process: exploring how language is used (and misused) in the legal process when people interact with the police, lawyers and the courts. How does the language of the legal process disadvantage some people and groups, how is it abused and how could it be improved?

Language as Evidence: this is the CSI/Silent Witness aspect of 'forensics' that most immediately springs to mind. It looks especially at questions of authorship - identifying who wrote a blackmail threat or threatening email or determining whether a confession or police statement was really made by the person accused or was falsified. It can also include identifying speakers from recordings of their voices - who phoned in a bomb warning or was speaking in the background of a 999 call?

I'm coming into the course with an open mind about which area I'm most interested in. I guess it was the idea of 'language as evidence' that first drew me in - and it's probably the 'sexiest' part of the field - but as I've spent the past 25 years honing my skills in writing and explaining ideas as clearly as possible (for foreign learners of English), I do wonder whether I could turn those skills towards making the language of the legal process more comprehensible and accessible. Could I help train police and other legal staff how to better communicate with non-native speakers of English, perhaps?

And then there's the whole area of corpus research ... many areas of linguistic research nowadays are turning to corpora (large collections of language data) to find answers to all kinds of questions and provide quantifiable evidence of usage. From what I've read so far, forensic linguistics is a field absolutely crying out for more reliable, quantifiable evidence, especially evidence that will stand up in court. Having worked with corpora for almost 20 years now, this is certainly an area that's going to be piquing my interest.

Through this blog, I'm planning to reflect on my experiences as a student and to write about the ideas and research that jump out at me as most interesting. I can't promise how regularly I'll manage to post as I'm attempting to juggle part-time study with continuing my other work in ELT (and my other blog), but let's see how it goes ...