Review: English; Indo-European; Computational Linguistics: Hoffmann, Sand, Arndt-Lappe, Dillmann (2018)

EDITOR: Sebastian Hoffmann
EDITOR: Andrea Sand
EDITOR: Sabine Arndt-Lappe
EDITOR: Lisa Marie Dillmann
TITLE: Corpora and Lexis
SERIES TITLE: Language and Computers
YEAR: 2018

REVIEWER: Elen Le Foll, Universität Osnabrück


The papers in this volume were first presented at the 36th ICAME conference which took place in Trier in May 2015. The title of the publication echoes the title of the conference: ''Words, Words, Words – Corpora and Lexis''. The editors, Sebastian Hoffmann, Andrea Sand, Sabine Arndt-Lappe and Lisa Marie Dillmann, provide a brief introduction highlighting the lexicographic and pedagogical implications of the paradigmatic and syntagmatic approaches corpus linguists usually take to researching lexis.

In the opening chapter, ''Modelling Lexical Structures in the Oxford English Dictionary'', Edmund Weiner, the deputy chief editor of the OED, retraces the development of the OED's structural information networks from the mid-1980s, when the original computerisation of the OED was planned, to the present day. He suggests approaches to developing a dictionary that truly illustrates the non-linear nature of lexis and the great number of interconnections between entries.

In the second chapter, Antoinette Renouf investigates the circumstances of word coinage in a large diachronic corpus of UK newspaper writing, where coinage is deemed ''to be a special case of neologism, distinct in that the act of creation itself is [the] focus'' (p. 40). To this end, she ran an automated corpus monitoring system to detect potential coinages and combined these results as well as those from previous studies to draw up a framework for a working classification of coinage types.

In ''Synonym Selection as a Strategy of Stress Class Avoidance'', Julia Schlüter and Gabriele Knappe investigate the influence of rhythm and stress on the selection of near-synonymous adjectives in English. Their results, drawn from diachronic data of written and spoken American English spanning almost two centuries, suggest that adjectives with equivalent meanings but different rhythmic shapes do not occur equally frequently in all syntactic functions.

The following three chapters in this volume focus on the discourse functions of specific words in English. Karin Aijmer explores ''Intensification with Very, Really and So in Selected Varieties of English''. She points to differences between the frequencies of these intensifiers across spoken varieties of English from the UK, the US, New Zealand and Singapore and investigates their common collocates, as well as their individual semantic and functional profiles.

In the following chapter, John Kirk explores ''The Pragmatics of Well as a Discourse Marker in Broadcast Discussions'' recorded in Great Britain and Ireland. The paper applies the methodology of Aijmer's (2013) study of pragmatic markers to a new corpus (SPICE-Ireland) and re-analyses the data upon which the model was first developed (ICE-GB). The author concludes that ''there is nothing peculiarly Irish about discourse uses of 'well''' (p. 163). Crucially, this paper highlights the pitfalls associated with assigning well-defined pragmatic function categories to multi-functional discourse markers in natural corpus data.

Maïté Dupont's contribution to the volume is ''A cross-register Study of Connectors of Contrast'' in parliamentary debates, newspaper editorials and academic writing. She applies the framework of systemic functional linguistics with its notions of Theme and Rheme to investigate adverbial connector placement, together with ''the powerful methods and solid empirical basis afforded by corpus linguistics'' (p. 177). The results appear to show that there are lexically-primed connectors, whose placement patterns are stable across registers, and stylistically-primed connectors, which are frequently polyfunctional and whose position is very likely to be affected by register.

''Towards a Model of Co-collocation Analysis: Theory, Methodology and Preliminary Results'' by Moisés Almela and Pascual Cantos addresses the issue of inter-collocational dependency. They demonstrate that collocational associations may hold not only between the node and a collocate, but also between the collocates themselves. Indeed, the authors argue that whilst strength of association has been the focus of much attention in corpus linguistics and is now well captured by many existing collocation statistics (e.g. t-score, z-score, MI, logDice), the mode of association, which they define as ''the configuration of relations between the internal structure of the collocation and the domains of lexical attraction that can be identified in a collocational window'' (p. 213), has largely been ignored. Almela and Cantos thus introduce a new category, the co-collocate, and present a step-by-step methodology to extract these, illustrating the method and the kind of results it can yield with the lexeme 'consequences'.
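
For readers less familiar with the strength-of-association statistics named above, the following is a minimal sketch of how four of them are commonly computed from simple frequency counts. It is not taken from the chapter under review and does not implement the co-collocation method itself. The function names and the counts in the usage example are invented for illustration, and the expected frequency is deliberately simplified (no correction for window size).

```python
import math


def expected(f_node, f_coll, n):
    """Expected co-occurrence frequency if node and collocate were independent.

    Simplifying assumption: no adjustment for the size of the collocation window.
    """
    return f_node * f_coll / n


def mi(o, f_node, f_coll, n):
    """Mutual Information: log2 of observed over expected co-occurrences."""
    return math.log2(o / expected(f_node, f_coll, n))


def t_score(o, f_node, f_coll, n):
    """t-score: (observed - expected) scaled by the square root of observed."""
    return (o - expected(f_node, f_coll, n)) / math.sqrt(o)


def z_score(o, f_node, f_coll, n):
    """z-score: (observed - expected) scaled by the square root of expected."""
    e = expected(f_node, f_coll, n)
    return (o - e) / math.sqrt(e)


def log_dice(o, f_node, f_coll):
    """logDice: 14 + log2 of the Dice coefficient; independent of corpus size."""
    return 14 + math.log2(2 * o / (f_node + f_coll))


if __name__ == "__main__":
    # Hypothetical counts: node occurs 1,000 times, collocate 500 times,
    # they co-occur 50 times in a corpus of 1,000,000 tokens.
    o, f_node, f_coll, n = 50, 1000, 500, 1_000_000
    print("MI      ", round(mi(o, f_node, f_coll, n), 3))
    print("t-score ", round(t_score(o, f_node, f_coll, n), 3))
    print("z-score ", round(z_score(o, f_node, f_coll, n), 3))
    print("logDice ", round(log_dice(o, f_node, f_coll), 3))
```

All four measures score a single node–collocate pair in isolation, which is why they say nothing about dependencies between the collocates themselves.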

The final two chapters focus on pedagogical applications of lexicogrammar research. Costas Gabrielatos explores ''The Lexicogrammar of BE Interested: Description and Pedagogy'' by cross-referencing the information provided by pedagogical materials (EFL grammars and dictionaries), with results drawn from a corpus of spoken and written L1 English (BNC) and the patterns found in English L2 learners' speech and writing (ICLE and LINDSEI). He reports striking differences in frequencies and patterns of use between L1 and L2 usage and concludes that the results point to a correlation between L2 use of 'BE interested' and its treatment in the pedagogical materials examined.

The volume closes with a paper by Yves Bestgen and Sylviane Granger investigating EFL learners' phraseological acquisition processes. In ''Tracking L2 Writers' Phraseological Development Using Collgrams: Evidence from a Longitudinal EFL Corpus'' they describe their rationale for using collgrams as their unit of phraseological measure and the methodology used to extract these. The authors compare the learner data collgrams to L1 data from the BNC, revealing patterns of progress that differ across the learning process and according to the type of bigram. They also compare their results, drawn from a longitudinal corpus, to those from a comparable pseudo-longitudinal design and report very similar trends.


The opening chapter provides valuable insights into a lexicographer's practical considerations in attempting to realise some of the potential of the lexico-grammatical structures uncovered by decades of corpus linguistics research. Weiner soberly lists the aspirations formulated in the mid-1980s that are yet to be realised and does not shy away from proposing fundamental changes to the structure of the dictionary in order to develop the OED into a fully explorable digital archive. At the same time, he concludes his chapter with more realism than we are perhaps used to in academic writing, making it clear that the changes he suggests will need to come from the publishers themselves, since it is not a case of the market driving changes. The chapter also includes a number of full-colour example entries illustrating the approaches the author suggests.

Renouf's chapter on word coinage is fascinating both in terms of its methodology and its results. Though some limits of the study are acknowledged in the closing remarks, it is somewhat surprising to see that register restriction is not mentioned as a possible limitation. Whilst the typology of coinage signalling the author arrives at will no doubt be highly valuable for future research on neologisms, the nature of the corpus used to derive it will inevitably bias it. Since the corpus queried contained texts from the Guardian and the Independent, it may be more accurate to conclude that this paper presents a typology of coinage signalling in UK newspaper writing. The paper also presents many examples for each type of coinage identified, many of which are quite entertaining and thus make the chapter a very pleasurable read.

Schlüter and Knappe's paper on the effect of stress on synonym selection packs a great deal of detail and a number of highly informative graphs into a single chapter. In spite of the (acknowledged) different degrees of statistical significance, the results are compelling and very well explained. It must be stressed (no pun intended!), however, that the chapter essentially presents a detailed analysis of four case studies: the synonym pairs 'rich–wealthy', 'glad–happy', 'shut–closed' and the triplet 'fast–quick–rapid'. As a result, further studies involving other adjectives are necessary to rule out the possibility that the effects observed simply reflect idiosyncratic properties of the lexemes examined.

In the introduction to her chapter on the three intensifiers 'very', 'really' and 'so', Aijmer states that ''the research questions focused on in this study are both quantitative and qualitative'' (p. 107). However, the many tables reporting quantitative results do not make any mention of statistical testing, which makes it rather hard for the reader to draw any conclusions from these tables. Qualitatively, the author helpfully provides bullet point summaries of the most important conclusions from the analyses on each intensifier. Arguably the most innovative aspect of this study is its attempt to capture the different uses of intensifiers in several English varieties as exemplifications of Schneider's (2007) developmental stages in his model of postcolonial Englishes. Drawn on the basis of just three words, these parallels can only be very tentative for now, but this study certainly opens up interesting avenues for further research.
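
To illustrate the kind of statistical test whose absence is noted above, the following is a minimal sketch of a log-likelihood (G2) comparison of one word's frequency in two corpora, a test widely used in corpus linguistics. It is not drawn from the chapter; the function name and all counts are invented for illustration.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood (G2) for one word's frequency in two corpora.

    freq_a, freq_b: raw frequencies of the word in corpus A and corpus B.
    size_a, size_b: total token counts of corpus A and corpus B.
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected frequencies if the word were equally common in both corpora.
    exp_a = size_a * total_freq / total_size
    exp_b = size_b * total_freq / total_size
    g2 = 0.0
    for observed, exp in ((freq_a, exp_a), (freq_b, exp_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / exp)
    return 2 * g2


if __name__ == "__main__":
    # Hypothetical counts: an intensifier occurs 120 times in 100,000 tokens
    # of one variety and 80 times in 100,000 tokens of another.
    g2 = log_likelihood(120, 100_000, 80, 100_000)
    # 3.84 is the chi-square critical value for p < 0.05 with 1 degree of freedom.
    print(round(g2, 2), "significant at p < 0.05" if g2 > 3.84 else "not significant")
```

Reporting such a value alongside each raw and relative frequency would let readers judge which differences between varieties are likely to be more than sampling noise.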

As in the preceding paper, Kirk's contribution to the volume provides detailed results in table form with both raw and relative frequencies, but eschews statistical test results. Nevertheless, this paper makes a perhaps unique contribution to advancing corpus linguistics as a discipline, since it is one of very few studies to attempt to verify (part of) a previous corpus-based study (Aijmer's analysis of 'well' [2013]) on the very same corpus (ICE-GB). Not only does Kirk find 127 instances of pragmaticalised 'well' whilst Aijmer finds 130, but more critically, the two authors arrive at radically different functional distributions of this same particle. Hence, whilst computer-generated frequency results may conveniently satisfy corpus linguists' striving for objectivity, Kirk shows that functional interpretations are rather less objective than we are often tempted to believe. He thus invites us all to rethink our analysis procedures if we are to uphold Leech's (1992) three core principles of corpus linguistics: verification, replication and objectivity.

Dupont's paper further develops the Systemic Functional Linguistics framework by adding further categories within the Rheme, and the results appear to show that these new distinctions are indeed constructive. The categories of placement of connectors are well explained and illustrated with plenty of salient examples from the corpora examined. However, the tabular results (e.g. Tables 6.5 to 6.13) would be much easier to read if they were reported graphically. Since no shading is used to illustrate significant differences between percentages, it is rather difficult for the reader to discern either of the ''two main types of placement profiles'' which the author claims ''emerge from these tables'' (p. 199).

Almela and Cantos' chapter makes a compelling case for the introduction of co-collocates in corpus-based lexical research. Their methodology is well explained and illustrated, and it may provide particularly valuable insights for lexicographic and pedagogical applications. The main caveat is acknowledged in the paper itself and concerns the size of the corpus required for such calculations. The method currently requires the use of mega-corpora (such as the enTenTen2013 queried for the study), which likely means that it is sadly not yet applicable to specialised corpora, or even to general corpora of languages other than English for which considerably less online text is available.

Gabrielatos' paper addresses at least two issues that go beyond ''The Lexicogrammar of BE Interested: Description and Pedagogy''. First, he presents a compelling framework for pedagogy-driven research which involves comparing the use of lexicogrammar in L1 and L2 corpora, as well as pedagogical materials. However, the reader may be surprised to discover that the pedagogical materials selected for this particular study are in fact reference works (e.g. English Grammar in Use, Collins COBUILD English Grammar and Cambridge Dictionary Online) which learners may (or may not) consult as part of their learning process. Gabrielatos argues that whilst learners may not actually consult these specific sources, ''they can be expected to be largely representative of the kind of input L2 learners receive'' (p. 249). Still, one wonders whether EFL textbooks might have been a more appropriate choice to capture the language use that learners are frequently exposed to in instructional settings. Second, with this thorough case study on 'BE interested', the author lends support to Halliday's conception of lexis and grammar as ''complementary perspectives'' (Halliday, 1991, p. 32) marking ''the notional ends of a lexicogrammatical continuum'' (p. 244).

The brief summary of the results of a study by Ädel and Erman (2012) in the literature review section of the final chapter is somewhat unclear. If the reader is not already familiar with the study, it is impossible to infer that the figures reported (130 lexical bundles in native texts and 60 in L2 texts) refer to the number of bundles uniquely found in only one of the two corpora studied. Nevertheless, Bestgen and Granger's contribution convincingly demonstrates the usefulness of collgrams in pedagogical applications and the study's methodology may well prove influential for future study designs. The results themselves are difficult to evaluate at group level. The authors acknowledge a number of limitations, the most important being that the longitudinal corpus used only has two measurement points (first and third year of study) and that it is difficult to account for intrapersonal factors when reporting such group trends (this point is well illustrated in Fig. 9.2). The comparison of the results from this longitudinal study and a pseudo-longitudinal one is also a welcome contribution to the field, especially since we are all acutely aware of the complex and oftentimes costly processes that longitudinal data collection usually entails.

In conclusion, corpus linguists can look forward to reading this fine selection of top-quality papers first presented at the 36th ICAME conference in Trier. Indeed, the volume provides more than the results of a few fascinating individual case studies using a range of corpus resources and state-of-the-art tools: it also explores methodological issues and proposes new procedures and measures. Moreover, ''Corpora and Lexis'' contributes to the refinement and development of (new) theoretical concepts and features novel uses of corpus-based findings in lexicography and pedagogy.


Elen Le Foll is an English Education lecturer and PhD candidate at Osnabrück University. Her research interests include learner phraseology, language learners' use of online resources, textbook English and teacher training. She also teaches conference interpreting (German-English) at the University of Applied Sciences in Cologne and works as a freelance conference interpreter.

