Big data and language learning

Dunbar and me

Primates are animals that form stable social groups, and the size of these groups is thought to be directly related to the size of the neocortex of the species concerned. In the 1990s the British anthropologist Robin Dunbar argued that the size of the human neocortex should enable us to maintain cohesive social groups of around 150 members. This became known as ‘Dunbar’s number’.

To maintain groups of this size, humans would need a clear incentive to remain together and would need to devote a good proportion of their time to some form of social grooming. However, according to Dunbar [1], a common language obviates the need for regular physical intimacy and allows social groups to remain cohesive through such instruments as gossip, story-telling and so on.

Certain companies have discovered that social problems begin when more than 150 people are working in the same building. W.L. Gore and Associates famously designed all their buildings with a capacity for 150 employees.

What about the language teaching business? Does Dunbar’s number have any validity here? Some of the evidence suggests that it does:

How many Accredited Members does EAQUALS currently have? According to their website the answer is 141.

What about IALC? According to their website, they have 161 members, although that number may include some temporary summer centres.

Also, coincidentally or not, the International House World Organisation has had around 150 affiliate members for as long as anyone can remember. New affiliates join, others leave, but the total remains more or less the same.

At the latest IHWO conference in Catania, Italy, I suggested that we should try to disprove that we are being held back by the size of our neocortex and actively recruit enough new affiliates to push the net total up to 170. Of course if we succeed, we may be risking the social cohesion of the organisation. But perhaps the neocortex of IH affiliates will demonstrate its ability to cope.

If you would like more information on how to become an affiliate of IHWO you can click here: https://ihworld.com/join-ih/

Or you can write to me at jonathanpdykes@gmail.com


[1] Dunbar, R. (1996). Grooming, Gossip and the Evolution of Language. Harvard University Press.

Ours is not to reason why …

According to Viktor Mayer-Schönberger and Kenneth Cukier, authors of a book called ‘Big Data’ (first published by John Murray in 2013), the standard scientific method, which we have all taken to be sacrosanct for well over half a century, is fast being displaced by the analysis of data that is now available on a previously inconceivable scale.

The idea, in a nutshell, is this: while knowledge can still be advanced by researchers coming up with a theory which is subsequently tested in a verifiable way (the standard method), it can now advance much more rapidly (and much less expensively) by looking for correlations in the mass of data that analysts now have access to. In other words, knowledge based on investigating ‘why’ such and such happens is being supplanted by knowledge based on ‘what’ happens, irrespective of the ‘why’.

An example from the book: by tracking 16 different data streams from premature babies (heart rate, respiration rate, blood pressure, etc.) computers are able to detect subtle changes that may indicate a problem, long before doctors or nurses become aware of it. The system relies not on causality, but on correlations. It tells what, not why. And it saves lives.
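To make the ‘what, not why’ idea concrete, here is a deliberately crude sketch of the underlying principle: flag a reading that strays too far from the recent moving average of a single stream. The function name, the threshold and the heart-rate figures are all my own invented illustrations, standing in for the far more sophisticated multi-stream systems the book describes.

```python
import statistics

# Illustrative only: a crude drift detector on one vital-sign stream.
def detect_drift(readings, window=5, threshold=2.0):
    """Flag indices where a reading strays more than `threshold`
    standard deviations from the moving average of the previous
    `window` readings."""
    flags = []
    for i in range(window, len(readings)):
        recent = readings[i - window:i]
        mean = statistics.mean(recent)
        sd = statistics.stdev(recent)
        if sd > 0 and abs(readings[i] - mean) > threshold * sd:
            flags.append(i)
    return flags

# A steady heart-rate series with one subtle jump at the end.
heart_rate = [120, 121, 119, 120, 122, 121, 120, 135]
print(detect_drift(heart_rate))  # flags index 7, the anomalous reading
```

The point is that nothing here models *why* the heart rate jumped; the system simply notices that the pattern changed, which is exactly the kind of correlation-based alarm the book describes.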

Another example, this time closer to home: Big Data has already transformed the translation business. By analysing the entire content of the Internet, Google has built a corpus of billions of sentences, which enables its computers to predict the probability that one word follows another with ever-increasing accuracy. By 2012 its dataset covered more than 60 languages, and by using English as a bridge it could even translate from Hindi into Catalan (for example).
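The statistical idea behind ‘predicting the probability that one word follows another’ can be shown in miniature. This is a toy bigram model over a three-sentence corpus of my own invention; Google’s actual systems are vastly larger and more sophisticated, but the counting principle is the same.

```python
from collections import Counter, defaultdict

# Tiny stand-in for a web-scale corpus (illustration only).
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]

# Count how often each word follows another (a bigram model).
bigrams = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        bigrams[prev][nxt] += 1

def predict_next(word):
    """Return the most probable next word and its estimated probability."""
    counts = bigrams[word]
    best, n = counts.most_common(1)[0]
    return best, n / sum(counts.values())

print(predict_next("the"))  # 'cat' is the likeliest follower of 'the' here
```

No grammar, no meaning, no ‘why’: just counts of ‘what’ tends to follow ‘what’, which is precisely the shift the authors describe.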

Statisticians at Microsoft’s machine-translation unit apparently like to joke that the quality of their translations improves every time a linguist leaves the team.

What about the language teaching business? How might the Big Data revolution impact our industry?

One obvious example: Big Data ought to be able to help our marketing teams identify where best to spend our hard-earned cash. Around one third of Amazon’s sales are now generated by its computer-driven, personalised recommendation systems: ‘If you liked this, you may like ….’ Just imagine if we could target all our promotion at those market sectors most likely to respond positively. There are almost certainly companies out there that could analyse data generated by search engines, online shopping, social networks and so on, and point us in the right direction. I don’t know if we could afford to hire this sort of expertise. But I’m not sure we can afford to ignore it either.

What about the thorny issue of second language acquisition theory? Is such a fiendishly complex subject susceptible to this sort of data analysis? Would it be possible to devise a way of collecting enough data to suggest how language learners the world over could study more effectively?

The industry’s exam boards must have masses of data squirrelled away, but even if they were prepared to share it, how useful would it be? Well, it could indicate where results are improving and where they’re going downhill. It could also provide evidence on those student profiles that tend to be most successful. Come to think of it, data from exam boards might even help debunk some of the wilder claims put forward by our industry’s ‘miracle method’ operators.

But of course the danger of focusing on data from outcomes (i.e. exam results) is that we end up accidentally reinforcing the ‘teach for the test’ paradigm that already influences our classrooms more than is healthy. Some form of all-encompassing continuous assessment would generate more useful data. This is something that a number of Web-based language schools already claim to offer, albeit on a small scale. Would it be possible to define and agree on a set of metrics that would enable us to measure progress in language learning accurately and continuously, on a very broad scale, without undermining the effectiveness and creativity of our teachers? It’s a big ask. But it’s an enticing idea.