According to Viktor Mayer-Schönberger and Kenneth Cukier, authors of a book called ‘Big Data’ (first published by John Murray in 2013), the standard scientific method, which we have all taken to be sacrosanct for well over half a century, is fast being displaced by the analysis of data now available on a previously inconceivable scale.
The idea, in a nutshell, is this: while knowledge can still be advanced by researchers coming up with a theory which is subsequently tested in a verifiable way (the standard method), it can now advance much more rapidly (and much less expensively) by looking for correlations in the mass of data that analysts now have access to. In other words, knowledge based on investigating ‘why’ such and such happens is being supplanted by knowledge based on ‘what’ happens, irrespective of the ‘why’.
An example from the book: by tracking 16 different data streams from premature babies (heart rate, respiration rate, blood pressure, etc.) computers are able to detect subtle changes that may indicate a problem, long before doctors or nurses become aware of it. The system relies not on causality, but on correlations. It tells what, not why. And it saves lives.
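The book doesn’t describe the hospital system’s internals, but the underlying idea — watching for data streams that normally move together and flagging the moments they stop doing so — can be sketched in a few lines of Python. Everything here (the function names, the window size, the threshold) is illustrative, not the real system:

```python
def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0

def correlation_alerts(stream_a, stream_b, window=10, threshold=0.3):
    """Flag the start index of any window in which two vital-sign streams
    that usually move together have decoupled. A 'what, not why' signal:
    no causal model of the patient is involved, only correlation."""
    alerts = []
    for start in range(len(stream_a) - window + 1):
        r = pearson(stream_a[start:start + window],
                    stream_b[start:start + window])
        if abs(r) < threshold:  # the streams have stopped tracking each other
            alerts.append(start)
    return alerts
```

A real system would track all sixteen streams at once, but the principle is the same: the alarm fires on a broken correlation, with no explanation attached.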
Another example, this time closer to home: Big Data has already transformed the translation business. By analysing the entire content of the Internet, Google has built a corpus of billions of sentences, which enables its computers to predict the probability that one word follows another with ever-increasing accuracy. By 2012 its dataset covered more than 60 languages, and by using English as a bridge it could even translate from Hindi into Catalan (for example).
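Google’s actual models are vastly larger and more sophisticated, but the core trick — counting which words follow which in a corpus and turning those counts into probabilities — fits in a short sketch. The toy corpus and function names below are invented for illustration:

```python
from collections import Counter, defaultdict

def bigram_model(sentences):
    """Estimate P(next_word | word) from raw text: a purely statistical
    model of the kind described above, with no grammar rules at all."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for w1, w2 in zip(words, words[1:]):
            follows[w1][w2] += 1

    def prob(w1, w2):
        total = sum(follows[w1].values())
        return follows[w1][w2] / total if total else 0.0

    return prob

corpus = ["the cat sat on the mat", "the dog sat on the rug"]
p = bigram_model(corpus)
p("sat", "on")   # 1.0 -- in this corpus, "sat" is always followed by "on"
p("the", "cat")  # 0.25 -- "the" precedes cat, dog, mat and rug equally often
```

Scale the corpus up from two sentences to billions and the probabilities become good enough to drive translation — which is the statisticians’ point: more data beats more linguistics.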
Statisticians at Microsoft’s machine-translation unit apparently like to joke that the quality of their translations improves every time a linguist leaves the team.
What about the language teaching business? How might the Big Data revolution impact our industry?
One obvious example: Big Data ought to be able to help our marketing teams identify where best to spend our hard-earned cash. Around one third of Amazon’s sales are now generated by its computer-driven, personalised recommendation system: ‘If you liked this, you may like ….’ Just imagine if we could target all our promotion at the market sectors most likely to respond positively. There are almost certainly companies out there that could analyse data generated by search engines, online shopping, social networks and so on, and point us in the right direction. I don’t know if we could afford to hire this sort of expertise. But I’m not sure we can afford to ignore it either.
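Amazon’s real recommender is far more elaborate, but the basic ‘customers who bought X also bought Y’ idea reduces to simple co-occurrence counting. The baskets below are made up for the sake of the example:

```python
from collections import Counter

def also_bought(baskets, item, top_n=3):
    """'If you liked this, you may like...': rank other items by how often
    they appear alongside `item` in past purchase baskets."""
    co_occurrence = Counter()
    for basket in baskets:
        if item in basket:
            co_occurrence.update(other for other in basket if other != item)
    return [name for name, _ in co_occurrence.most_common(top_n)]

baskets = [
    {"grammar book", "dictionary", "flashcards"},
    {"grammar book", "dictionary"},
    {"dictionary", "audio course"},
]
also_bought(baskets, "grammar book")  # ['dictionary', 'flashcards']
```

The same counting logic, applied to course enrolments rather than books, is the sort of analysis a marketing team could buy in without building anything from scratch.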
What about the thorny issue of second language acquisition theory? Is such a fiendishly complex subject susceptible to this sort of data analysis? Would it be possible to devise a way of collecting enough data to suggest how language learners the world over could study more effectively?
The industry’s exam boards must have masses of data squirreled away, but even if they were prepared to share it, how useful would it be? Well, it could indicate where results are improving and where they’re going downhill. It could also provide evidence on those student profiles that tend to be most successful. Come to think of it, data from exam boards might even help debunk some of the wilder claims put forward by our industry’s ‘miracle method’ operators.
But of course the danger of focusing on data from outcomes (i.e. exam results) is that we end up accidentally reinforcing the ‘teach to the test’ paradigm that already influences our classrooms more than is healthy. Some form of all-encompassing continuous assessment would generate more useful data. This is something that a number of Web-based language schools already claim to offer, albeit on a small scale. Would it be possible to define and agree on a set of metrics that would enable us to measure progress in language learning accurately and continuously, on a very broad scale, without undermining the effectiveness and creativity of our teachers? It’s a big ask. But it’s an enticing idea.