By Kenia Cris
Photo by Valentina Cugusi
‘No book can be a ladder’, said Argentinian writer Jorge Luis Borges in his short story The Library of Babel (1941). But what can be said about 5.2 million books together? Last Thursday, Google, in collaboration with a team of researchers from the Harvard Cultural Observatory, launched the Google Books Ngram Viewer, the first real-time culturomics browser in the world.
For centuries, researchers interested in tracking social, cultural, and linguistic trends had to peruse volumes one by one, and no one could ever read so many books. The corpus represents 4 per cent of all books published between 1800 and 2000 and contains 500 billion words, 361 billion of them in English and the rest in six other languages (French, Spanish, German, Chinese, Russian and Hebrew). It took the group four years to put the current dataset and analysis tools together. They intend to expand it to include not only more books but also magazines, newspapers, blogs and even non-text-based products, such as artwork.
Dr. Erez Lieberman Aiden, a computational biologist at Harvard University and co-author of the study published in the journal Science last week, said the tool does not exactly offer answers; it is more a question machine, a hypothesis-generating machine. And aren’t questions responsible for the initial ‘itching’ that leads to systematic studies and, consequently, original findings? Claire Warwick, director of the Centre for Digital Humanities at University College London, asked about the validity of culturomics, said: “in science, huge datasets which people have used super-computing on have led to some fascinating new discoveries that otherwise wouldn’t be possible.” Such statements suggest that this is really where that ladder to a deeper comprehension of sensible knowledge begins. Let’s wait and see what sort of things we’ll be presented with in the end.