Big Data: A Deeper Understanding of the Way Things Work

Over the last twenty-five years the world has fundamentally changed. We have been swept up in the digital revolution; the world is connected like never before, and the pace of technological advancement shows no sign of slowing.

If the Atomic Age had nuclear warheads and mutually assured destruction as its defining by-products, the Information Age has server farms and data. Lots of data. So much of it, in fact, that most people have lost the ability to comprehend its volume. Estimates put the world’s current digital data storage capacity at 2.7 zettabytes, or just shy of 169 billion 16 GB iPhones. The numbers are so ridiculous that some among us (myself included) have begun to use the buzz phrase “big data”, in what feels to me like a futile attempt to singularise the ballooning volumes of information.
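The iPhone comparison above is easy to sanity-check with a few lines of back-of-the-envelope Python, using decimal (SI) units, so 1 ZB = 10²¹ bytes:

```python
# Rough check of the figure above: how many 16 GB iPhones
# would it take to hold 2.7 zettabytes?
world_storage_bytes = 2.7e21   # 2.7 zettabytes (SI: 1 ZB = 1e21 bytes)
iphone_bytes = 16e9            # one 16 GB iPhone (SI: 1 GB = 1e9 bytes)

iphones = world_storage_bytes / iphone_bytes
print(f"{iphones / 1e9:.2f} billion iPhones")  # → 168.75 billion iPhones
```

168.75 billion: just shy of 169 billion, as claimed.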

The value of digital data is very real: Google is now the second most valuable company in the world, with a current market cap of $440bn, a value built largely on the phenomenal success of its advertising business. Ninety percent of Google’s revenue in 2014 came from companies who paid to ensure that your beady little eyes flitted past their products, your subconscious unknowingly fostering familiarity with their brands as you ostensibly browsed cat pictures. Google’s advertising is delivered through an array of products and services, many of which belonged to companies that were gobbled up like minnows, leaving the current internet advertising climate looking like a sparsely populated aquarium with one hugely oversized moray eel squatting sheepishly on your homepage. Its effectiveness is testament both to the abundance of data and to the sophisticated analytics brought to bear on it.

Google, thanks mainly to its unique positioning as internet gatekeeper and its shrewd business practice, is a worldwide leader in data management and analytics. Its spin-off products, such as self-driving cars and augmented reality glasses, rely on the company’s data wizardry. But Google’s ubiquity has many of us nervously mumbling about how things are beginning to look like our favourite 80s dystopian cyberpunk movie. Although, let’s be honest, that one time you tried out Bing it was rubbish, so it’s not like you’re in any rush to change your default search engine. The perception of an all-seeing, all-knowing (at least when it comes to your online habits) corporation or government agency, combined with a general sense of apathy about doing anything about it, has fostered a negative public opinion of modern data collection and usage.

Facebook's Prineville data center covers a sprawling 150,000 square feet, and is projected to double in size to 300,000 square feet -- big enough to house five American football fields. Photo: Intel Free Press


But what entities such as Google and GCHQ do with data is merely the part of the data revolution that bubbles up into public perception. Our modern digital capabilities have had a huge impact on virtually every discipline of research and development. In many cases, the difference between how we looked at the world 20-30 years ago and how we look at it today is akin to the difference between trying to understand the jungle by staring at one tree and taking a helicopter ride over the Amazon basin. We’ve gone from having to craft exquisite theories based on axiomatic knowledge and a few choice examples to sifting out gems of truth from a mess of gravelly evidence. During my tenure as a research seismologist, a veteran scientist lamented to me that we no longer examine every little wiggle of a single earthquake seismogram to glean knowledge about its processes. Instead, modern analysis often involves models built from hundreds or thousands of earthquakes, where computers deal with the nitty gritty so that we can see the bigger picture. A similar evolution can be seen in other fields; Angus Deaton, winner of the 2015 Nobel Prize in Economics, recently wrote in the Royal Economic Society Newsletter that “the typical [economic] thesis of the eighties was an elaborate piece of price theory estimated by non-linear maximum likelihood on a very small number of observations, [and] the typical thesis of today uses little or no theory, much simpler econometrics, and hundreds of thousands of observations.”

While the data revolution has allowed us to model and understand our world more intimately, I sense a mild nostalgia for the way things were done before. It seems there is no space in the modern world for deep thinkers: advances in physics that Isaac Newton achieved with a few reams of parchment, an apple tree and some chin-stroking contemplation now require a team of hundreds, miles of underground tunnels, international collaboration and a lifetime supply of lab coats. This attitude undermines the true innovative genius that this new data-driven way of thinking has cultivated. The reality, in my opinion, is that there was an initial decoupling between what can be achieved with the ever-increasing data available to us and the general understanding of how it is done.

But this is changing; there has been a huge drive to make the magic of data accessible to all. A perfect storm of circumstances is making data analytics a key part of business and society. Computational processing has never been cheaper or more powerful, meaning large datasets can be manipulated on easily obtainable equipment. Our collective computer literacy has sky-rocketed as we become more and more immersed in technology (to the point where it’s almost considered cool to be a nerd). And the tools at our disposal are more innovative and easier to use than ever before. No more stumbling around in assembly or C: alongside the much-maligned but ever-evolving Excel, we now have languages such as R, on which user-friendly development environments such as RStudio are built, dedicated visual analytics software such as Tableau, and Python with bespoke data handling and visualisation packages such as pandas and matplotlib.
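To give a flavour of how low the barrier has become, here is a minimal sketch of the seismologist’s shift described earlier, done with pandas. The dataset is entirely made up for illustration; the point is that summarising many events takes a couple of lines rather than a career of squinting at wiggles:

```python
import pandas as pd

# A small, invented catalogue of earthquake magnitudes by region.
quakes = pd.DataFrame({
    "region": ["Chile", "Chile", "Japan", "Japan", "Alaska"],
    "magnitude": [6.1, 7.4, 6.8, 5.9, 6.3],
})

# Aggregate across all events at once, rather than
# inspecting each seismogram individually.
summary = quakes.groupby("region")["magnitude"].agg(["count", "mean"])
print(summary)
```

The same pattern scales unchanged from five rows to five million, which is precisely the shift from single-seismogram scrutiny to big-picture modelling.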

I envision a future where our lives are enriched by the data available to us. With a better knowledge of how we do things in life and in business, we can become more efficient with our time and energy, and allow a deeper understanding of the way things work. It’s up to us to put all this information to work.

(Main header image: David Precious, Flickr Creative Commons)
