What is Data?

“90 percent of the world’s data was created in the last two years” was the primary theme of a Science Daily article from March of 2013. Those of us in the Big Data trade happily used this sound bite, oftentimes coupled with the “80 percent of data is unstructured” sound bite, to drive home the point that the world of data processing is in for some (capital B) Big changes in the years ahead. And we’re all happy to be part of this coming wave.

At the end of 2013 I found myself at a holiday gathering surrounded mostly by legal professionals, having been invited further into the inner circle of a lawyer friend. Inevitably the “what do you do” question came up, and since I was one of only two or three non-lawyers in the group, comparing and contrasting our respective professions made for interesting conversation. After a drink or two, and after settling on some topics of common understanding, the subject of Big Data and the “acceleration of all things tech” came up.

Reflexively, I uttered the (in?)famous 90% sound bite to my new acquaintance, subconsciously awaiting a “wow” response.

“Interesting. What exactly do you mean by data? And who measured it?”

“Ah yes, he’s a lawyer,” I thought to myself, along with “Yeah, what the heck does this even mean, and why did I just say it?”

Not to be thrown off (tech folks are no strangers to serendipitous Socratic parries either), I thought for a bit and then, biding time, said something to the effect of “Good point, and certainly a thought-provoking question.” Luckily, when I had used the sound bite, I was careful to attribute it not only to the article I vaguely recalled but also to the ubiquitous “they” (you know, the people who say stuff). So I was at least able to pivot this into a thought exercise between the two of us that actually made for some good conversation (or so we thought after a couple of beers).

Some of the things we came up with included:

  • Data – to be valuable – must involve a transcription/recording of information to some kind of durable media.
  • Also, for data to be useful, it must be retrievable from said durable media and comprehensible to a recipient entity (yes, that part sounds lawyer-ish).
  • The recipient entity may either be the creating entity or another entity entirely (e.g. “I’m writing this down so that I can remember it later…”).
  • “Wow, I think this winter brew we’ve been drinking is 10% ABV. No wonder my head is spinning after only two….”
  • “10% ABV is a data point!”

And so on.

Along the way we talked about cave paintings, tablets (the type actually made of stone), Gutenberg, Turing, transistors, HAL, Google (and our thoughts about how it became huge), and of course Big Brother.

It was an opportunity for both of us to take a step back, quite awestruck at witnessing very recent, cataclysmic changes of a kind that previously unfolded over many generations. And while we didn’t go too far down the philosophical rabbit hole of data vs. information vs. knowledge vs. intelligence, it was very much lurking in the background. It was abundantly obvious that “knowing stuff” in a timely fashion, and being able to use that knowledge appropriately, is the fastest path to the top of the food chain, both literally and figuratively. It was equally clear that our ability to store and retrieve “stuff” (regardless of its value or quality) is nothing short of mind-boggling, with more boggling to come.

As for a proper definition of data beyond a barroom discussion, well, Wikipedia offers a good starting point from an academic standpoint. And of course, when data grows up and becomes Big Data, a whole new set of discussion opportunities emerges, replete with the ubiquitous collection of both zealots and cynics.

But regardless of where you stand on the importance of Big Data and the value of all the data that is available, perhaps one thing we can all agree on is that nowadays, dealing with data requires quite a bit of help. In fact, this has been true for quite some time.

A famous quote from F. Scott Fitzgerald about writing “This Side of Paradise” sums it up nicely for me: “To write it, it took three months; to conceive it three minutes; to collect the data in it all my life.” Granted, he was only 23 years old at the time: a short life to that point, but a long time to collect data by today’s standards, perhaps even for the great American novel.

And while most of us are not writing literary classics, no one is immune from the need to make sense of the ever-expanding data torrent around us, least of all those of us in the data trade, for whom it is the job. And so, in the hope of contributing a few lessons on making sense of data, this humble blog is now rededicated to chronicling my own personal data discoveries and rediscoveries.

Let’s see how the next 23 years goes…
