How Your Digital Footprint Generates Big Data

Privacy Issues with Data

How much data are you generating?

Wondering how much the internet knows about you right now? Do Not Track is an online interactive documentary series about internet privacy. The series combines short videos and interactive elements to educate people about who may be tracking them online and how much private information may be extrapolated from their internet activities.

Okay, that’s cool. The internet knows where you are and what browser you are on. But what if it could derive meaning out of your digital footprint as well? The University of Cambridge developed a personalisation engine that fairly accurately predicts psychological traits from your Facebook behaviour.

Crazy stuff, right? In the digital age, everything is trackable and measurable, generating immense amounts of data from each person. So let's break down what this means.

Access to Information = Currency of Power

Here are a few industry examples of how data is being created and utilized.

Technology Industry: This is the industry that thrives on data generation and analysis. With almost any social media platform or software service, you are paying with your data. Even programs that you pay for may be collecting your data and reselling it. (Refer back to that terms and conditions page that you never read.)

Finance Industry: The finance industry has widely adopted big data analytics to inform investment decisions and pursue more consistent returns. Using big data, data scientists and software engineers have developed algorithmic trading systems that analyze vast amounts of historical data with complex mathematical models to maximize portfolio returns.

Advertising Industry: The digital advertising industry is thriving on the copious amounts of personal data online, which make user and behaviour targeting easier than ever. This is also the industry that’s paying for all your free online tools and platforms.

Big Data Storage

So what does "big data" even mean?

The term "big data" has become something of a buzzword in recent years. It simply refers to the process of using data for predictive analytics, user behaviour analytics, or other advanced data analytics methods that extract value from data.

The exponential rate of data growth.

At the moment, less than 0.5% of all data is ever analyzed and used. Just imagine the potential here.

The 5 Vs of Big Data

The 5 Vs of big data are often referred to as the crux of the industry. Volume, variety, veracity, velocity and value all need to be taken into consideration when examining data.

The Issues of Big Data

1. Correlation vs. Causation

"Correlation does not imply causation" is a phrase used in statistics to emphasize that a correlation between two variables does not imply that one causes the other. Big data is great for running statistical tests that measure correlation between variables, but those tests do not establish causation. There can be correlations among 100 variables, but the struggle is putting them all together to see the bigger picture.
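To make this concrete, here is a minimal Python sketch of how two things can correlate strongly without either one causing the other. Everything in it is made up for illustration: the variable names (temperature, ice_cream_sales, sunburn_cases), the coefficients, and the noise levels are assumptions, not real data. Both series are generated from a shared hidden factor, and neither is computed from the other.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical hidden factor; the numbers are invented for this sketch.
temperature = [rng.gauss(20, 5) for _ in range(1000)]

# Both series depend on temperature plus their own independent noise.
# Neither series is used to compute the other.
ice_cream_sales = [2.0 * t + rng.gauss(0, 3) for t in temperature]
sunburn_cases = [0.5 * t + rng.gauss(0, 1) for t in temperature]

r = pearson(ice_cream_sales, sunburn_cases)
print(f"correlation between ice cream sales and sunburn cases: {r:.2f}")
```

The printed correlation should come out high, yet banning ice cream would do nothing for sunburns. The correlation only tells you the two move together; it takes outside knowledge (here, knowing about the temperature) to explain why.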

2. Insight Quality and Meaning

Finding correlations is great. But the quality and meaning of the insight may vary. For example, the divorce rate in Maine correlates with the per capita consumption of margarine. But does this mean anything? Probably not. Here is an entire page dedicated to spurious correlations that may make you sound highly intelligent.
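Why do meaningless correlations like this turn up at all? One reason is sheer volume: compare enough unrelated variables and some pair will line up by chance. The Python sketch below illustrates the idea under assumed numbers (100 variables with 10 observations each, all pure random noise); it is an illustration, not an analysis of any real dataset.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(42)  # fixed seed so the run is repeatable

n_variables = 100     # assumed number of unrelated metrics
n_observations = 10   # assumed data points per metric, e.g. ten years

# Every series is independent random noise: there is nothing to discover.
series = [
    [rng.gauss(0, 1) for _ in range(n_observations)]
    for _ in range(n_variables)
]

# Check all 4,950 pairs and keep the one with the strongest correlation.
best_r, best_pair = max(
    (abs(pearson(series[i], series[j])), (i, j))
    for i, j in combinations(range(n_variables), 2)
)
print(f"strongest correlation among pure noise: |r| = {best_r:.2f} for pair {best_pair}")
```

Even though every series is noise, the strongest pair should look impressively correlated. With few data points and thousands of comparisons, a striking match is close to guaranteed, which is why an insight needs a plausible explanation behind it and not just a high number.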

3. User Privacy

Digital Control

The biggest public concern is data collection itself, especially the collection of user data. Although most of us have likely check-boxed our privacy rights away online, it is worthwhile to wonder if our data is in the right hands. There are the corporations that use our data to generate profit, and also the governments that have control over societal wellbeing. And to what extent can we protect our "privacy" without sacrificing all the free tools and online products we use?
