Mapfry Team
Jan 7, 2025
Why Big Data Isn't a Big Deal
For every complex problem, there is always a simple, elegant and completely wrong solution.

Henry Louis Mencken's famous quote leaves no doubt: difficult problems will not be solved in three steps.

The modern world runs on a series of complex, invisible gears.

You can sell something and charge for it, and the customer may pay with a credit card. The card company, in turn, will take a while to pay you, and it will still charge you a commission.

You can do someone a favor and expect something in return in the future, an even harder thing to measure.

These are two small examples from everyday life of how intricate and invisible the network of relationships is.

Still, we never stop trying to understand these gears and to anticipate their movements and trends.

We do this by studying the dynamics of regions and identifying the best places for each purpose.

But there are different approaches to understanding the dynamics of regions.

For some, the solution lies in creating a model of the world that is so complete that it is almost the size of the world itself.

These are solutions as complex as, or more complex than, the problems they set out to solve.

This line of thinking tends to complicate models endlessly, always adding more information and conditions.

Complex solutions for complex problems

We come from an Industrial Age whose excesses left us with frightening cemeteries of cars, planes, and factories, and even ghost towns.

Today we live in the Information Age, which, young as it is, is already producing its own excesses.

So much available information gave rise to a phenomenon full of expectations: Big Data.

With Big Data, it would supposedly be possible to process tons of data and extract patterns that reveal the invisible gears of modern life.

Huge databases, called Data Lakes, came to be seen as companies' main store of value.

Data is the new oil

Data is the new oil, the pundits used to say.

But that's not quite what happened.

We found that data in itself is worth little, even if it is not worthless.

The hope of extracting value from large volumes of data, the way we extract oil from ancient geological layers, never materialized.

Instead, Data Cemeteries full of useless data took shape.

The power of information is not tied to volume, but to its ability to add context to an analysis.

Only information that helps explain a phenomenon will have the value that understanding the phenomenon itself has.

Sounds complicated?

Think of it this way: the information about buying a glass of water can't cost more than the glass of water itself.

There is a statistical technique that seeks to identify which information in a database really makes a difference: Principal Component Analysis (PCA).

It combines correlated variables into a smaller set of components and lets us discard the ones that contribute little to the explanation, that is, the ones that add no new context.
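To make the idea concrete, here is a minimal sketch in Python with NumPy. It is not Mapfry's method: the dataset is synthetic, and the column names (income, population, and so on) and the helper functions `explained_variance_ratio` and `components_needed` are hypothetical. Six columns are built from only two independent signals, with the other four merely repeating their meaning. PCA, computed here via the singular value decomposition of the standardized matrix, shows that two components carry essentially all the information.

```python
import numpy as np


def explained_variance_ratio(X: np.ndarray) -> np.ndarray:
    """Share of total variance captured by each principal component."""
    # Standardize columns so variables in different units weigh equally.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # The squared singular values of the standardized matrix are
    # proportional to the variance along each principal axis.
    s = np.linalg.svd(Z, compute_uv=False)
    var = s ** 2
    return var / var.sum()


def components_needed(X: np.ndarray, threshold: float = 0.95) -> int:
    """Smallest number of components whose cumulative share reaches threshold."""
    cumulative = np.cumsum(explained_variance_ratio(X))
    return int(np.searchsorted(cumulative, threshold) + 1)


# Hypothetical synthetic regional data: two independent signals...
rng = np.random.default_rng(0)
n = 500
income = rng.normal(5000, 1500, n)
population = rng.normal(20000, 6000, n)

# ...plus "foam": columns that repeat the meaning of the first two.
X = np.column_stack([
    income,
    population,
    income * 12,                # annual income: same information, rescaled
    income / 1000,              # same variable in thousands
    population * 0.3,           # households assumed proportional to population
    income + 0.1 * population,  # a linear mix of the two signals
])

print(X.shape[1], components_needed(X))  # prints: 6 2
```

Standardizing first is a design choice: without it, columns measured in large units (such as annual income) would dominate the variance simply because of their scale.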

That's when we discovered that Big Data isn't that big.

A large share of any Data Lake is foam: data that repeat the same meaning.

The supposedly complex solution is actually a simple, wrong one.

You should not judge the power of a model solely by the number of parameters it has

Andrej Karpathy, former head of AI at Tesla and a founding member of OpenAI.

When choosing between more or less information, side with those who offer datasets that actually represent reality, and be wary of those who boast of millions or billions of data points.

This is a typical Geomarketing problem, and we decided to face it head-on, acknowledging its limitations instead of adding ornamental complexity.

Once we accepted this reality, we could focus on the power we actually have: narrative.

Information for its own sake has no value; its value emerges when it represents reality, and we human beings are masters at contextualizing information through stories.

That is why we chose solutions that make data easier to interpret and insights easier to share.