The Entropy of Data

Habeeb Mustafa
3 min read · Oct 6, 2020

“You gave me a weapon, sent me to battle, the time came and I looked into the eyes of the enemy, called upon god as witness, held my breath and gently pulled the trigger . . . then nothing happened — because it was empty”

That's what my ex-boss accused me of doing to him after I gave him a report built on the wrong data logic and he happily walked into a senior management meeting with it.

How the rest of my day unfolded is another story. A quick check, however, revealed that the data was clean and the report was beautiful, but the sequence of formulas and aggregations was misaligned by just a column here and there.

The moral of the story is that having the data is only half of the solution; how we transform it, enhance it and derive a numerical conclusion is another quarter. The last quarter is how we do all of this repeatedly, making the necessary changes and updates in between, without causing a reactor meltdown. This is also where most accidents happen, and it is the fundamental crisis of data that has plagued our work for years, even with higher-level automation.

A problem can only be solved once it has been identified. There was a long-held perception that access to data (the start of the reporting process) was the problem, while the alternative paradigm was that a dynamic visualization (the end of the reporting process) would answer all of life's big mysteries. No doubt these were impossibly time-consuming areas. The really delicate part, however, was the data clean-up, the merging with other data sets and the aggregation to a point of unification where all the different parts simply made sense.
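To make that delicate middle part concrete, here is a minimal sketch of a clean-up, aggregation and merge sequence in plain Python. It is only an illustration under stated assumptions: the sample records, the field names (`region`, `amount`, `target`) and the helper functions are all hypothetical and not taken from any real report. The point it tries to show is that fields are addressed by name rather than by column position, and that the pipeline checks its own totals before a report leaves the building.

```python
from collections import defaultdict

# Hypothetical sample data; field names are assumptions for illustration only.
SALES = [
    {"region": " North ", "amount": "100.0"},
    {"region": "south", "amount": "250.5"},
    {"region": "North", "amount": ""},       # missing amount, dropped in clean-up
    {"region": "South", "amount": "49.5"},
]
TARGETS = [
    {"region": "north", "target": 150.0},
    {"region": "south", "target": 280.0},
]


def clean(rows):
    """Normalise region labels and drop rows without a usable amount."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue
        cleaned.append({
            "region": row["region"].strip().lower(),
            "amount": float(row["amount"]),
        })
    return cleaned


def aggregate(rows):
    """Sum amounts per region, addressing fields by name, not by position."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def merge(totals, targets):
    """Join totals to targets on region and fail loudly on unmatched keys."""
    target_by_region = {t["region"]: t["target"] for t in targets}
    unmatched = set(totals) ^ set(target_by_region)
    if unmatched:
        raise ValueError(f"regions without a match: {sorted(unmatched)}")
    return {
        region: {"actual": total, "target": target_by_region[region]}
        for region, total in totals.items()
    }


def build_report(sales, targets):
    """Run clean-up, aggregation and merge in one repeatable sequence."""
    cleaned = clean(sales)
    totals = aggregate(cleaned)
    # Reconciliation check: aggregated totals must add up to the cleaned total.
    assert abs(sum(totals.values()) - sum(r["amount"] for r in cleaned)) < 1e-9
    return merge(totals, targets)


report = build_report(SALES, TARGETS)
```

Because every step is a named function, the same sequence can be rerun after each data refresh, and a broken join raises an error instead of quietly producing a beautiful but wrong number.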

We could see that nearly everything we wanted to do was already technically doable, but nothing was easy enough. It was always a question of pulling a technical bunny out of a magic hat, never being able to put it back inside, and helplessly watching it hop around with little control.

To understand the predicament, let us take a step back and look at the basic desires and fears of a data analyst in the organization:

We love the freedom of choosing the best modeling approach for our project, and we want to stay in control of the inputs and the logic implementation and to carry out the necessary upgrades and maintenance without any hiccups.

On the other hand, very few of us feel strongly about the nature of the data warehouse, the update sequence or dealing with obscure APIs. From our point of view, these foundational components should preferably "just work", so that we can focus our energy solely on improving data science productivity and on being fanatically human-centric (read: manager-centric; yes, debatable).

So, the big question is, how can we improve the quality of life for our data analysts?

This is where data integration platforms form a crucial layer. These platforms have proven their worth in a very short time by solving complex data integration and ingestion problems for advanced analytics tools. They have the power to bring the data from all relevant corporate systems onto one single platform. They have the capacity to go above and beyond what we perceive as data analysis, with scalable data transformation modules, integration of external data sets, an easy-to-access REST API and synchronization with an SQL database.

Be sure to ask for these in your organization rather than for chart-building tableaus or SQL-writing maestros.

It is an exciting time for analysts and our heads are bursting with the possibilities. Still, the challenge remains to wow the business side.
