In his book How to Lie with Statistics, Darrell Huff covers themes that include “correlation does not imply causation” and the use of random sampling. The book also shows how statistical graphs can be used to distort reality, for example by truncating the bottom of a line or bar chart so that differences seem larger than they are, or by representing one-dimensional quantities in a pictogram with two- or three-dimensional objects, so that the reader forgets that the images do not scale the same way the quantities do.
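To make those two chart tricks concrete, here is a minimal Python sketch of the arithmetic behind them. The function names and the figures (50 and 52, a baseline of 49) are made up for illustration; they are not taken from Huff's book.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar heights when the axis starts at `baseline`."""
    if min(a, b) <= baseline:
        raise ValueError("baseline must be below both values")
    return (b - baseline) / (a - baseline)


def pictogram_size_ratio(value_ratio, dimensions=2):
    """How much bigger an icon looks when every side is scaled by value_ratio."""
    return value_ratio ** dimensions


# Hypothetical figures: a quantity grows from 50 to 52 (a 4% increase).
last_year, this_year = 50.0, 52.0

# Honest axis starting at zero: the second bar is 1.04 times as tall.
print(apparent_ratio(last_year, this_year))

# Axis truncated at 49: the second bar is drawn 3 times as tall.
print(apparent_ratio(last_year, this_year, baseline=49.0))

# A value that doubles, drawn as a 2D icon twice as wide and twice as tall,
# covers 4 times the area; as a 3D object it fills 8 times the volume.
print(pictogram_size_ratio(2.0))
print(pictogram_size_ratio(2.0, dimensions=3))
```

The data never changes between the calls; only the baseline or the number of dimensions does, and that is what makes these charts misleading.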
Of late I have taken a great interest in learning data science. The bug bit me some time back, but I hadn’t really answered the all-important question required in every worthwhile endeavor: why? As the saying goes, “When the WHY is clear, the HOW will take care of itself.”
I have become obsessed with the whole idea of data analytics, predictive models, regression and the like, and with understanding how businesses use data to make better decisions. I came to understand how some of the biggest and most valuable companies today, such as Google, Facebook, Apple and LinkedIn, are built around data: they have mastered the art of turning data given to them by customers into products or data-driven services.
Of course this is exciting to me, because in my many years working with GIS Ltd, data analytics has been a big part of what we do. We approach it from an information management perspective, and our focus is on helping NGOs use data to answer questions like: Who are our beneficiaries and where are they? (“beneficiary targeting”). Which project should we implement first to achieve the greatest impact, and where? (“project location prioritization”). To what extent will this project meet the needs of our targeted beneficiaries? (“project-beneficiary matching”).
The ability to properly answer these questions is something NGOs and CBOs continue to struggle with. In many former “NGO hot zones” it is common to find communities still struggling with poverty, poor sanitation and the like. One may ask: is this a question of causation or correlation?
We’ve all heard the joke that eating pickles causes death, because everyone who dies has eaten pickles. That joke doesn’t work if you understand what correlation means.
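A small Python sketch shows why the joke falls apart. The population below is entirely hypothetical (100 people who died and 900 who are alive, all of whom have eaten pickles), and `pickle_share` is an illustrative helper, not a standard statistical function.

```python
def pickle_share(people):
    """Fraction of a group that has eaten pickles."""
    return sum(person["ate_pickles"] for person in people) / len(people)


# Hypothetical population: everyone has eaten pickles at some point.
died = [{"ate_pickles": True} for _ in range(100)]
alive = [{"ate_pickles": True} for _ in range(900)]

# Correlation needs a comparison: is pickle eating more common among
# those who died than among those who did not?
gap = pickle_share(died) - pickle_share(alive)

print(pickle_share(died))   # share of pickle eaters among the dead
print(pickle_share(alive))  # share of pickle eaters among the living
print(gap)                  # no difference between the groups
```

Because the share of pickle eaters is the same in both groups, pickles tell us nothing about who dies. Without a difference between groups there is no correlation to speak of, let alone causation.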
NGOs and governments sit on terabytes of data that is only minimally put to use in decision making. Of course, this trend is changing as more pressure is put on these institutions to take a science-based approach to responding to humanitarian challenges and extending much-needed services to the population.
In the many capacity-building events we organize for NGOs, where we advocate for a scientific approach to humanitarian response, I receive questions about the various challenges these technical people face as they try to provide much-needed support to communities. One that stood out for me came from a participant at an NGO focused on drilling water points, who asked how they could improve their chances of striking water. As you may have guessed, “effective” use of data was a major part of my recommendation.
As humanitarian challenges around the world get more complicated, NGOs will be required not just to respond faster, but to act in a more informed manner that allows beneficiaries to quickly feel the impact of their interventions. Effective data use will play a major role in this.
The question facing every company today, every startup, every non-profit, is how to use data effectively: not just their own data, but all the data that’s available and relevant. Using data effectively requires something different from traditional statistics. Let’s not confuse statistics or M&E with data science.
So who is a data scientist and who is not?
Data science is an interdisciplinary field concerned with processes and systems for extracting knowledge or insights from data in various forms, whether structured or unstructured. It is a continuation of data analysis fields such as statistics, data mining, and predictive analytics, and is similar to Knowledge Discovery in Databases (KDD).
But merely using data, as many professional disciplines do, isn’t really what we mean by “data science.”
What differentiates data science from statistics and M&E is that data science is a holistic approach. We’re increasingly finding data in the wild, and data scientists are involved with gathering data, massaging it into a tractable form, making it tell its story, and presenting that story to others. This work can be the most boring part of the job, but it is also the most crucial.
In the last 5-10 years, there has been an explosion in the amount of data that’s available. Whether we’re talking about web server logs, tweet streams, online transaction records, “citizen science,” data from sensors, government data, or some other source, the problem isn’t finding data, it’s figuring out what to do with it. And it’s not just companies using their own data, or the data contributed by their users. For even more mind-numbing statistics, check out the infographics below produced by Domo.
Data Science in action
During the recent earthquake catastrophe in Nepal, thousands of OpenStreetMap volunteers went online to provide much-needed support in updating the map of Nepal. These maps, which are data products, were instrumental in making life-saving decisions, especially when it came to delivering much-needed humanitarian relief to the affected people.
I have always wondered how LinkedIn is able to suggest people to connect with. LinkedIn uses patterns of friendship relationships to suggest other people you may know, or should know, with sometimes frightening accuracy.
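One simple way to turn relationship patterns into suggestions is to count mutual connections. The Python sketch below is a toy illustration only: it is not LinkedIn's actual method, and the function name, the names of the people and the network itself are all made up.

```python
from collections import Counter


def suggest_connections(graph, person, top_n=3):
    """Rank people `person` is not yet connected to by mutual connections."""
    direct = graph.get(person, set())
    mutual_counts = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the person themselves and anyone already connected.
            if candidate != person and candidate not in direct:
                mutual_counts[candidate] += 1
    # Most mutual connections first; ties broken alphabetically.
    ranked = sorted(mutual_counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical network: each person maps to the set of their connections.
network = {
    "Amina": {"Brian", "Chloe"},
    "Brian": {"Amina", "Chloe", "David"},
    "Chloe": {"Amina", "Brian", "David", "Esther"},
    "David": {"Brian", "Chloe"},
    "Esther": {"Chloe"},
}

print(suggest_connections(network, "Amina"))
```

In this toy network David shares two connections with Amina (Brian and Chloe) and Esther shares one (Chloe), so David is suggested first. Real recommendation systems presumably combine many more signals than this.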
Techniques like sentiment analysis, also known as opinion mining, can be used to determine the attitude of a speaker, of voters, or of a community towards new projects. I witnessed this first hand during an event at the technology hub Outbox, where R was used to analyze Twitter messages streamed under the hashtag #UGDebate16 during the first live televised presidential debate in Uganda.
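To give a feel for the idea, here is a minimal lexicon-based sketch in Python. It is not the R analysis used at the event: the word lists are tiny and hypothetical, the sample messages are invented, and real sentiment analysis relies on much larger lexicons or trained models.

```python
import re

# Tiny, hypothetical word lists; real lexicons hold thousands of entries.
POSITIVE = {"good", "great", "strong", "clear", "impressive"}
NEGATIVE = {"bad", "weak", "poor", "boring", "dodged"}


def sentiment_score(text):
    """Count positive words minus negative words in a message."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((word in POSITIVE) - (word in NEGATIVE) for word in words)


def sentiment_label(text):
    """Turn the score into a coarse positive / negative / neutral label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Invented sample messages, not real tweets from the debate.
messages = [
    "Great answers, very clear on health #UGDebate16",
    "Weak on jobs and dodged the question #UGDebate16",
    "The debate starts at 8pm #UGDebate16",
]

for message in messages:
    print(sentiment_label(message), "|", message)
```

Counting words this way misses sarcasm, negation ("not good") and local languages, which is why it should be read as a sketch of the concept rather than a usable classifier.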
The ability to take data — to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it — that’s going to be a hugely important skill in the next decades.