Good piece by Shane Brennan on Medium about the realities of data science in day-to-day working life (in contrast with how it’s taught).
His ten fallacies:
1. The data exists.
2. The data is accessible.
3. The data is consistent.
4. The data is relevant.
5. The data is intuitively understandable.
6. The data can be processed.
7. Analyses can be easily re-executed.
8. Where we’re going we don’t need encryption.
9. Analytics outputs are easily shared and understood.
10. The answer you’re looking for is there in the first place.
He is writing about a business context, for instance one where Google Analytics, and its attendant woes, are likely to play a big role in answering a client’s marketing strategy question. But what struck me about his fallacies is their aptness in the worlds I hang in: journalism and education. Data journalism is, of course, the flavor of the week, month, and year, and no doubt it is of value. But it is sometimes treated like a magic toolbox that can be used without a hypothesis, without a real data set, and, most importantly, without a clear idea of what would actually constitute a newsworthy answer to the query.
I know there are data journalism efforts that don’t fall prey to Brennan’s list, but I wonder how many. In particular, that last point sets a high bar: is the information really there for the finding? It reminds me of a quote attributed to Confucius.
“The hardest thing of all is to find a black cat in a dark room, especially if there is no cat.”–Confucius
(As for education, I’ll save my gripes about use and misuse of data for another day.)