If you can read this you can be a data scientist!


The opening lines of Virgil’s Aeneid. Turns out it’s a stepping stone to learning de scientia data sit amet.

As a humanities guy with strong technical and quantitative interests, I’ve watched the explosion of data science* as a component of business, education, culture and career. It is a data age.


I found a first-person account from a Classics grad student turned data scientist. One take is particularly interesting: the two fields share some commonalities that are not necessarily top of mind.

“It was true that I needed to know statistics and how to write code to function effectively in these roles, but that knowledge was a given. It turned out that the differentiating points between a great data scientist and an average one were in the researcher’s ability to deal with that same uncertainty that had driven me from the humanities and into quantitative research in the first place. In other words, the scientific methodologies had all the same epistemological concerns and issues as the humanities — they just tackled those problems with different tools.

My experience has led me to believe that graduate humanities work is in fact one of the most useful backgrounds for an industry data scientist. While there’s often a lot of focus on data scientists being experts in statistics or coding, these tools are simply a means to an end; they’re necessary but insufficient for doing great data science. If you’re a humanities graduate student and are interested in data, I’d feel confident in your ability to succeed in the field based on your less technical skills. Specifically, experience as a graduate researcher in humanities makes you an expert in:

    1. Going deep into topics and teaching yourself anything
    2. Stating research questions and supporting your answers with evidence
    3. Communicating the limitations and assumptions of your approach

    In my mind, these broad research skills are more valuable (and rare) than knowledge of the specifics of any particular quantitative methodology.”

*“Data scientist is just a sexed up word for statistician.” Nate Silver

Nerd Words: Fallacies of Data Science

Good piece by Shane Brennan on Medium about the realities of data science in day-to-day working life (in contrast with how it’s taught).

His ten fallacies:

1. The data exists.
2. The data is accessible.
3. The data is consistent.
4. The data is relevant.
5. The data is intuitively understandable.
6. The data can be processed.
7. Analyses can be easily re-executed.
8. Where we’re going we don’t need encryption.
9. Analytics outputs are easily shared and understood.
10. The answer you’re looking for is there in the first place.

I have always considered Excel primarily a medium for creative expression!

He is writing about a business context, for instance one where Google Analytics, and its attendant woes, is likely to play a big role in answering a client’s marketing strategy question. But what struck me about his fallacies is their aptness in the worlds I hang in: journalism and education. Data journalism is, of course, the flavor of the week, month, and year, and no doubt it is of value. But it is sometimes treated like a magic toolbox that can be used without a hypothesis, without a real data set, and, most importantly, without a clear idea of what would actually constitute a newsworthy answer to the query.

I know there are data journalism efforts that don’t fall prey to Brennan’s list, but I wonder how many. In particular, clearing that last point is a high bar: is the information really there for the finding? It reminds me of a quote attributed to Confucius.

“The hardest thing of all is to find a black cat in a dark room, especially if there is no cat.”–Confucius

(As for education, I’ll save my gripes about use and misuse of data for another day.)
