Understanding Data Science: Recent Studies
- by Joe Lamantia
If you need such a deeper understanding of data science than Drew Conway's popular venn diagram model, or Josh Wills' tongue in cheek characterization, "Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician." two relatively recent studies are worth reading.
'Analyzing the Analyzers,' an O'Reilly e-book by Harlan Harris, Sean Patrick Murphy, and Marck Vaisman, suggests four distinct types of data scientists -- effectively personas, in a design sense -- based on analysis of self-identified skills among practitioners. The scenario format dramatizes the different personas, making what could be a dry statistical readout of survey data more engaging. The survey-only nature of the data, the restriction of scope to just skills, and the suggested models of skill-profiles makes this feel like the sort of exercise that data scientists undertake as an every day task; collecting data, analyzing it using a mix of statistical techniques, and sharing the model that emerges from the data mining exercise. That's not an indictment, simply an observation about the consistent feel of the effort as a product of data scientists, about data science.
And the paper 'Enterprise Data Analysis and Visualization: An Interview Study' by researchers Sean Kandel, Andreas Paepcke, Joseph Hellerstein, and Jeffery Heer considers data science within the larger context of industrial data analysis, examining analytical workflows, skills, and the challenges common to enterprise analysis efforts, and identifying three archetypes of data scientist. As an interview-based study, the data the researchers collected is richer, and there's correspondingly greater depth in the synthesis. The scope of the study included a broader set of roles than data scientist (enterprise analysts) and involved questions of workflow and organizational context for analytical efforts in general. I'd suggest this is useful as a primer on analytical work and workers in enterprise settings for those who need a baseline understanding; it also offers some genuinely interesting nuggets for those already familiar with discovery work.
We've undertaken a considerable amount of research into discovery, analytical work/ers, and data science over the past three years -- part of our programmatic approach to laying a foundation for product strategy and highlighting innovation opportunities -- and both studies complement and confirm much of the direct research into data science that we conducted. There were a few important differences in our findings, which I'll share and discuss in upcoming posts.