
In this video, Open University academic Tony Hirst talks about managing and analysing data, following the “4 Steps of Data Wrangling”: Clean, Shape, Augment, and Look. As with the other two videos I’ve reviewed, I’ve followed the spirit of the ‘revise/remix’ ethos of the course and have edited out the glitches (and enlarged the slides).
In summary, Tony provides a brief overview of the following data wrangling tools:
- OpenRefine for cleaning
- Many Eyes for generating treemaps
- Gephi for generating network graphs
- R and RStudio for statistical analysis and generating charts
- IPython Notebook for writing and executing code for analysing datasets
He demonstrates pivot tables and Sankey diagrams, and suggests looking for outliers, similarities and differences, and trends when exploring data for visualisation.
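To make the pivot table idea concrete, here is a minimal sketch in plain Python. It is not taken from Tony’s talk: the records, the field names (`region`, `year`, `spend`) and the `pivot_sum` helper are all invented for illustration. It simply shows how rows can be reshaped into a region-by-year summary, which is the kind of view where an outlier starts to stand out.

```python
from collections import defaultdict

# Hypothetical records, invented purely for illustration.
records = [
    {"region": "North", "year": 2012, "spend": 120},
    {"region": "North", "year": 2013, "spend": 135},
    {"region": "South", "year": 2012, "spend": 80},
    {"region": "South", "year": 2013, "spend": 410},
    {"region": "North", "year": 2013, "spend": 15},
]


def pivot_sum(rows, index, columns, values):
    """Sum `values` into a nested dict keyed by index value, then column value."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[index]][row[columns]] += row[values]
    # Convert to plain dicts so the result prints and compares cleanly.
    return {key: dict(cols) for key, cols in table.items()}


pivot = pivot_sum(records, index="region", columns="year", values="spend")

# One line per region, with spend totalled by year.
for region in sorted(pivot):
    print(region, pivot[region])
```

In practice a spreadsheet or a data analysis library would do the same job with less code; the point here is only to show what the reshaping step does to the data.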
Tony also quotes John Tukey’s statement from half a century ago that computers would allow people to become “journeymen carpenters of data analytics”, and quotes Leland Wilkinson to support the use of powerful tools to make sense of data and develop data narratives.
References:
- Tukey, J. W. (1965). The technical tools of statistics. The American Statistician, 19(2), 23-28.
- Tukey, J. W. (1980). We need both exploratory and confirmatory. The American Statistician, 34(1), 23-25.
- Wilkinson, L. (1999). Computing – The Grammar of Graphics. New York: Springer-Verlag, Inc.
See more of Tony’s thinking at blog.ouseful.info and at github.com.