3 Basics II: Data Manipulation & Exploratory Analysis

3.1 Goals

  • Getting to know the basics of working with data: manipulating data and basic techniques of exploratory analysis

3.2 Software

3.3 Class

NB: Original worksheets prepared by Lincoln Mullen, GMU (https://dh-r.lincolnmullen.com/worksheets.html)

3.3.1 Topics

  • Selecting columns (select())
  • Filtering rows (filter())
  • Creating new columns (mutate())
  • Sorting rows by column values (arrange())
  • Split-apply-combine (group_by())
  • Summarizing or aggregating data (summarize())
  • Joining data with two-table verbs (left_join() et al.)
  • Data reshaping (spread() and gather())
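
The verbs listed above can be sketched on a small example. This is a minimal illustration, not taken from the original worksheets: it assumes the dplyr and tidyr packages are installed, and the `books` and `authors` tables (with all their values) are hypothetical data invented for the purpose.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, invented for illustration only
books <- tibble(
  title  = c("A", "B", "C", "D"),
  author = c("Smith", "Smith", "Jones", "Lee"),
  year   = c(1850, 1860, 1855, 1870),
  pages  = c(200, 300, 250, 100)
)
authors <- tibble(
  author  = c("Smith", "Jones"),
  country = c("UK", "US")
)

# select(): keep only the named columns
books_small <- books %>% select(title, year)

# filter(): keep only the rows that satisfy a condition
books_late <- books %>% filter(year >= 1860)

# mutate(): add a new column computed from existing ones
books_decade <- books %>% mutate(decade = floor(year / 10) * 10)

# arrange(): sort rows, here by page count in descending order
books_sorted <- books %>% arrange(desc(pages))

# group_by() + summarize(): split by author, compute one row per group
by_author <- books %>%
  group_by(author) %>%
  summarize(n_books = n(), mean_pages = mean(pages))

# left_join(): keep every row of books, add matching author columns
# (authors without a match, such as "Lee", get NA)
books_joined <- books %>% left_join(authors, by = "author")

# spread(): long to wide, one column per decade
wide <- books_decade %>%
  count(author, decade) %>%
  spread(key = decade, value = n)

# gather(): wide back to long, dropping the empty cells
long <- wide %>%
  gather(key = "decade", value = "n", -author, na.rm = TRUE)
```

Printing any of the intermediate objects (for example `by_author` or `wide`) shows what each verb returns. Note that in recent versions of tidyr, spread() and gather() are marked as superseded by pivot_wider() and pivot_longer(); they still work, and they are used here because they are the functions named in the topics list.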

3.4 Reference materials

Consult relevant chapters from:

  • Healy, Kieran. Data Visualization: A Practical Introduction. Princeton University Press, 2018. ISBN: 978-0691181622. http://socviz.co/
  • Wickham, Hadley, and Garrett Grolemund. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly, 2017. ISBN: 978-1491910399. https://r4ds.had.co.nz/
  • Wickham, Hadley. Advanced R. 2nd edition. Chapman and Hall/CRC, 2019. http://adv-r.had.co.nz/

3.5 Homework

  • Finish your worksheet and submit your HW as described below.
  • Additional: if you’d like more practice, you can use the swirl package:
    • To install: install.packages("swirl")
    • To load: library(swirl)
    • To start: swirl()
    • It will offer you a set of interactive exercises similar to those on DataCamp.

3.6 Submitting homework

  • Homework assignments must be submitted by the beginning of the next class.
  • Email your homework to the instructor as an attachment.
    • In the subject line of your email, please use the following format: 57528-LXX-HW-YourLastName-YourMatriculationNumber, where LXX is the number of the lesson for which you are submitting homework, YourLastName is your last name, and YourMatriculationNumber is your matriculation number.