3 Basics II: Data Manipulation & Exploratory Analysis
3.1 Goals
- Getting to know the basics of working with data: manipulating data, basic techniques of exploratory analysis
3.2 Software
- the same, with some new libraries:
3.3 Class
Practice worksheets:
NB: Original worksheets prepared by Lincoln Mullen, GMU (https://dh-r.lincolnmullen.com/worksheets.html)
3.3.1 Topics
- Selecting columns (select())
- Filtering rows (filter())
- Creating new columns (mutate())
- Sorting rows by column values (arrange())
- Split-apply-combine (group_by())
- Summarizing or aggregating data (summarize())
- Data joining with two-table verbs (left_join() et al.)
- Data reshaping (spread() and gather())
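To give a feel for the verbs listed above before you open the worksheets, here is a minimal sketch. It assumes that the dplyr and tidyr packages are installed; the tables books and authors and all of their columns are hypothetical toy data invented for this illustration and do not come from the worksheets.

```r
# Assumption: dplyr and tidyr are installed (install.packages(c("dplyr", "tidyr")))
library(dplyr)
library(tidyr)

# Hypothetical toy data, invented for this illustration only
books <- tibble(
  title     = c("A", "B", "C", "D"),
  author_id = c(1, 1, 2, 3),
  year      = c(1850, 1862, 1871, 1850),
  pages     = c(200, 350, 120, 410)
)
authors <- tibble(
  author_id = c(1, 2),
  name      = c("Smith", "Jones")
)

# select(): keep only the named columns
books_small <- select(books, title, year, pages)

# filter(): keep only the rows that satisfy a condition
long_books <- filter(books, pages > 300)

# mutate(): create a new column from existing ones
books_decade <- mutate(books, decade = floor(year / 10) * 10)

# arrange(): sort rows, here by page count in descending order
books_sorted <- arrange(books, desc(pages))

# group_by() + summarize(): split by decade, compute one row per group
by_decade <- books_decade %>%
  group_by(decade) %>%
  summarize(n_books = n(), mean_pages = mean(pages))

# left_join(): keep every row of books, add matching author names;
# books without a match in authors get NA in the name column
books_authors <- left_join(books, authors, by = "author_id")

# gather(): reshape from wide to long (one row per title/variable pair)
books_long <- gather(books_small, key = "variable", value = "value", year, pages)

# spread(): reshape back from long to wide
books_wide <- spread(books_long, key = variable, value = value)
```

Print any of the resulting objects (for example, by_decade or books_authors) in the console to inspect what each verb did. Note that in recent versions of tidyr, spread() and gather() are superseded by pivot_wider() and pivot_longer(); the older functions still work and are the ones listed in the topics above.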
3.4 Reference materials
Consult relevant chapters from:
- Healy, Kieran. Data Visualization: A Practical Introduction. Princeton University Press, 2018. ISBN: 978-0691181622. http://socviz.co/
- Wickham, Hadley, and Garrett Grolemund. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly, 2017. ISBN: 978-1491910399. https://r4ds.had.co.nz/
- Wickham, Hadley. Advanced R, Second Edition. Boca Raton: Chapman and Hall/CRC, 2019. http://adv-r.had.co.nz/
3.5 Homework
- Finish your worksheet and submit your HW as described below.
- Additional: if you’d like more practice, you can use the swirl library:
  - To install: install.packages("swirl")
  - To run: library(swirl)
  - Then: swirl()
  - It will offer you a set of interactive exercises similar to DataCamp.
3.6 Submitting homework
- Homework assignments must be submitted by the beginning of the next class.
- Email your homework to the instructor as attachments.
- In the subject of your email, please add the following: 57528-LXX-HW-YourLastName-YourMatriculationNumber, where LXX is the number of the lesson for which you submit homework; YourLastName is your last name; and YourMatriculationNumber is your matriculation number.