DH Methods for Middle Eastern Studies: Text Mining with R
Syllabus
Course Details
0.1
Aims, Contents and Method of the Course
0.2
Course Evaluation
0.2.1
DataCamp
0.3
Class Participation
0.4
Homework
0.5
Final Project
0.6
Practice Worksheets (R Notebooks)
0.7
Additional Study Materials
0.8
Software, Tools, & Technologies:
0.9
Submitting Homework:
0.9.1
Handouts / Workbooks
0.9.2
DataCamp Assignments
0.10
Schedule
0.11
Lesson Topics (subject to modifications)
1
General Introduction
1.1
Goals
1.2
Software
1.3
Class
1.3.1
Installing rmarkdown
1.4
Starting with our first workbook:
1.5
Topics covered
1.6
Reference materials
1.7
Homework
1.8
Common issues with homework
1.8.1
Tracing errors
1.8.2
Comments / Commenting out
1.8.3
Random errors:
1.9
Submitting homework
2
Basics I: Main data structures in R
2.1
Goals
2.2
Software
2.3
Class
2.3.1
Topics: Data Structures & Types
2.3.2
Additional notes
2.4
Reference materials
2.5
Homework
2.6
Submitting homework
3
Basics II: Data Manipulation & Exploratory Analysis
3.1
Goals
3.2
Software
3.3
Class
3.3.1
Topics
3.4
Reference materials
3.5
Homework
3.6
Submitting homework
4
Basics III: Data Visualization; Functions
4.1
Goals
4.2
Software
4.3
Class
4.4
Topics
4.5
Reference materials
4.6
Homework
4.7
Submitting homework
5
Data I: Collecting, Organizing, Creating
5.1
Goals
5.2
Software
5.3
In Class I: Theoretical and Conceptual
5.3.1
Ways of obtaining data
5.3.2
Main format
5.3.3
Basic principles of organizing data: Tidy Data
5.4
In Class II: Practical
5.4.1
Morris Dataset: the East Vs. the West
5.5
OCR in R
5.6
Reference Materials:
5.6.1
Additional
5.6.2
Additional Readings
5.7
Homework
5.8
Submitting homework
6
Data II: Modeling & Manipulating
6.1
Goals:
6.2
Software:
6.3
In Class I: Theoretical and Conceptual
6.3.1
Ways of modeling data: Categorization
6.3.2
Normalization
6.3.3
Note: Proxies, Features, Abstractions
6.4
In Class II: Practical
6.4.1
SECTION I
6.4.2
SECTION II
6.5
Reference Materials
6.5.1
Additional Readings
6.6
Homework
6.7
Submitting homework
7
Text Analysis I: Basics
7.1
Goals
7.2
Preliminaries
7.2.1
Data
7.2.2
Libraries
7.2.3
Functions in R (a refresher)
7.3
Texts and Text Analysis
7.4
Word Frequencies and Word Clouds
7.4.1
Word Frequencies
7.4.2
Wordclouds
7.5
Word Distribution Plots
7.5.1
Simple — a Star Wars Example
7.6
Word Distribution Plots: With Frequencies Over Time
7.7
KWIC: Keywords-in-Context
7.8
Homework
7.9
Submitting homework
8
Text Analysis II: Distances, Keywords, Summarization
8.1
Goals
8.2
Preliminaries
8.2.1
Data
8.3
Document similarity/distance measures: text2vec library
8.3.1
Distance Measures: Jaccard index, Cosine similarity, Euclidean distance
8.3.2
Now, let’s run this on “Dispatch”
8.4
TF-IDF
8.4.1
Inaugural speeches of the US presidents
8.5
Text summarization
8.6
Homework
8.7
Submitting homework
9
Text Analysis III: Finding Groups of Texts
9.1
Goals
9.2
Preliminaries
9.2.1
Libraries
9.2.2
TF-IDF
9.3
Hierarchical clustering
9.3.1
PCA viz for HCLUST
9.3.2
Determining the optimal number of clusters: “Elbow Method” and “Average Silhouette Method”
9.4
K-means clustering
9.4.1
Determining the optimal number of clusters: “Elbow Method” and “Average Silhouette Method”
9.5
Other “clustering” methods
9.6
Topic Modeling
9.6.1
Topics?
9.6.2
Getting to code
9.6.3
Per-topic-per-word probabilities (beta)
9.6.4
Topics over time
9.6.5
Exploring topics
9.7
Addendum: different distances code sample
9.8
Homework
9.9
Submitting homework
9.10
Additional Materials
References
DH in AAS - TA with R (2022S)
References