Linguistic Data: Quantitative Analysis and Visualisation: computational linguistics
- Instructors: Olga Lyashevskaya and Ivan Pozdnyakov
- Assistant: Lidia Ostyakova
- HSE course syllabus: Link
- Group in Telegram
|Jan 11||Introduction to R. R and RStudio. R basics: functions, variables, types||html||practice|
|Jan 18||Data analysis in linguistics. Research design. Types of variables||data to practice with; read Gries, Chapter 1.3|
|Jan 21||R: vectors, implicit and explicit coercion, recycling rule, missing values||html||video|
|Feb 4||R: matrices, arrays, lists, data.frames. Packages. Data import and export||html||[video link]|
|Feb 18||R: Conditions and loops. Functions. Apply family.||html html||[video link]|
|Mar 4||R: Tidyverse||html||video|
|Mar 18||R: Advanced Tidyverse||html|
|Apr 6||Visualizations: ggplot2||html Rmd||[video link]|
|Apr 15||Introduction to inferential statistics. Central limit theorem. Null hypothesis significance testing. P-value. Confidence interval||html|
|Apr 20||Tests of difference for two-sample designs: t-test, independent and paired. Nonparametric equivalents: Mann-Whitney and Wilcoxon tests. Tests for nominal data: chi-squared test, Fisher's exact test. Effect size tests for big data.||html||[video link]|
|Apr 29||Covariance and correlation. Pearson's correlation, Spearman's and Kendall's nonparametric correlation tests. Multiple comparisons problem.||html||Playing with students' data zip|
|May 11||Linear regression. Multiple linear regression. Family of Generalized and General linear models||Rmd||html|
|May 18||Logistic regression. Model selection.||Rmd|
|May 20||Analysis of variance (ANOVA)||html|
|May 25||Mixed-effect linear models.||Rmd||html|
|Jun 3||Dimensionality reduction. PCA. CA. MCA||Rmd Rmd template||hint: how to (not) interpret CA; see also MDS|
|Jun 8||CART: decision trees and random forest. Cluster analysis||Rmd: CART, Rmd: Clustering|
|Jun 10||Bayesian statistics|
Research hypothesis: formulate your pilot research hypothesis and fill in the form. Due date: 2021-01-24 23:00 MSK.
Data description: create a repository on GitHub and put the link in the same table form. Due date: 2021-03-17 23:59 MSK.
Homework: Advanced Tidyverse
Homework: From t-tests to data modeling
Online course assignments
Complete the following chapters of the Coursera course:
- Week 1
- Week 2
- Week 3
- Week 4
The project description and a link to some examples can be found here. Important dates:
- January 24: research hypothesis
- March 17: dataset description in Rmd, toy dataset (min. 20 observations)
- April 14: draft dataset
- June 10: final version of your dataset, draft final paper
- 24 hours before the exam starts: paper submission
Score policy: The Final Score is obtained from the following formula: Final Score = 0.6 × (Homework Score) + 0.4 × (Exam Score). The student is expected to prepare the final project in written form as an electronic document. The exam is conducted as an oral defense of the final project. The Exam Score measures the overall quality of the final project. It is an integer from 0 to 10. Parts of the final exam data should be prepared in advance and can be used in regular homework assignments.
Academic ethics policy: you have to do your homework by yourself. In case of academic cheating (e.g. if you copy someone else's work), your work will receive a grade of 0 and the program supervisor will be notified. If you feel that you are stuck with the homework, ask the instructors for advice and hints.
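As a rough illustration of the score formula above, here is a minimal sketch in Python (the course itself uses R). The function name `final_score` is hypothetical, and the 0 to 10 range for the homework score is an assumption; the policy only states that range for the Exam Score. No rounding rule is given in the policy, so none is applied.

```python
def final_score(homework_score: float, exam_score: int) -> float:
    """Weighted sum from the score policy: 0.6 * homework + 0.4 * exam."""
    # The policy states that the Exam Score is an integer from 0 to 10.
    if not isinstance(exam_score, int) or not 0 <= exam_score <= 10:
        raise ValueError("exam_score must be an integer from 0 to 10")
    # Assumption (not stated in the policy): homework uses the same 0-10 scale.
    if not 0 <= homework_score <= 10:
        raise ValueError("homework_score is assumed to lie between 0 and 10")
    return 0.6 * homework_score + 0.4 * exam_score


if __name__ == "__main__":
    # Homework 8.0 and exam 9 give 0.6 * 8.0 + 0.4 * 9 = 4.8 + 3.6.
    print(round(final_score(8.0, 9), 2))
```

Because homework carries the larger weight, one homework point moves the final score by 0.6, while one exam point moves it by 0.4.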
Late penalties: in case of late submission, your grade will be multiplied by exp(-t / 86400), where t is the number of seconds since the due date. For example, if you delay the submission by one day, your grade will be multiplied by exp(-1)=0.3678794412.
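The late-penalty rule can be checked with a short sketch in Python (the course itself uses R). The function names `late_multiplier` and `penalised_grade` are hypothetical, and treating on-time or early submissions as a multiplier of exactly 1 is an assumption; the policy only defines the multiplier for late work.

```python
import math

SECONDS_PER_DAY = 86400  # the divisor used in the late-penalty formula


def late_multiplier(seconds_late: float) -> float:
    """Return exp(-t / 86400) for a submission t seconds past the due date."""
    # Assumption: submissions on or before the due date are not penalised.
    if seconds_late <= 0:
        return 1.0
    return math.exp(-seconds_late / SECONDS_PER_DAY)


def penalised_grade(grade: float, seconds_late: float) -> float:
    """Multiply the raw grade by the late multiplier."""
    return grade * late_multiplier(seconds_late)


if __name__ == "__main__":
    # One day late reproduces the exp(-1) factor quoted in the policy.
    print(late_multiplier(SECONDS_PER_DAY))
    # One hour late keeps roughly 96% of the grade.
    print(late_multiplier(3600))
```

The penalty is continuous: it grows with every second of delay, so a submission a few minutes late loses very little, while each additional full day multiplies the grade by exp(-1) again.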
Extensions: you can ask for up to two extensions of homework due dates during the course. Each extension is one week. Extensions due to valid excuses (e.g. illness) do not count towards this limit.
During this course we will use R as the programming language and RStudio as the development environment (IDE).
How to install R and RStudio?
1. Download R (you can choose another mirror here if you wish) and install it on your computer. Make sure you do this before installing RStudio.
2. Download RStudio (you need RStudio Desktop, Open Source License) and install it on your computer. It is recommended to create a shortcut for RStudio during installation.
It is possible to avoid installing anything on your PC by using rstudio.cloud (an online version of RStudio).
For successful submission of assignments you should be able to create and save R code files (.R) and RMarkdown files (.Rmd).
Some parts of a MOOC (online course) are included in the program.
- Gries, Stefan (2013). Statistics for Linguistics with R: A Practical Introduction (2nd revised edition). Berlin: De Gruyter Mouton. HSE library link
- Levshina, Natalia (2015). How to Do Linguistics with R: Data Exploration and Statistical Analysis. Amsterdam: John Benjamins Publishing Company. HSE library link
- Baayen, Harald (2008). Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Cambridge: Cambridge University Press. link