Linguistic Data: Quantitative Analysis and Visualisation for Computational Linguists
This page hosts the materials for the course "Linguistic Data: Quantitative Analysis and Visualisation", taught in the Master's programme "Computational Linguistics" in the 2018-2019 academic year.
- Instructors: Olga Lyashevskaya, George Moroz, Alla Tambovtseva and Ilya Schurov.
- Modules: 3-4
In this course we will use R as the programming language and RStudio as the GUI.
How to install R and RStudio?
1. Download R (you can choose another mirror here if you wish) and install it on your computer. Make sure you do this before installing RStudio.
2. Download RStudio (you need RStudio Desktop with the Open Source License) and install it on your computer. We recommend creating a shortcut for RStudio during installation.
You can also avoid installing anything on your PC by using the online version of RStudio.
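Once R is available (locally or in the online RStudio), a quick check in the console can confirm that everything is in place. The lines below are a minimal sketch; the choice of `rmarkdown` as the package to look for is an assumption based on the .Rmd files used in this course, not an official requirement list.

```r
# Print the version string of the running R installation.
print(R.version.string)

# Check whether a package is available without attaching it.
# "rmarkdown" is only an example here (assumption: it is used to knit .Rmd files).
has_rmarkdown <- requireNamespace("rmarkdown", quietly = TRUE)
print(has_rmarkdown)

# If the check printed FALSE, the package could be installed with:
# install.packages("rmarkdown")
```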
How to use RStudio?
Read the instructions here.
To submit assignments successfully, you need to be able to create and save R code files (.R) and RMarkdown files (.Rmd).
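To illustrate the difference between the two file types, here is a minimal sketch that writes one of each using base R only. The file names and contents are hypothetical placeholders, not a course template; the files go to a temporary directory so nothing in your working folder is touched.

```r
# Minimal sketch using base R only. File names and contents are
# hypothetical placeholders, not an official course template.
out_dir <- tempdir()

# 1. A plain R script (.R): just lines of R code saved as text.
r_path <- file.path(out_dir, "example.R")
writeLines(c("x <- c(2, 4, 6)", "mean(x)"), r_path)

# 2. An RMarkdown file (.Rmd): YAML header, text, and an R code chunk.
fence <- strrep("`", 3)  # three backticks delimit a code chunk
rmd_lines <- c(
  "---",
  "title: \"Example\"",
  "output: html_document",
  "---",
  "",
  "Some text describing the analysis.",
  "",
  paste0(fence, "{r}"),
  "x <- c(2, 4, 6)",
  "mean(x)",
  fence
)
rmd_path <- file.path(out_dir, "example.Rmd")
writeLines(rmd_lines, rmd_path)

# Both files should now exist on disk.
file.exists(c(r_path, rmd_path))
```

In RStudio the same files are usually created through File > New File > R Script or R Markdown and saved with the matching extension.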
| Date | Topic of the lecture | Seminar | Optional |
|---|---|---|---|
| 12.01 | Something about data: population vs sample, descriptive statistics | problems1, R-basics | RMarkdown: official page, cheatsheet |
| 19.01 | Population and samples. Working with data in R | artists.txt, orientation.csv | More on basic graphs in R |
| 26.01 | Hypothesis testing | Binomial-test, poetry.csv, orientation.csv | RNC frequency list |
| 02.02 | Student's t-test. Central limit theorem: recall | T-test, dplyr-ggplot, icelandic.csv | asp-paper (Coretta, 2017) |
| 09.02 | Confidence Interval. ANOVA | Conf-intervals, poetry.csv, icelandic.csv | an interactive visualization of CI by K.Magnusson; more on overlapping CI's (by A.Knezevic) |
| 16.02 | Data manipulation with tidyverse. Visualisation with ggplot2 | class materials | |
| 02.03 | Chi-squared and Fisher's exact tests | Chi-squared-test, socling.csv, elision.csv | |
| 16.03 | Correlation coefficients and a simple linear regression | Corr-regression, education.csv, chekhov.csv | guess correlation game, correlograms |
| 23.03 | Multiple linear regression | | |
| 06.04 | Logistic regression | Lab10 | more on visualising coefficients, more tests |
| 16.04 | Linear mixed-effect models | Lab11, Lab11-solutions | LME models cheat sheet |
| 27.04 | Nested effects. Decision trees and random forest. | nested effects, Lab 12. Trees and forests, Lab12-solutions | |
| 18.05 | Dimension reduction. PCA, CA, MCA | Lab 13. PCA and MCA | 3D example |
| 25.05 | Cluster analysis | cluster-analysis, gospels.csv | more dendrograms, more CA, CA quality |
| 01.06 | Bayesian statistics | Lab 15 | |
R seminars in pdf
R seminars in .R and .Rmd
- Homework 1 (deadline: 27 January, 23:59), link to submit
- Homework 2 (deadline: 03 February, 23:59)
- Homework 3 (deadline: 10 February, 23:59), Rmd-file to fill in, link to submit your .Rmd file
- Homework 4 (deadline: 22 February, 23:59), Rmd-file to fill in, link to submit your .Rmd file
- Homework 5 (deadline: 3 March, 23:59), Rmd-file to fill in, link to submit your .Rmd file
- Homework 6 (deadline: 15 May, 23:59), Rmd-file to fill in, link to submit your .Rmd file