Linguistic Data: Quantitative Analysis and Visualisation for theoretical linguists
Course info
Dear students,
This page hosts the materials of the course "Linguistic Data: Quantitative Analysis and Visualisation", taught in the Master's programme "Linguistic Theory and Language Description" in the 2018-2019 academic year.
- Instructors: Olga Lyashevskaya, George Moroz, Alla Tambovtseva and Ilya Schurov.
- Modules: 3-4
Software
In this course we use R as the programming language and RStudio as the GUI.
How to install R and RStudio?
1. Download R (you can choose another mirror here if you wish) and install it on your computer. Make sure you do this before installing RStudio.
2. Download RStudio (you need RStudio Desktop, Open Source License) and install it on your computer. It is recommended to create a shortcut for RStudio during installation.
It is also possible to avoid installing anything on your PC by using the online version of RStudio.
How to use RStudio?
Read the instructions here.
For successful submission of assignments you should be able to create and save R code files (.R) and RMarkdown files (.Rmd).
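For orientation only, here is a minimal sketch of what an .Rmd file looks like. It is not a course-provided template: the title and author values are placeholders, and `html_document` is just one possible output format. An R Markdown file combines a YAML header, plain text, and R code chunks; an .R file, by contrast, contains only R code and comments.

````markdown
---
title: "Homework 1"
author: "Your Name"
output: html_document
---

Plain text and explanations go here.

```{r}
# R code goes inside chunks like this one; it runs when the file is knitted
x <- c(1, 2, 3)
mean(x)
```
````

In RStudio, the Knit button renders such a file into the output format named in the header.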
Materials
Date | Topic of the lecture | Seminar | Optional |
---|---|---|---|
12.01 | Something about data: population vs sample. Descriptive statistics | problems1, R-basics | RMarkdown: official page, cheatsheet |
19.01 | Population and samples. Working with data in R | problems2, R-samples, artists.txt, R-vectors, R-dataframes, orientation.csv | more on basic graphs in R |
26.01 | Statistical hypothesis testing | Binomial-test, poetry.csv | |
02.02 | Student's t-test. Central limit theorem | T-test, icelandic.csv | asp-paper (Coretta, 2017) |
09.02 | Confidence intervals | Conf-intervals, poetry.csv, icelandic.csv | an interactive visualization of CI by K. Magnusson; more on overlapping CIs (by A. Knezevic) |
16.02 | Data manipulation with tidyverse. Visualisation with ggplot2 | class materials | |
02.03 | Chi-squared and Fisher's exact tests | Chi-squared-test, elision.csv, socling.csv | |
16.03 | Correlation coefficients and simple linear regression | Corr-regression, education.csv, chekhov.csv | guess correlation game |
23.03 | Multiple comparisons. ANOVA | Anova, icelandic.csv | correlograms, spurious correlations |
06.04 | Multiple linear regression | Multiple-regression, english.csv | more on visualising coefficients, more tests |
13.04 | Logistic regression | Lab10 [Lab10-solutions] | more on visualising coefficients |
27.04 | More on model diagnostics. Mixed-effects models | Mixed-effects, ReductionRussian.txt | LME in R |
18.05 | Decision trees and random forest | Lab 12. Trees and forests, Code | |
25.05 | PCA | class materials | |
01.06 | Clustering | swadesh.csv | |
08.06 | NeighborNet. Simulation statistics | prefixes.txt, R code, scores2.csv | |
R seminars in PDF
12 January: R-basics; 19 January: R-vectors, R-dataframes, R-samples; 26 January: Binomial-test
2 February: T-test; 9 February: Conf-intervals
2 March: Chi-squared-test; 16 March: Corr-regression; 23 March: Anova
6 April: Multiple-regression; 27 April: Mixed-effects
R seminars in .R and .Rmd
12 January: R-basics.R, R-basics.Rmd; 19 January: R-vectors.R, R-vectors.Rmd, R-dataframes.R, R-dataframes.Rmd, R-samples.Rmd; 26 January: Binomial-test.Rmd
2 February: T-test.R, T-test.Rmd; 9 February: Conf-intervals.Rmd, Conf-intervals.R
2 March: Chi-squared-test.Rmd, Chi-squared-test.R; 16 March: Corr-regression.R, Corr-regression.Rmd; 23 March: Anova.R, Anova.Rmd
6 April: [Multiple-regression.R], [Multiple-regression.Rmd]; 27 April: Mixed-models.R, Mixed-models.Rmd
Homeworks
- Homework 1 (deadline: 27 January, 23:59), link to submit
- Homework 2 (deadline: 03 February, 23:59)
- Homework 3 (deadline: 10 February, 23:59), Rmd-file to fill in, link to submit your .Rmd file
- Homework 4 (deadline: 19 February, 23:59), Rmd-file to fill in, link to submit your .Rmd file
- Homework 5 (deadline: 3 March, 23:59), Rmd-file to fill in, link to submit your .Rmd file
- Homework 6 (deadline: 15 May, 23:59), Rmd-file to fill in, link to submit your .Rmd file
Final project
- Project topics: link to the table to fill in
- Project pre-registration (deadline: 28 April, 23:59): link to submit your file
- Final versions of projects: link to submit your files