Linguistic Data: Quantitative Analysis and Visualisation for computer linguists

Mathematics materials, 2018-19 academic year
Current revision as of 20:56, 19 June 2019

Course info

Dear students,

This page hosts the materials of the course "Linguistic Data: Quantitative Analysis and Visualisation", taught in the Master's programme "Computational Linguistics" in the 2018-2019 academic year.

  • Instructors: Olga Lyashevskaya, George Moroz, Alla Tambovtseva and Ilya Schurov.
  • Modules: 3-4


In this course we will use R as the programming language and RStudio as the GUI.

How to install R and RStudio?

1. Download R (you can choose another mirror here if you wish) and install it on your computer. Make sure you do this before installing RStudio.

2. Download RStudio (you need RStudio Desktop with the Open Source License) and install it on your computer. We recommend creating a shortcut for RStudio during installation.

It is also possible to avoid installing anything on your PC by using the online version of RStudio.

How to use RStudio?

Read the instructions here.

To submit assignments successfully, you should be able to create and save R code files (.R) and RMarkdown files (.Rmd).


Date | Lecture topic | Seminar | Optional
12.01 | Something about data: population vs sample, descriptive statistics | problems1, R-basics | RMarkdown: official page, cheatsheet
19.01 | Population and samples. Working with data in R | artists.txt, orientation.csv | More on basic graphs in R
26.01 | Hypothesis testing | Binomial-test, poetry.csv, orientation.csv | RNC frequency list
02.02 | Student's t-test. Central limit theorem: recall | T-test, dplyr-ggplot, icelandic.csv | asp-paper (Coretta, 2017)
09.02 | Confidence intervals. ANOVA | Conf-intervals, poetry.csv, icelandic.csv | an interactive visualization of CI by K. Magnusson; more on overlapping CIs (by A. Knezevic)
16.02 | Data manipulation with tidyverse. Visualisation with ggplot2 | class materials |
02.03 | Chi-squared and Fisher's exact tests | Chi-squared-test, socling.csv, elision.csv |
16.03 | Correlation coefficients and simple linear regression | Corr-regression, education.csv, chekhov.csv | guess correlation game, correlograms
23.03 | Multiple linear regression | |
06.04 | Logistic regression | Lab10 | more on visualising coefficients, more tests
16.04 | Linear mixed-effects models | Lab11, Lab11-solutions | LME models cheat sheet
27.04 | Nested effects. Decision trees and random forests | nested effects, Lab 12. Trees and forests, Lab12-solutions |
18.05 | Dimension reduction: PCA, CA, MCA | Lab 13. PCA and MCA | 3D example
25.05 | Cluster analysis | cluster-analysis, gospels.csv | more dendrograms, more CA, CA quality
01.06 | Bayesian statistics | Lab 15 |
08.06 | Simulation statistics | scores2.csv |

R seminars in PDF

12 January: R-basics, 26 January: Binomial-test

2 February: T-test, 9 February: Conf-intervals, Anova

2 March: Chi-squared-test, 16 March: Corr-regression

R seminars in .R and .Rmd

12 January: R-basics.R, R-basics.Rmd, 26 January: Binomial-test.Rmd

2 February: T-test.Rmd, T-test.R, 9 February: Conf-intervals.Rmd, Conf-intervals.R, Anova.Rmd, Anova.R

2 March: Chi-squared-test.Rmd, Chi-squared-test.R, 16 March: Corr-regression.R, Corr-regression.Rmd


Final project

  • Projects description
  • Project topics: link to the table to fill in
  • Projects pre-registration (deadline: 28 April, 23:59): link to submit your file
  • Final versions of projects: link to submit your files