Linguistic Data: Quantitative Analysis and Visualisation: computational linguistics
- Instructors: Ilya Schurov and Olga Lyashevskaya
|Jan 18||Introduction. Quantitative linguistic research and data types. R basics||Intro Slides Lab 01: intro to R|
|Jan 25||Hypothesis testing. Binomial test. R: dataframes, tidyverse||Lab 02 tidyverse cheat sheet|
|Feb 1||Central limit theorem. Variance. Student's t-test. R: simulating data, boxplots, density plots, binomial test, t-test|
|Feb 8||Two-sample t-test. Paired t-test. Confidence intervals.||Lab 04: Rmd pdf CI slides CI demo|
|Feb 15||ANOVA. Correlations||Lab 05: Rmd pdf|
|Feb 22||Tests for categorical data. Chi-squared test. Fisher's exact test. Effect size||Lab 06: Rmd pdf DataCamp: contingency tables|
|Feb 29||Linear regression. Multivariate linear regression. Dummy variables|
|Mar 7||Fixed and random effects. Linear mixed-effects models|
|Mar 21||Logistic regression. Model selection||Lab 09 .Rmd html|
|Apr 11||Dimensionality reduction. PCA. MDS. t-SNE||Lab 10 .Rmd template .Rmd code|
|Apr 13||Correspondence analysis: CA, MCA||Lab 11 Rmd pdf|
|Apr 27||Decision trees. Decision forests||video Lab 12 Rmd template solution Rmd|
|May 16||Cluster Analysis||video, Lab 12 Rmd More on aesthetics Supplementary material 1 More on cluster evaluation gospels dataset|
|May 25||Probabilistic models I: maximum likelihood estimates||video|
|June 1||Probabilistic models II: Bayesian models||video. Nice tutorial on |
During this course we will use R as a programming language and RStudio as a GUI.
How to install R and RStudio?
1. Download R (you can choose another mirror here if you wish) and install it on your computer. Do this before installing RStudio.
2. Download RStudio (you need the version under the RStudio Desktop Open Source License) and install it on your computer. We recommend creating a shortcut for RStudio during installation.
It is possible to avoid installing anything on your PC by using rstudio.cloud (an online version of RStudio).
For successful submission of assignments you should be able to create and save R code files (.R) and RMarkdown files (.Rmd).
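As a quick check that the installation works and that you can produce both file types, here is a minimal sketch in base R. The folder and file names (`ldqav_check`, `check.R`, `check.Rmd`) are hypothetical and not prescribed by the course; in practice you would normally create these files from the RStudio menu.

```r
# Hypothetical folder and file names, used only for this illustration.
out_dir <- file.path(tempdir(), "ldqav_check")
dir.create(out_dir, showWarnings = FALSE)

# 1. A plain R script (.R): ordinary R code and comments.
r_file <- file.path(out_dir, "check.R")
writeLines(c(
  "# Print the installed R version",
  "print(R.version.string)"
), r_file)

# 2. An RMarkdown file (.Rmd): a YAML header, prose, and a fenced R chunk.
rmd_file <- file.path(out_dir, "check.Rmd")
fence <- strrep("`", 3)  # three backticks open and close a code chunk
writeLines(c(
  "---",
  "title: \"Installation check\"",
  "output: html_document",
  "---",
  "",
  "Some text, followed by a code chunk:",
  "",
  paste0(fence, "{r}"),
  "summary(c(1, 2, 3))",
  fence
), rmd_file)

# Run the script to confirm that R itself works.
source(r_file)

# Both files should now exist on disk.
files_ok <- file.exists(r_file) && file.exists(rmd_file)
```

If the `rmarkdown` package is installed (an assumption; it is not part of base R), `rmarkdown::render(rmd_file)` should turn the .Rmd file into an HTML document.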
- Homework 1 (deadline: February 16, 23:59), Chapters 1, 2, 3, and 5 of the DataCamp course "Introduction to R". Please fill in this form.
- Homework 2 (deadline: February 23, 23:59), Chapters 4 and 6 of the DataCamp course "Introduction to R".
- Homework 3 (deadline: February 9, 12:00), Hypothesis testing, binomial test, t-test. HW3 pdf html Rmd template
- Homework 4 (deadline: February 29, 12:00), T-test and ANOVA, reproducing some results from Leivada & Westergaard 2019 HW4 pdf html Rmd template link to submit your .Rmd file
- Homework 5 (deadline: March 09, 23:59), Contingency tables and tests, linear models HW5 pdf html Rmd template link to submit your .Rmd file
- Homework 6 (deadline: March 28, 12:10), Mixed-effects models HW6 pdf html Rmd template link to submit your .Rmd file
- Projects description link
- Projects pre-registration: Due April 27, 2020. Please create the folder Project in your GitHub repository and put a pdf file there. Optionally, you can add a csv file with the preliminary version of your dataset and an Rmd file, if needed.
- Final versions of project papers: submit here. Note that the deadlines have changed (see the Telegram chat for details).
- Gries, Stefan (2013). Statistics for Linguistics with R: A Practical Introduction (2nd revised ed.). Berlin: De Gruyter Mouton. HSE library link
- Levshina, Natalia (2015). How to Do Linguistics with R : Data Exploration and Statistical Analysis. Amsterdam: John Benjamins Publishing Company. HSE library link
- Baayen, Harald (2008). Analyzing Linguistic Data: A practical introduction to statistics. Cambridge UP. pdf
- Gries, Stefan (2017). Quantitative Corpus Linguistics with R: A Practical Introduction (2nd ed.). Milton Park, Abingdon, Oxon: Routledge. eBook
- Empirical Bayes
- Harney, H. L. (2016). Bayesian Inference: Data Evaluation and Decisions (2nd ed.). Springer. eBook
- McElreath, R. (2016). Statistical Rethinking : A Bayesian Course with Examples in R and Stan. eBook
- Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer. eBook
- R markdown [https://rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf Rmd Cheat Sheet]
This page contains the materials of the course "Linguistic Data: Quantitative Analysis and Visualisation", taught in the HSE Master's program "Computational Linguistics" in the 2019-2020 academic year. Modules: 3-4.
To take part in the oral exam, please fill in the form. You will be assigned a particular time slot to present your project. Students must:
(1) submit the materials for the project (see the final project description in the course program) no later than 23:59 on the day before the oral exam, using the link;
(2) make sure that Zoom, the microphone, and screen sharing work on their devices;
(3) join the Zoom meeting using the link that can be found in the official Telegram chat.
Your answer (20 minutes) will consist of two parts: a presentation of the project (up to 10 minutes) and a discussion. Please share your screen and comment on your materials while presenting the project. We also suggest that you provide a link to a pdf or html version of the project for the convenience of other participants. The course instructors will lead the discussion; however, other students can also ask questions about your project.
Important: If you experience technical difficulties during the exam for more than 5 minutes (unstable internet connection, corrupted sound, etc.), take screenshots to document those difficulties and contact your instructors to make alternative arrangements.
The exam scores will be announced individually on the same day, after the end of the exam. You can ask for feedback on your exam performance immediately after you get your results. See also the "how to appeal" section of the HSE rules.