Linguistic Data: Quantitative Analysis and Visualisation: computational linguistics

From MathINFO
  • Instructors: Olga Lyashevskaya and Ivan Pozdnyakov
  • Assistant: Lidia Ostyakova
  • HSE course syllabus: Link
  • Group in Telegram


Date | Topics | Links | Video
Jan 11 | Introduction to R. R and RStudio. R basics: functions, variables, types | html, practice |
Jan 18 | Data analysis in linguistics. Research design. Types of variables | pdf, data to practice with; reading: Gries, Chapter 1.3 |
Jan 21 | R: vectors, implicit and explicit coercion, recycling rule, missing values | html | video
Feb 4 | R: matrices, arrays, lists, data frames. Packages. Data import and export | html | video

Assignment #1

Research hypothesis: formulate your pilot research hypothesis and fill in the form. Due date: 2021-01-24 23:00 MSK.

Online course assignments

Complete the following weeks of the Coursera course:

  • Week 1
  • Week 2
  • Week 3
  • Week 4

Final project

The project description and a link to some examples can be found here. Important dates:

  • January 24: research hypothesis
  • March 17: dataset description in Rmd, toy dataset (min. 20 observations)
  • April 14: draft dataset
  • June 10: final version of your dataset, draft final paper
  • 24 hours before the exam starts: paper submission

Course Policy

Score policy:
The Final Score is computed as Final Score = 0.6 × (Homework Score) + 0.4 × (Exam Score). Students are expected to prepare the final project as a written electronic document, and the exam is an oral defense of this project. The Exam Score reflects the overall quality of the final project and is an integer from 0 to 10. Parts of the data for the final project should be prepared in advance and can be used in regular homework assignments.
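The formula can be checked with a few lines of R. This is a minimal sketch: the function name `final_score()` is hypothetical, and it assumes that the Homework Score uses the same 0 to 10 scale as the Exam Score and that no rounding is applied, neither of which the policy states.

```r
# Hypothetical helper illustrating the course formula:
# Final Score = 0.6 * Homework Score + 0.4 * Exam Score.
# Assumption: both scores are on a 0 to 10 scale; no rounding is applied.
final_score <- function(homework, exam) {
  # The Exam Score is an integer from 0 to 10
  stopifnot(exam %in% 0:10, homework >= 0, homework <= 10)
  0.6 * homework + 0.4 * exam
}

final_score(homework = 8, exam = 7)
#> [1] 7.6
```

With a Homework Score of 8 and an Exam Score of 7, the result is 0.6 × 8 + 0.4 × 7 = 4.8 + 2.8 = 7.6.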

Academic ethics policy: you must do your homework yourself. In case of academic cheating (e.g. copying someone else's work), your work will receive a grade of 0 and the program supervisor will be notified. If you feel stuck with the homework, ask the instructors for advice and hints.

Late penalties: in case of late submission, your grade will be multiplied by exp(-t / 86400), where t is the number of seconds since the due date (86400 seconds = 1 day). For example, if you submit one day late (t = 86400), your grade will be multiplied by exp(-1) = 0.3678794412.
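The penalty multiplier can be reproduced in R from the submission and due timestamps. This is a sketch only: `late_multiplier()` is a hypothetical name, and treating on-time or early submissions (t ≤ 0) as keeping the full grade is an assumption, since the rule above only describes late work.

```r
# Hypothetical helper for the late-submission rule:
# grade multiplier = exp(-t / 86400), t = seconds past the due date.
# Assumption: on-time or early submissions (t <= 0) keep the full grade.
late_multiplier <- function(submitted, due) {
  t <- as.numeric(difftime(submitted, due, units = "secs"))
  exp(-max(t, 0) / 86400)  # 86400 seconds = 1 day
}

# Due date of Assignment #1 (2021-01-24 23:00 MSK)
due <- as.POSIXct("2021-01-24 23:00", tz = "Europe/Moscow")

late_multiplier(due + 86400, due)  # one day late
#> [1] 0.3678794
late_multiplier(due + 3600, due)   # one hour late, about 0.96
late_multiplier(due - 60, due)     # one minute early
#> [1] 1
```

Because the decay is exponential, even a short delay costs a little (one hour late keeps about 96% of the grade), while a delay of several days reduces the grade sharply.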

Extensions: you can ask for up to two extensions of homework due dates during the course. Each extension is one week long. Extensions due to valid excuses (e.g. illness) do not count toward this limit.


During this course we will use R as a programming language and RStudio as a GUI.

How to install R and RStudio?

1. Download R (you can choose another mirror here if you wish) and install it on your computer. Make sure you install R before installing RStudio.

2. Download RStudio (you need RStudio Desktop with the Open Source License) and install it on your computer. We recommend creating a shortcut for RStudio during installation.

You can also avoid installing anything on your PC by using an online version of RStudio.

To submit assignments, you should be able to create and save R script files (.R) and R Markdown files (.Rmd).
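A quick way to check the setup is to run a couple of lines in the RStudio console. This sketch assumes that your .Rmd files will be rendered with the rmarkdown package; the variable name `has_rmarkdown` is only illustrative.

```r
# Show which R version RStudio is running
print(R.version.string)

# .Rmd files are rendered by the rmarkdown package; check whether it is installed
has_rmarkdown <- requireNamespace("rmarkdown", quietly = TRUE)
if (!has_rmarkdown) {
  message('rmarkdown is missing; install it once with install.packages("rmarkdown")')
}
```

`requireNamespace()` only checks for the package without attaching it, so the snippet is safe to run repeatedly.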

Online course

Some parts of a MOOC (massive open online course) are included in the program.


  • Gries, Stefan (2013). Statistics for Linguistics with R: A Practical Introduction (2nd revised ed.). Berlin: De Gruyter Mouton. HSE library link
  • Levshina, Natalia (2015). How to Do Linguistics with R: Data Exploration and Statistical Analysis. Amsterdam: John Benjamins Publishing Company. HSE library link
  • Baayen, Harald (2008). Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Cambridge: Cambridge University Press. link