Difference between pages "Linguistic Data: Quantitative Analysis and Visualisation: computational linguistics" and "Linguistic Data: Quantitative Analysis and Visualisation: linguistic theory"

From MathINFO
 
 
== Materials ==

{|class='wikitable'
! Date !! Topics !! Links
|-
| Jan 18 || Introduction. Quantitative linguistic research and data types. R basics || [https://docs.google.com/presentation/d/1VUIUa3Db5n4dsD_HeA3e-mz55zK8uPrko3yu207pKUk/edit?usp=sharing Intro Slides] [https://github.com/LingData2019/LingData2020/tree/master/seminars/2020-01-18 Lab 01: intro to R]
|-
| Jan 25 || Hypothesis testing. Binomial test. R: dataframes, tidyverse || [https://github.com/LingData2019/LingData2020/tree/master/seminars/2020-01-25 Lab 02] [https://datacamp-community-prod.s3.amazonaws.com/e63a8f6b-2aa3-4006-89e0-badc294b179c tidyverse cheat sheet]
|-
| Feb 1 || Central limit theorem. Variance. Student's t-test. R: simulating data, boxplots, density plots, binomial test, t-test || [https://github.com/LingData2019/LingData2020/tree/master/seminars/2020-02-01 Lab 03] [https://raw.githubusercontent.com/LingData2019/LingData2020/master/seminars/2020-02-01/Lab3-ttest-binom-matrices.Rmd Rmd] [https://htmlpreview.github.io/?https://github.com/LingData2019/LingData2020/blob/master/seminars/2020-02-01/Lab3-ttest-binom-matrices.html html] [https://rforpublichealth.blogspot.com/2014/02/ggplot2-cheatsheet-for-visualizing.html Viz. distributions]
|-
| Feb 8 || Two-sample t-test. Paired t-test. Confidence intervals. <!-- TODO: Non-parametric tests --> || [https://github.com/LingData2019/LingData2020/tree/master/seminars/2020-02-08 Lab 04] [https://raw.githubusercontent.com/LingData2019/LingData2020/master/seminars/2020-02-08/Lab4-confint-pairedttest-anova.Rmd Rmd] [https://github.com/LingData2019/LingData2020/raw/master/seminars/2020-02-08/Lab4-confint-pairedttest-anova.pdf pdf] [https://agricolamz.github.io/2018-MAG_R_course/Lec_4_stats.html CI slides] [https://istats.shinyapps.io/ExploreCoverage/ CI demo]
|-
| Feb 15 || ANOVA. Correlations || [https://github.com/LingData2019/LingData2020/tree/master/seminars/2020-02-15 Lab 05] [Rmd] [pdf]
|-
| Feb 22 || Tests for categorical data. Chi-squared test. Fisher's exact test. Effect size || [https://lindeloev.github.io/tests-as-linear/linear_tests_cheat_sheet.pdf Common statistical tests & linear models]
|-
| Feb 29 || Linear regression. Multivariate linear regression. Dummy variables ||
|-
| || Dimensionality reduction. PCA. MDS. t-SNE ||
|-
| || CA, MCA. Clustering ||
|-
| || Logistic regression. Model selection ||
|-
| || Fixed and random effects. Linear mixed-effects models ||
|-
| || Bootstrap. Decision trees. Decision forests ||
|-
| || Bayesian statistics ||
|-
| || Bayesian statistics II ||
|}
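The Jan 25 and Feb 22 topics (binomial test, chi-squared test, Fisher's exact test, effect size) map onto a few base R functions. Below is a minimal sketch on made-up counts. The counts, the table labels (<code>group</code>, <code>variant</code>) and the choice of Cramér's V as the effect-size measure are illustrative assumptions, not taken from the course labs.

```r
# Sketch: base R tests for categorical data on hypothetical counts.

# Binomial test: 62 "successes" in 100 trials, H0: p = 0.5 (two-sided)
bt <- binom.test(x = 62, n = 100, p = 0.5)

# Hypothetical 2x2 contingency table: speaker group by variant choice
tab <- matrix(c(30, 10,
                20, 25),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B"),
                              variant = c("V1", "V2")))

# Pearson's chi-squared test; for 2x2 tables chisq.test() applies
# Yates' continuity correction by default (correct = TRUE)
ct <- chisq.test(tab)

# Fisher's exact test, an alternative when expected counts are small
ft <- fisher.test(tab)

# Effect size: Cramér's V from the uncorrected chi-squared statistic
# (an illustrative choice of effect-size measure)
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
cramers_v <- sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1)))

print(c(binom_p = bt$p.value, chisq_p = ct$p.value,
        fisher_p = ft$p.value, cramers_v = cramers_v))
```

For a 2x2 table, Cramér's V reduces to |ad - bc| / sqrt(r1 r2 c1 c2). Here that is 550 / sqrt(40 * 45 * 50 * 35), which gives a quick way to check the hand computation.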
  
  
 
For successful submission of assignments you should be able to create and save R code files (.R) and RMarkdown files (.Rmd).
 
 
 
== Homeworks ==
 
* Homework 1 (deadline: February 16, 23:59), Chapters 1, 2, 3, and 5 of the [https://www.datacamp.com/courses/free-introduction-to-r DataCamp] course "Introduction to R". Please fill in this [https://docs.google.com/forms/d/e/1FAIpQLSdjgKBM5JSo6D6ajhrWWfFG1ktcKgDfbdK_jQ_ZbW9GwNLzpQ/viewform form]. 
 
* Homework 2 (deadline: February 23, 23:59), Chapters 4 and 6 of the [https://www.datacamp.com/courses/free-introduction-to-r DataCamp] course "Introduction to R". 
 
After completing the course please provide either the [https://support.datacamp.com/hc/en-us/articles/360001548814-How-can-I-share-my-certificate-Statement-of-Accomplishment- Statement of Accomplishment] or a screenshot of your learning progress via [link TBA]. 
 
The deadlines for Homework 1 and 2 are cancelled because the free version of the DataCamp online course is unavailable. Stay tuned!
 
* Homework 3 (deadline: February 9, 12:00), Hypothesis testing, binomial test, t-test. [https://github.com/LingData2019/LingData2020/blob/master/hw/hw-pdf/LingData-HW3-comp.pdf HW3 pdf] [https://htmlpreview.github.io/?https://github.com/LingData2019/LingData2020/blob/master/hw/LingData-HW3-comp.html html] [https://github.com/LingData2019/LingData2020/blob/master/hw/LingData-HW3-comp.Rmd Rmd template]
 
* Homework 4 (deadline: February 29, 12:00), T-test and ANOVA, reproducing some results from Leivada & Westergaard 2019 [https://github.com/LingData2019/LingData2020/blob/master/hw/hw-pdf/LingData-HW4-comp.pdf HW4 pdf] [https://htmlpreview.github.io/?https://github.com/LingData2019/LingData2020/blob/master/hw/hw-html/LingData-HW4-comp.html html] [https://github.com/LingData2019/LingData2020/blob/master/hw/LingData-HW4-comp.Rmd Rmd template]
 
* Homework 5
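Homework 4 involves t-tests and ANOVA. Here is a minimal sketch of a one-way ANOVA in base R. It runs on simulated data with three hypothetical conditions (<code>c1</code>, <code>c2</code>, <code>c3</code>). This is neither the Leivada & Westergaard data nor a homework solution.

```r
# Sketch: one-way ANOVA with base R's aov() on simulated data.
# Three hypothetical conditions, 20 observations each.
set.seed(1)
d <- data.frame(
  condition = factor(rep(c("c1", "c2", "c3"), each = 20)),
  score     = c(rnorm(20, mean = 5.0),
                rnorm(20, mean = 5.5),
                rnorm(20, mean = 6.0))
)

fit <- aov(score ~ condition, data = d)
print(summary(fit))   # ANOVA table: Df, Sum Sq, Mean Sq, F value, Pr(>F)

# Post-hoc pairwise comparisons with family-wise error control
print(TukeyHSD(fit))
```

The F statistic in <code>summary(fit)</code> equals the one from <code>oneway.test(score ~ condition, data = d, var.equal = TRUE)</code>. <code>TukeyHSD</code> is one common post-hoc option, not necessarily the one used in the course.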
 
 
== Final project ==
 
* Project description [https://github.com/LingData2019/LingData2020/blob/master/projects.pdf link]

* Project pre-registration: link to submit your file TBA

* Final versions of project papers: link to submit your files TBA
 
 
 
== References ==
 
* Gries, Stefan (2013). Statistics for Linguistics with R: A Practical Introduction (2nd revised edition). Berlin: De Gruyter Mouton. [http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=604318 HSE library link]

* Levshina, Natalia (2015). How to Do Linguistics with R: Data Exploration and Statistical Analysis. Amsterdam: John Benjamins Publishing Company. [http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=nlebk&AN=1093048 HSE library link]

* Baayen, R. Harald (2008). Analyzing Linguistic Data: A Practical Introduction to Statistics. Cambridge: Cambridge University Press. [http://www.sfs.uni-tuebingen.de/~hbaayen/publications/baayenCUPstats.pdf pdf]

* Gries, Stefan (2017). Quantitative Corpus Linguistics with R: A Practical Introduction (2nd edition). Milton Park, Abingdon, Oxon: Routledge. eBook

* Empirical Bayes
** Harney, H. L. (2016). Bayesian Inference: Data Evaluation and Decisions (2nd edition). Springer. eBook
** McElreath, Richard (2016). Statistical Rethinking: A Bayesian Course with Examples in R and Stan. eBook

* ggplot2
** Wickham, Hadley (2016). ggplot2: Elegant Graphics for Data Analysis. Springer. eBook

* R Markdown: [https://rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf Rmd Cheat Sheet]
 
 
== Course Info ==
 
 
This page contains the materials of the course "Linguistic Data: Quantitative Analysis and Visualisation", taught at the HSE Master's program "Computational Linguistics" in the 2019-2020 academic year. Modules: 3-4.
 

Linguistic theory page: revision as of 21:44, 17 February 2020

* Instructors: Ilya Schurov and Olga Lyashevskaya

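== Materials ==

{|class='wikitable'
! Date !! Topics !! Links
|-
| Jan 18 || Introduction. Quantitative linguistic research and data types. R basics || [https://docs.google.com/presentation/d/1VUIUa3Db5n4dsD_HeA3e-mz55zK8uPrko3yu207pKUk/edit?usp=sharing Intro Slides] [https://github.com/LingData2019/LingData2020/tree/master/seminars/2020-01-18 Lab 01: intro to R]
|-
| Jan 25 || Hypothesis testing. Binomial test. R: dataframes || [https://rpubs.com/ilyaschurov/rdataframes2020theo Lab 02]
|-
| Feb 1 || Estimating the population mean. Central limit theorem ||
|-
| Feb 8 || One-sample t-test. Working with dataframes. Selection by condition || [https://www.r-bloggers.com/select-operations-on-r-data-frames/ selection by condition], [http://www.instantr.com/2012/12/29/performing-a-one-sample-t-test-in-r/ one-sample t-test]
|-
| Feb 15 || Two-sample t-test. Using <code>t.test</code> to perform a two-sample t-test || [https://rpubs.com/ilyaschurov/ttest2-2020 notebook]
|}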
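The Feb 8 and Feb 15 seminars use <code>t.test</code> for one-sample and two-sample tests. Here is a minimal sketch on simulated data. The group names, the "reaction time" framing and the simulated means are hypothetical and do not come from the course notebook.

```r
# Minimal sketch: one- and two-sample t-tests with base R's t.test().
# Data are simulated; "reaction times" in ms are a hypothetical framing.
set.seed(42)
group_a <- rnorm(30, mean = 500, sd = 50)
group_b <- rnorm(30, mean = 530, sd = 50)

# One-sample t-test: H0 says the population mean of group_a is 500
res_one <- t.test(group_a, mu = 500)

# Two-sample test; t.test() defaults to Welch's test (var.equal = FALSE)
res_welch <- t.test(group_a, group_b)

# Student's two-sample t-test, assuming equal variances
res_student <- t.test(group_a, group_b, var.equal = TRUE)

# The same Welch test via the formula interface on a data frame
rt_data <- data.frame(
  rt    = c(group_a, group_b),
  group = factor(rep(c("a", "b"), each = 30))
)
res_formula <- t.test(rt ~ group, data = rt_data)

print(res_welch)
res_welch$conf.int  # 95% CI for mean(group_a) - mean(group_b)
```

Unless <code>var.equal = TRUE</code> is given, <code>t.test</code> runs Welch's test. Welch's degrees of freedom are therefore usually fractional and never exceed the n1 + n2 - 2 of Student's test.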

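== Homeworks ==

{|class='wikitable'
! ID !! Links !! Due date !! Upload link
|-
| HW1 || [https://github.com/LingData2019/LingData2020/blob/master/hw/LingData-HW1-theo.Rmd Rmd], [https://github.com/LingData2019/LingData2020/blob/master/hw/LingData-HW1-theo.pdf pdf] || Feb. 9, 23:59:59 || [https://www.dropbox.com/request/hblqeftXqVpJLj0miQwd here]
|-
| HW2 || [https://github.com/LingData2019/LingData2020/blob/master/hw/LingData-HW2-theo.Rmd Rmd], [https://github.com/LingData2019/LingData2020/blob/master/hw/hw-pdf/LingData-HW2-theo.pdf pdf] || Feb. 24, 23:59:59 || [https://www.dropbox.com/request/TTMCHN0AOZ1MONaPjuav here]
|}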

== Software ==

During this course we will use R as a programming language and RStudio as a GUI.

=== How to install R and RStudio? ===

# Download R (you can choose another CRAN mirror if you wish) and install it on your computer. Make sure you do this before installing RStudio.
# Download RStudio (you need RStudio Desktop, Open Source License) and install it on your computer. It is recommended to create a shortcut for RStudio during installation.

You can avoid installing anything on your PC by using rstudio.cloud (an online version of RStudio).

For successful submission of assignments you should be able to create and save R code files (.R) and RMarkdown files (.Rmd).