Linguistic Data: Quantitative Analysis and Visualisation: linguistic theory
Current version as of 14:38, 16 June 2020
- Instructors: Ilya Schurov and Olga Lyashevskaya
Materials
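Date | Topics | Links |
---|---|---|
Jan 18 | Introduction. Quantitative linguistic research and data types. R basics | [Intro Slides](https://docs.google.com/presentation/d/1VUIUa3Db5n4dsD_HeA3e-mz55zK8uPrko3yu207pKUk/edit?usp=sharing), [Lab 01: intro to R](https://github.com/LingData2019/LingData2020/tree/master/seminars/2020-01-18) |
Jan 25 | Hypothesis testing. Binomial test. R: dataframes | [lab02](https://rpubs.com/ilyaschurov/rdataframes2020theo) |
Feb 1 | Estimating the population mean. Central Limit Theorem | |
Feb 8 | One-sample t-test. Working with dataframes. Selection by condition. | [selection by condition](https://www.r-bloggers.com/select-operations-on-r-data-frames/), [one sample t-test](http://www.instantr.com/2012/12/29/performing-a-one-sample-t-test-in-r/) |
Feb 15 | Two-sample t-test. Using t.test to perform a two-sample t-test. | [notebook](https://rpubs.com/ilyaschurov/ttest2-2020) |
Feb 22 | ANOVA. Confidence intervals. tidyverse library. | [notebook](https://rpubs.com/ilyaschurov/ling-2020-02-22-confint-tidyverse) |
Feb 29 | Chi-squared test. | [Rmd](https://raw.githubusercontent.com/LingData2019/LingData2020/master/seminars/2020-02-22/Lab6-chisq-Fischer-effectsize.Rmd), [pdf](https://github.com/LingData2019/LingData2020/blob/master/seminars/2020-02-22/Lab6-chisq-Fischer-effectsize.pdf) |
March 7 | Correlations. Scatter plots. | [notebook](https://rpubs.com/ilyaschurov/ling-2020-03-07-corr) |
April 8 | Bivariate regression. | [video](https://youtu.be/F-yQMC0lGYw) |
April 15 | Multiple regression and causal questions. | [video](https://youtu.be/hFLDl7rGbmk), [notebook](http://rpubs.com/AllaT/lingdat-multreg) |
April 22 | More on linear regression. Significance of coefficients. Dummy variables. | [video](https://youtu.be/9tQ3oOL1umU) |
April 29 | Logistic regression | [video](https://www.youtube.com/watch?v=RCZLL69H6PY) |
May 6 | Random effects. Mixed-effects models | [video](https://youtu.be/a4hC-WCuo_I) |
May 13 | Principal component analysis | [video](https://youtu.be/cY0FPL5bDE4) |
May 20 | Clustering | [video](https://youtu.be/HgKAJ6ElmHA) |
May 27 | Decision trees and random forests | [video](https://youtu.be/fa4so7wgDY8) |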
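The Feb 8 and Feb 15 sessions cover one-sample and two-sample t-tests with `t.test`. The following is a minimal sketch of both calls, not taken from the course notebooks: the data are simulated, and the variable names (`group_a`, `group_b`) and the "vowel duration" framing are purely illustrative assumptions.

```r
# Simulated data (an assumption for illustration): hypothetical vowel
# durations in ms for two groups of speakers.
set.seed(42)
group_a <- rnorm(30, mean = 100, sd = 15)
group_b <- rnorm(30, mean = 110, sd = 15)

# One-sample t-test: is the mean of group_a different from 100 ms?
one_sample <- t.test(group_a, mu = 100)

# Two-sample t-test. By default t.test() does not assume equal variances
# (var.equal = FALSE), so this is the Welch version of the test.
two_sample <- t.test(group_a, group_b)

# Inspect the pieces usually reported: p-value and confidence interval
# for the difference in means.
print(two_sample$p.value)
print(two_sample$conf.int)
```

Passing `var.equal = TRUE` would instead run the classical pooled-variance test; which one is appropriate depends on the data.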
Homeworks
id | links | due date | upload link |
---|---|---|---|
HW1 | Rmd, pdf | Feb. 9, 23:59:59 | here |
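HW2 | Rmd, pdf | Feb. 24, 23:59:59 | [here](https://www.dropbox.com/request/TTMCHN0AOZ1MONaPjuav) |
HW3 | [Rmd](https://github.com/LingData2019/LingData2020/blob/master/hw/LingData-HW3-theo.Rmd), [pdf](https://github.com/LingData2019/LingData2020/blob/master/hw/hw-pdf/LingData-HW3-theo.pdf) | April 16, 23:59:59 | [here](https://www.dropbox.com/request/woSpo5Qmmk6f64OAajeB) |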
Final projects
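- [Final projects description](https://github.com/LingData2019/LingData2020/blob/master/projects-theor.pdf)
- Submit research proposal (pre-registration) [here](https://www.dropbox.com/request/XmjgFjFog3MYW7GnVWxj).
- Submit final papers [here](https://www.dropbox.com/request/OpZ8qXVTWKgQFlqdy2Jl).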
Software
In this course we will use R as the programming language and RStudio as the GUI.
How to install R and RStudio?
1. Download R (you can choose another mirror if you wish) and install it on your computer. Make sure you do this before installing RStudio.
2. Download RStudio (you need the RStudio Desktop Open Source License version) and install it on your computer. It is recommended to create a shortcut for RStudio during installation.
It is possible to avoid installing anything on your PC by using rstudio.cloud (an online version of RStudio).
To submit assignments successfully, you should be able to create and save R code files (.R) and RMarkdown files (.Rmd).
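As a quick check that the installation works, the two file types can also be created from the R console. This is only a sketch under stated assumptions: the file names `example.R` and `example.Rmd` and the use of a temporary directory are illustrative, and in practice you would normally create these files through the RStudio menu.

```r
# Write into a temporary directory so nothing in your project is overwritten.
out_dir <- tempdir()

# A minimal R script (.R): plain R code, one statement per line.
r_path <- file.path(out_dir, "example.R")
writeLines(c(
  "x <- c(1, 2, 3)",
  "print(mean(x))"
), r_path)

# A minimal RMarkdown file (.Rmd): a YAML header followed by text and code chunks.
# The chunk fence is built with strrep() only to avoid literal backticks here.
fence <- strrep("`", 3)
rmd_path <- file.path(out_dir, "example.Rmd")
writeLines(c(
  "---",
  "title: \"Example\"",
  "output: html_document",
  "---",
  "",
  "Mean of three numbers:",
  "",
  paste0(fence, "{r}"),
  "mean(c(1, 2, 3))",
  fence
), rmd_path)

# Run the script to confirm that it was saved correctly.
source(r_path)
```

Rendering the `.Rmd` file to HTML additionally requires the `rmarkdown` package, which RStudio offers to install the first time you knit a document.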
References
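- Gries, Stefan (2013). Statistics for Linguistics with R: A Practical Introduction (2nd revised edition). Berlin: De Gruyter Mouton. [HSE library link](http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=604318)
- Levshina, Natalia (2015). How to Do Linguistics with R: Data Exploration and Statistical Analysis. Amsterdam: John Benjamins Publishing Company. [HSE library link](http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=nlebk&AN=1093048)
- Baayen, Harald (2008). Analyzing Linguistic Data: A Practical Introduction to Statistics. Cambridge UP. [pdf](https://msu.edu/course/lin/875/BaayenCUPstats.pdf)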