Difference between the pages "Main page" and "Data Analysis in the Social Sciences"
==Course info==
Dear students,
The materials for the course "Data Analysis in the Social Sciences", taught in the Master's programme "Politics. Economics. Philosophy." in the 2018-2019 academic year, will be published here.
* Instructor: Alla Tambovtseva

* Modules: 2-4

* Course syllabus: [https://www.hse.ru/data/2018/10/23/1150104329/program-2168603471-1W1RwbZ1sf.pdf link]

==Software==
During this course we will use R as the programming language and RStudio as a GUI.

'''How to install R and RStudio?'''

1. Download [https://ftp.acc.umu.se/mirror/CRAN/ R] (you can choose another mirror [https://cran.r-project.org/mirrors.html here] if you wish) and install it on your computer. Make sure you do this before installing RStudio.

2. Download [https://www.rstudio.com/products/rstudio/download/ RStudio] (you need RStudio Desktop, Open Source License) and install it on your computer. It is recommended to create a shortcut for RStudio during installation.

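Once both programs are installed, one possible way to check that R itself works is to run a couple of lines in the RStudio console. This is only a minimal sketch; the variable name `r_version` is made up for illustration.

```r
# Store the installed R version as a single string
# (r_version is a hypothetical name chosen for this example)
r_version <- R.version.string

# Print the version string to the console
print(r_version)
```

If a version string is printed, R is installed and RStudio has found it.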
'''How to use RStudio?'''

Read the instructions [http://math-info.hse.ru/f/2018-19/pep/rstudio-instruction-en.pdf here].

To submit assignments successfully, you should be able to create and save R code files (.R). It would also be helpful for your own research projects to learn how to create RMarkdown files.

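As a rough illustration of what an R code file might contain, here is a small sketch. It is not an assignment template: the file name `hw_example.R`, the variable names, and the values are all hypothetical.

```r
# hw_example.R: a hypothetical file name, used only for illustration

# Create a small numeric vector (made-up values)
scores <- c(4, 7, 9, 6)

# Compute basic descriptive statistics
avg <- mean(scores)
med <- median(scores)

# Print the results to the console
print(avg)
print(med)
```

In RStudio, such a file can be created via File → New File → R Script and then saved with the .R extension.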
==Materials==
{| class="wikitable"
! Date
! Topic
! Theory
! R
! Optional

|-
| 01 November
| Data collection-1. Population and samples
| [http://math-info.hse.ru/f/2018-19/pep/lectures/lecture1.pdf lecture1]
| [http://rpubs.com/AllaT/dass-intro r-intro]
| RMarkdown: official [https://rmarkdown.rstudio.com/ page], [https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf cheatsheet]<br>
|-
| 08 November
| Data collection-2. Sampling. Sources of bias
| [http://math-info.hse.ru/f/2018-19/pep/lectures/lecture2.pdf lecture2]
| [http://rpubs.com/AllaT/dass-rtypes r-types] [http://rpubs.com/AllaT/dass-rvectors r-vectors]
| <br>
|-
| 15 November
| Data types. Intro to exploratory analysis
| [http://math-info.hse.ru/f/2018-19/pep/lectures/lecture3.pdf lecture3]
| [http://rpubs.com/AllaT/dass-dataload r-dataload] [http://math-info.hse.ru/f/2018-19/pep/r/Titanic.csv Titanic.csv]
| [http://rpubs.com/AllaT/dass-csv_add csv in R] [https://drive.google.com/file/d/1-TgKv3TItRz1zDGVxwki0Mv04kk7K_x5/view?usp=sharing files]<br>
|-
| 22 November
| Exploratory analysis. Data visualisation
| [http://math-info.hse.ru/f/2018-19/pep/lectures/lecture4.pdf lecture4]
| [http://math-info.hse.ru/f/2017-18/ps-ms/Chile.csv Chile.csv] [https://www.rdocumentation.org/packages/car/versions/2.1-6/topics/Chile codebook] [http://rpubs.com/AllaT/dass-explore1 r-explore]
| [http://math-info.hse.ru/f/2018-19/pep/lectures/descriptives.pdf sample quartiles]<br>
|-
| 29 November
| Exploratory analysis
| R only
| [http://rpubs.com/AllaT/dass-rtables r-tables] [http://math-info.hse.ru/f/2017-18/ps-ms/Chile.csv Chile.csv] [http://rpubs.com/AllaT/dass-rnorm r-rnorm]
| [http://math-info.hse.ru/f/2018-19/pep/r/wcloud.png wordcloud] [http://math-info.hse.ru/f/2018-19/pep/r/HW1_wordcloud.R code]<br>
|-
| 10 January
| Statistical estimates. Statistical laws
| [http://math-info.hse.ru/f/2018-19/pep/lectures/lecture6.pdf lecture6]
| [http://rpubs.com/AllaT/dass-rloops r-loops] [http://rpubs.com/AllaT/dass-laws r-laws]
| <br>
|-
| 17 January
| Confidence intervals
| [http://math-info.hse.ru/f/2018-19/pep/lectures/lecture7.pdf lecture7]
| [http://rpubs.com/AllaT/dass-conf-ints r-conf-ints] [http://math-info.hse.ru/f/2017-18/ps-ms/Chile.csv Chile.csv]
| [https://rpsychologist.com/d3/CI/ visualization] by K. Magnusson<br>
|-
| 24 January
| Hypothesis testing
| [http://math-info.hse.ru/f/2018-19/pep/lectures/lecture8.pdf lecture8]
| [http://rpubs.com/AllaT/dass-ttest t-test]
| <br>
|-
| 31 January
| Data manipulation with dplyr. Correlation analysis
| [http://math-info.hse.ru/f/2018-19/pep/lectures/lecture9.pdf lecture9]
| [http://rpubs.com/AllaT/dass-dplyr r-dplyr] [http://rpubs.com/AllaT/dass-corr-ex r-corr] [http://math-info.hse.ru/f/2018-19/comm-math/marketing.csv marketing.csv]
| [https://dplyr.tidyverse.org/articles/dplyr.html more] on dplyr<br>
|-
| 07 February
| Contingency tables and chi-squared test
| [http://math-info.hse.ru/f/2018-19/pep/lectures/lecture10.pdf lecture10]
| [http://rpubs.com/AllaT/dass-lab1 Lab1] [http://rpubs.com/AllaT/dass-lab1-sol L1-solutions] [http://math-info.hse.ru/f/2018-19/pep/hw/CPDS.csv CPDS.csv]<br>[http://math-info.hse.ru/f/2018-19/pep/socling.csv socling.csv]<br><br>
| [https://cran.r-project.org/web/packages/stringi/stringi.pdf stringi]: library for text handling<br>
|-
| 14 February
| Visualising association between variables
| R only
| [http://rpubs.com/AllaT/dass-visualize r-visualisation] [https://raw.githubusercontent.com/allatambov/cluster-analysis/master/clust1/wgi_fh.csv wgi_fh.csv]<br>[http://rpubs.com/AllaT/dass-lab2 Lab2] [http://rpubs.com/AllaT/dass-lab2-sol L2-solutions]<br><br>
| [https://www.statmethods.net/graphs/scatterplot.html more] on scatterplots<br>[http://guessthecorrelation.com/ "Guess the Correlation" game]<br><br>
|-
| 21 February
| Visualisation with ggplot2
| R only
| [http://math-info.hse.ru/f/2018-19/pep/r/pep-ggplot2.R r-ggplot2] [https://raw.githubusercontent.com/allatambov/cluster-analysis/master/clust1/wgi_fh.csv wgi_fh.csv]<br>[http://rpubs.com/AllaT/dass-lab3 Lab3] [http://rpubs.com/AllaT/dass-lab3-sol L3-solutions] [http://math-info.hse.ru/f/2018-19/pep/demography.csv demography.csv]<br><br>
| [https://extremepresentation.typepad.com/files/choosing-a-good-chart-09.pdf types] of visualisation, a fun [https://www.sisense.com/blog/quiz-chart/ quiz] on graphs<br>interactive [https://www.gapminder.org/tools/#$chart-type=bubbles bubble plot] for inspiration<br><br>
|-
| 28 February
| Exporting output via stargazer
| R only
| <br>
| [https://www.princeton.edu/~otorres/NiceOutputR.pdf stargazer for non-LaTeX users]<br>
|-
| 07 March
| Comparing multiple groups: ANOVA
| [lecture11]
| <br>
| <br>
|-
| 21 March
| '''Midterm'''
|
| <br>
| <br>
|-
| 04 April
| Simple linear regression. OLS
| [http://math-info.hse.ru/f/2018-19/pep/lectures/lecture12.pdf lecture12]
| [http://rpubs.com/AllaT/dass-reg-1 r-reg1] [http://math-info.hse.ru/f/2016-17/ps-pep-quant/datareg2011.csv 2011.csv]<br>[http://rpubs.com/AllaT/dass-lab4 Lab 4] [http://rpubs.com/AllaT/dass-lab4-sol L4-solutions] <br><br>
| <br>
|-
| 18 April
| Multiple linear regression
| [http://math-info.hse.ru/f/2018-19/pep/lectures/lecture13.pdf lecture13]
| [http://rpubs.com/AllaT/dass-mlr r-reg2] [http://math-info.hse.ru/f/2018-19/pep/flats.csv flats.csv]<br>[http://rpubs.com/AllaT/dass-lab5 Lab 5] [http://rpubs.com/AllaT/500560 L5-solutions] [https://vincentarelbundock.github.io/Rdatasets/csv/Ecdat/Griliches.csv Griliches.csv] <br><br>
| [https://cran.r-project.org/web/packages/jtools/vignettes/summ.html jtools] for regression<br>
|-
| 25 April
| Multiple linear regression. Model diagnostics<br>
| [http://math-info.hse.ru/f/2018-19/pep/lectures/lecture14.pdf lecture14]
| [http://rpubs.com/AllaT/lm-diag r-reg3] [https://raw.githubusercontent.com/allatambov/cluster-analysis/master/clust1/wgi_fh.csv wgi_fh.csv]
| <br>
|-
| 23 May
| Categorical predictors. Interaction effects<br>
| [http://math-info.hse.ru/f/2018-19/pep/lectures/lecture15.pdf lecture15]
| [http://rpubs.com/AllaT/lm-cat r-reg4] [http://math-info.hse.ru/f/2018-19/pep/wgi-new.csv wgi-new.csv] [http://math-info.hse.ru/f/2018-19/pep/interactions.R r-reg5] [http://math-info.hse.ru/f/2018-19/pep/flats.csv flats.csv]<br>
| <br>
|-
| 30 May
| Fixed and random effects<br>
|
| [http://math-info.hse.ru/f/2018-19/pep/firms2.csv firms.csv]<br>
| [https://www.princeton.edu/~otorres/Panel101R.pdf Princeton handbook] on FE & RE models<br>
|-
| 06 June
| Lab on regressions. Logistic regression<br>
|
| [http://rpubs.com/AllaT/reglab Lab 6] [http://rpubs.com/AllaT/reglab-sol L6-solutions] [http://rpubs.com/AllaT/dass-logit logistic-reg] [http://math-info.hse.ru/f/2016-17/ps-pep-quant/spanish_data.csv spanish.csv]<br>
| [https://stats.idre.ucla.edu/r/dae/logit-regression/ UCLA] helper on logit models<br>
|-
| 13 June
| Principal component analysis<br>
|
| [http://rpubs.com/AllaT/dass-pca PCA] [https://vincentarelbundock.github.io/Rdatasets/csv/datasets/USJudgeRatings.csv USJudges.csv]<br>
| [http://math-info.hse.ru/f/2015-16/ling-mag-quant/lecture-pca.html visualisation (text in Russian)]<br>
|}
==R lectures in pdf==
01 November: [http://math-info.hse.ru/f/2018-19/pep/r/intro-rmd.pdf r-intro],
08 November: [http://math-info.hse.ru/f/2018-19/pep/r/dass-types.pdf r-types], [http://math-info.hse.ru/f/2018-19/pep/r/dass-vectors.pdf r-vectors],
15 November: [http://math-info.hse.ru/f/2018-19/pep/r/lect-dataload.pdf r-dataload], [http://math-info.hse.ru/f/2018-19/pep/r/csv-add.pdf csv-add],
22 November: [http://math-info.hse.ru/f/2018-19/pep/r/lect-explore1.pdf r-explore1],
29 November: [http://math-info.hse.ru/f/2018-19/pep/r/dass-tables.pdf r-tables], [http://math-info.hse.ru/f/2018-19/pep/r/dass-rnorm.pdf r-rnorm],
12 January: [http://math-info.hse.ru/f/2018-19/pep/r/r-loops.pdf r-loops], [http://math-info.hse.ru/f/2018-19/pep/r/r-laws.pdf r-laws],
17 January: [http://math-info.hse.ru/f/2018-19/pep/r/r-conf-ints.pdf r-conf-ints],
24 January: [http://math-info.hse.ru/f/2018-19/pep/r/t-test.pdf t-test],
31 January: [http://math-info.hse.ru/f/2018-19/pep/r/r-dplyr.pdf r-dplyr], [http://math-info.hse.ru/f/2018-19/pep/r/r-corr.pdf r-corr],
14 February: [http://math-info.hse.ru/f/2018-19/pep/r/r-visualisation.pdf r-visualisation]
+ | |||
+ | ==Home assignments== | ||
+ | * [http://math-info.hse.ru/f/2018-19/pep/hw/hw1.pdf Homework 1] (deadline: 18 November, 23:59) | ||
+ | |||
+ | * [http://math-info.hse.ru/f/2018-19/pep/hw/hw2.pdf Homework 2] (deadline: 20 December, 23:59) | ||
+ | |||
+ | * [http://math-info.hse.ru/f/2018-19/pep/hw/hw3.pdf Homework 3] (deadline: 04 February, 23:59) | ||
+ | |||
+ | * [http://math-info.hse.ru/f/2018-19/pep/hw/hw4.pdf Homework 4] (deadline: 18 February, 23:59) | ||
+ | |||
+ | * [http://math-info.hse.ru/f/2018-19/pep/hw/hw5 Homework 5] (deadline: 27 April, 23:59), [https://docs.google.com/forms/d/e/1FAIpQLSf4Kgeg3d98jfkpmtQ8lMsAJ8CuAq6VPe4HUXbwFRHGnNMUAw/viewform link] to submit | ||
+ | |||
+ | ==Readings== | ||
+ | We will use two books as compulsory for this course: | ||
+ | |||
+ | * D.Diez et al. OpenIntro Statistics. 2015. (freely&legally [https://www.openintro.org/stat/textbook.php available] online) | ||
+ | |||
+ | * Ch.Weelan. Naked statistics. 2013. |
Current version as of 04:07, 7 February 2020