Difference between pages «Заглавная страница» (Main page) and «Data Analysis in the Social Sciences»

From MathINFO
(Differences between pages)
 
(New page: «==Course info== Dear students, Here will be published the materials of the course "Data Analysis in the Social Sciences", taught at the Master programme "Politic...»)
 
Line 1: Line 1:
__NOTOC__
+
==Course info==
 +
Dear students,
  
On this site you will find materials for the following courses:
+
The materials for the course "Data Analysis in the Social Sciences", taught in the Master's programme "Politics. Economics. Philosophy." in the 2018-2019 academic year, will be published here.
  
==Faculty of Social Sciences, Political Science programme==
+
* Instructor: Alla Tambovtseva
* [[Математика и статистика, часть 1 |Mathematics and Statistics, Part 1]] (1st year)
+
 
* [[Математика и статистика, часть 2 |Mathematics and Statistics, Part 2]] (1st year)
+
* Modules: 2-4
* [[Теория игр |Game Theory]] (3rd year)
+
 
* [[Основы программирования в Python |Fundamentals of Programming in Python]] (3rd year)
+
* Course syllabus: [https://www.hse.ru/data/2018/10/23/1150104329/program-2168603471-1W1RwbZ1sf.pdf link]
* [[Основы программирования в R |Fundamentals of Programming in R]] (3rd year)
+
 
* [[Программирование для всех |Programming for Everyone]] (Master's programme, 1st year)
+
==Software==
* [[Data Analysis in the Social Sciences ]] (PEP)
+
During this course we will use R as a programming language and RStudio as a GUI.
* [[Математические модели политической экономики |Mathematical Models of Political Economy]] (3rd year)
+
 
==Faculty of Social Sciences, Psychology programme==
+
'''How to install R and RStudio?'''
* [[Математические и статистические методы в психологии |Mathematical and Statistical Methods in Psychology]] (1st year)
+
 
==School of Linguistics==
+
1. Download [https://ftp.acc.umu.se/mirror/CRAN/ R] (you can choose another mirror [https://cran.r-project.org/mirrors.html here] if you wish) and install it on your computer. Make sure you do this before installing RStudio.
* [[Дискретная математика для лингвистов |Discrete Mathematics for Linguists]] (1st year)
+
 
* [[Математический анализ и линейная алгебра |Mathematical Analysis and Linear Algebra]] (2nd year)
+
2. Download [https://www.rstudio.com/products/rstudio/download/ RStudio] (you need RStudio Desktop with the Open Source License) and install it on your computer. We recommend creating a shortcut for RStudio during installation.
* [[Теория вероятностей и математическая статистика |Probability Theory and Mathematical Statistics]] (2nd year)
+
 
* [[Linguistic Data: Quantitative Analysis and Visualisation for computer linguists]]
+
'''How to use RStudio?'''
* [[Linguistic Data: Quantitative Analysis and Visualisation for theoretical linguists]]
+
 
==Faculty of Communications, Media and Design==
+
Read the instructions [http://math-info.hse.ru/f/2018-19/pep/rstudio-instruction-en.pdf here].
* [[Алгебра и анализ |Algebra and Analysis]] (1st year)
+
 
* [[Программирование для дата-журналистики|Programming for Data Journalism]]
+
To submit assignments successfully, you need to be able to create and save R code files (.R). It would also be helpful for your own research projects to learn how to create RMarkdown files.
* [[Программирование для анализа данных|Programming for Data Analysis]]
+
 
* [[Основы прикладной математики и информатики|Fundamentals of Applied Mathematics and Informatics]]
+
==Materials==
==HSE-NES Joint Bachelor's Programme==
+
{| class="wikitable"
* [[Математический анализ — 1|Mathematical Analysis 1]]
+
! Date
* [[Науки о данных|Data Science]]
+
! Topic
* [[Линейная алгебра|Linear Algebra]]
+
! Theory
==Professional Development==
+
! R
* [[Python для сбора и анализа данных |Python for Data Collection and Analysis]] (ЦПК, Moscow)
+
! Optional
* [[Python для сбора и анализа данных СПб |Python for Data Collection and Analysis, St Petersburg]] (HSE University, St Petersburg)
+
 
==Faculty of Mathematics==
+
|-
* [http://wiki.cs.hse.ru/%D0%9C%D0%B0%D1%88%D0%B8%D0%BD%D0%BD%D0%BE%D0%B5_%D0%BE%D0%B1%D1%83%D1%87%D0%B5%D0%BD%D0%B8%D0%B5_%D0%BD%D0%B0_%D0%BC%D0%B0%D1%82%D1%84%D0%B0%D0%BA%D0%B5_2018/2019 Machine Learning ] (on the Faculty of Computer Science wiki).
+
| 01 November
==Faculty of Computer Science==
+
| Data collection-1. Population and samples
* [[Дифференциальные уравнения |Differential Equations]] (Faculty of Computer Science)
+
| [http://math-info.hse.ru/f/2018-19/pep/lectures/lecture1.pdf lecture1]
* [[Теория игр (факультатив на ФКН)|Game Theory (elective at the Faculty of Computer Science)]]
+
| [http://rpubs.com/AllaT/dass-intro r-intro]
 +
| RMarkdown: official [https://rmarkdown.rstudio.com/ page], [https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf cheatsheet]<br>
 +
|-
 +
| 08 November
 +
| Data collection-2. Sampling. Sources of bias
 +
| [http://math-info.hse.ru/f/2018-19/pep/lectures/lecture2.pdf lecture2]
 +
| [http://rpubs.com/AllaT/dass-rtypes r-types] [http://rpubs.com/AllaT/dass-rvectors r-vectors]
 +
| <br>
 +
|-
 +
| 15 November
 +
| Data types. Intro to exploratory analysis
 +
| [http://math-info.hse.ru/f/2018-19/pep/lectures/lecture3.pdf lecture3]
 +
| [http://rpubs.com/AllaT/dass-dataload r-dataload] [http://math-info.hse.ru/f/2018-19/pep/r/Titanic.csv Titanic.csv]
 +
| [http://rpubs.com/AllaT/dass-csv_add csv in R] [https://drive.google.com/file/d/1-TgKv3TItRz1zDGVxwki0Mv04kk7K_x5/view?usp=sharing files]<br>
 +
|-
 +
| 22 November
 +
| Exploratory analysis. Data visualisation
 +
| [http://math-info.hse.ru/f/2018-19/pep/lectures/lecture4.pdf lecture4]
 +
| [http://math-info.hse.ru/f/2017-18/ps-ms/Chile.csv Chile.csv] [https://www.rdocumentation.org/packages/car/versions/2.1-6/topics/Chile codebook] [http://rpubs.com/AllaT/dass-explore1 r-explore]
 +
| [http://math-info.hse.ru/f/2018-19/pep/lectures/descriptives.pdf sample quartiles]<br>
 +
|-
 +
| 29 November
 +
| Exploratory analysis
 +
| R only
 +
| [http://rpubs.com/AllaT/dass-rtables r-tables] [http://math-info.hse.ru/f/2017-18/ps-ms/Chile.csv Chile.csv] [http://rpubs.com/AllaT/dass-rnorm r-rnorm]
 +
| [http://math-info.hse.ru/f/2018-19/pep/r/wcloud.png wordcloud] [http://math-info.hse.ru/f/2018-19/pep/r/HW1_wordcloud.R code]<br>
 +
|-
 +
| 10 January
 +
| Statistical estimates. Statistical laws
 +
| [http://math-info.hse.ru/f/2018-19/pep/lectures/lecture6.pdf lecture6]
 +
| [http://rpubs.com/AllaT/dass-rloops r-loops] [http://rpubs.com/AllaT/dass-laws r-laws]
 +
| <br>
 +
|-
 +
| 17 January
 +
| Confidence intervals
 +
| [http://math-info.hse.ru/f/2018-19/pep/lectures/lecture7.pdf lecture7]
 +
| [http://rpubs.com/AllaT/dass-conf-ints r-conf-ints] [http://math-info.hse.ru/f/2017-18/ps-ms/Chile.csv Chile.csv]
 +
| [https://rpsychologist.com/d3/CI/ visualization] by K.Magnusson<br>
 +
|-
 +
| 24 January
 +
| Hypotheses testing
 +
| [http://math-info.hse.ru/f/2018-19/pep/lectures/lecture8.pdf lecture8]
 +
| [http://rpubs.com/AllaT/dass-ttest t-test]
 +
| <br>
 +
|-
 +
| 31 January
 +
| Data manipulation with dplyr. Correlation analysis
 +
| [http://math-info.hse.ru/f/2018-19/pep/lectures/lecture9.pdf lecture9]
 +
| [http://rpubs.com/AllaT/dass-dplyr r-dplyr] [http://rpubs.com/AllaT/dass-corr-ex r-corr] [http://math-info.hse.ru/f/2018-19/comm-math/marketing.csv marketing.csv]
 +
| [https://dplyr.tidyverse.org/articles/dplyr.html more] on dplyr<br>
 +
|-
 +
| 07 February
 +
| Contingency tables and chi-squared test
 +
| [http://math-info.hse.ru/f/2018-19/pep/lectures/lecture10.pdf lecture10]
 +
| [http://rpubs.com/AllaT/dass-lab1 Lab1] [http://rpubs.com/AllaT/dass-lab1-sol L1-solutions] [http://math-info.hse.ru/f/2018-19/pep/hw/CPDS.csv CPDS.csv]<br>[http://math-info.hse.ru/f/2018-19/pep/socling.csv socling.csv]<br><br>
 +
| [https://cran.r-project.org/web/packages/stringi/stringi.pdf stringi]: library for text handling<br>
 +
|-
 +
| 14 February
 +
| Visualising association between variables
 +
| R only
 +
| [http://rpubs.com/AllaT/dass-visualize r-visualisation] [https://raw.githubusercontent.com/allatambov/cluster-analysis/master/clust1/wgi_fh.csv wgi_fh.csv]<br>[http://rpubs.com/AllaT/dass-lab2 Lab2] [http://rpubs.com/AllaT/dass-lab2-sol L2-solutions]<br><br>
 +
| [https://www.statmethods.net/graphs/scatterplot.html more] on scatterplots<br>[http://guessthecorrelation.com/ guess correlation game]<br><br>
 +
|-
 +
| 21 February
 +
| Visualisation with ggplot2
 +
| R only
 +
| [http://math-info.hse.ru/f/2018-19/pep/r/pep-ggplot2.R r-ggplot2] [https://raw.githubusercontent.com/allatambov/cluster-analysis/master/clust1/wgi_fh.csv wgi_fh.csv]<br>[http://rpubs.com/AllaT/dass-lab3 Lab3] [http://rpubs.com/AllaT/dass-lab3-sol L3-solutions] [http://math-info.hse.ru/f/2018-19/pep/demography.csv demography.csv]<br><br>
 +
| [https://extremepresentation.typepad.com/files/choosing-a-good-chart-09.pdf types] of visualisation, funny [https://www.sisense.com/blog/quiz-chart/ quiz] on graphs<br>interactive [https://www.gapminder.org/tools/#$chart-type=bubbles bubble plot] for inspiration<br><br>
 +
|-
 +
| 28 February
 +
| Exporting output via stargazer
 +
| R only
 +
| <br>
 +
| [https://www.princeton.edu/~otorres/NiceOutputR.pdf stargazer for non-LaTeX users]<br>
 +
|-
 +
| 7 March
 +
| Comparing multiple groups: ANOVA
 +
| [lecture11]
 +
| <br>
 +
| <br>
 +
|-
 +
| 21 March
 +
| '''Midterm'''
 +
|
 +
| <br>
 +
| <br>
 +
|-
 +
| 04 April
 +
| Simple linear regression. OLS
 +
| [http://math-info.hse.ru/f/2018-19/pep/lectures/lecture12.pdf lecture12]
 +
| [http://rpubs.com/AllaT/dass-reg-1 r-reg1] [http://math-info.hse.ru/f/2016-17/ps-pep-quant/datareg2011.csv 2011.csv]<br>[http://rpubs.com/AllaT/dass-lab4 Lab 4] [http://rpubs.com/AllaT/dass-lab4-sol L4-solutions] <br><br>
 +
| <br>
 +
|-
 +
| 18 April
 +
| Multiple linear regression
 +
| [http://math-info.hse.ru/f/2018-19/pep/lectures/lecture13.pdf lecture13]
 +
| [http://rpubs.com/AllaT/dass-mlr r-reg2] [http://math-info.hse.ru/f/2018-19/pep/flats.csv flats.csv]<br>[http://rpubs.com/AllaT/dass-lab5 Lab 5] [http://rpubs.com/AllaT/500560 L5-solutions] [https://vincentarelbundock.github.io/Rdatasets/csv/Ecdat/Griliches.csv Griliches.csv] <br><br>
 +
| [https://cran.r-project.org/web/packages/jtools/vignettes/summ.html jtools] for regression<br>
 +
|-
 +
| 25 April
 +
| Multiple linear regression. Model diagnostics<br>
 +
| [http://math-info.hse.ru/f/2018-19/pep/lectures/lecture14.pdf lecture14]
 +
| [http://rpubs.com/AllaT/lm-diag r-reg3] [https://raw.githubusercontent.com/allatambov/cluster-analysis/master/clust1/wgi_fh.csv wgi_fh.csv]
 +
| <br>
 +
|-
 +
| 23 May
 +
| Categorical predictors. Interaction effects<br>
 +
| [http://math-info.hse.ru/f/2018-19/pep/lectures/lecture15.pdf lecture15]
 +
| [http://rpubs.com/AllaT/lm-cat r-reg4] [http://math-info.hse.ru/f/2018-19/pep/wgi-new.csv wgi-new.csv] [http://math-info.hse.ru/f/2018-19/pep/interactions.R r-reg5] [http://math-info.hse.ru/f/2018-19/pep/flats.csv flats.csv]<br>
 +
| <br>
 +
|-
 +
| 30 May
 +
| Fixed and random effects<br>
 +
|
 +
| [http://math-info.hse.ru/f/2018-19/pep/firms2.csv firms.csv]<br>
 +
| [https://www.princeton.edu/~otorres/Panel101R.pdf Princeton handbook] on FE & RE models<br>
 +
|-
 +
| 06 June
 +
| Lab on regressions. Logistic regression<br>
 +
|
 +
| [http://rpubs.com/AllaT/reglab Lab 6] [http://rpubs.com/AllaT/reglab-sol L6-solutions] [http://rpubs.com/AllaT/dass-logit logistic-reg] [http://math-info.hse.ru/f/2016-17/ps-pep-quant/spanish_data.csv spanish.csv]<br>
 +
| [https://stats.idre.ucla.edu/r/dae/logit-regression/ UCLA] helper on logit models<br>
 +
|-
 +
| 13 June
 +
| Principal component analysis<br>
 +
|
 +
| [http://rpubs.com/AllaT/dass-pca PCA] [https://vincentarelbundock.github.io/Rdatasets/csv/datasets/USJudgeRatings.csv USJudges.csv]<br>
 +
| [http://math-info.hse.ru/f/2015-16/ling-mag-quant/lecture-pca.html visualisation (text in Russian)]<br>
 +
|}
 +
==R lectures in pdf==
 +
01 November: [http://math-info.hse.ru/f/2018-19/pep/r/intro-rmd.pdf r-intro],
 +
08 November: [http://math-info.hse.ru/f/2018-19/pep/r/dass-types.pdf r-types], [http://math-info.hse.ru/f/2018-19/pep/r/dass-vectors.pdf r-vectors],  
 +
15 November: [http://math-info.hse.ru/f/2018-19/pep/r/lect-dataload.pdf r-dataload] [http://math-info.hse.ru/f/2018-19/pep/r/csv-add.pdf csv-add],
 +
22 November: [http://math-info.hse.ru/f/2018-19/pep/r/lect-explore1.pdf r-explore1],
 +
29 November: [http://math-info.hse.ru/f/2018-19/pep/r/dass-tables.pdf r-tables], [http://math-info.hse.ru/f/2018-19/pep/r/dass-rnorm.pdf r-rnorm]
 +
12 January: [http://math-info.hse.ru/f/2018-19/pep/r/r-loops.pdf r-loops], [http://math-info.hse.ru/f/2018-19/pep/r/r-laws.pdf r-laws],
 +
17 January: [http://math-info.hse.ru/f/2018-19/pep/r/r-conf-ints.pdf r-conf-ints], 24 January: [http://math-info.hse.ru/f/2018-19/pep/r/t-test.pdf t-test], 31 January: [http://math-info.hse.ru/f/2018-19/pep/r/r-dplyr.pdf r-dplyr], [http://math-info.hse.ru/f/2018-19/pep/r/r-corr.pdf r-corr],
 +
14 February: [http://math-info.hse.ru/f/2018-19/pep/r/r-visualisation.pdf r-visualisation]
 +
 
 +
==Home assignments==
 +
* [http://math-info.hse.ru/f/2018-19/pep/hw/hw1.pdf Homework 1] (deadline: 18 November, 23:59)
 +
 
 +
* [http://math-info.hse.ru/f/2018-19/pep/hw/hw2.pdf Homework 2] (deadline: 20 December, 23:59)
 +
 
 +
* [http://math-info.hse.ru/f/2018-19/pep/hw/hw3.pdf Homework 3] (deadline: 04 February, 23:59)
 +
 
 +
* [http://math-info.hse.ru/f/2018-19/pep/hw/hw4.pdf Homework 4] (deadline: 18 February, 23:59)
 +
 
 +
* [http://math-info.hse.ru/f/2018-19/pep/hw/hw5 Homework 5] (deadline: 27 April, 23:59), [https://docs.google.com/forms/d/e/1FAIpQLSf4Kgeg3d98jfkpmtQ8lMsAJ8CuAq6VPe4HUXbwFRHGnNMUAw/viewform link] to submit
 +
 
 +
==Readings==
 +
Two books are compulsory for this course:
 +
 
 +
* D. Diez et al. OpenIntro Statistics. 2015. (freely and legally [https://www.openintro.org/stat/textbook.php available] online)
 +
 
 +
* Ch. Wheelan. Naked Statistics. 2013.

Current revision as of 04:07, 7 February 2020

Course info

Dear students,

The materials for the course "Data Analysis in the Social Sciences", taught in the Master's programme "Politics. Economics. Philosophy." in the 2018-2019 academic year, will be published here.

  • Instructor: Alla Tambovtseva
  • Modules: 2-4
  • Course syllabus: link

Software

During this course we will use R as a programming language and RStudio as a GUI.

How to install R and RStudio?

1. Download R (you can choose another mirror here if you wish) and install it on your computer. Make sure you do this before installing RStudio.

2. Download RStudio (you need RStudio Desktop with the Open Source License) and install it on your computer. We recommend creating a shortcut for RStudio during installation.
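
Once both installations are finished, a quick check from the RStudio console can confirm that R runs and that the session is inside RStudio. This is only a sketch: it assumes that RStudio sets the `RSTUDIO` environment variable to "1" in its own sessions, and the variable name `in_rstudio` is made up for illustration.

```r
# Print the version string of the R installation that is currently running.
print(R.version.string)

# Assumption: RStudio sets the RSTUDIO environment variable to "1".
# In a plain R console (outside RStudio) this is expected to be FALSE.
in_rstudio <- identical(Sys.getenv("RSTUDIO"), "1")
print(in_rstudio)
```

If the first line prints a version string, R itself is installed correctly, whatever the second line reports.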

How to use RStudio?

Read the instructions here.

To submit assignments successfully, you need to be able to create and save R code files (.R). It would also be helpful for your own research projects to learn how to create RMarkdown files.
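
As an illustration of what an R code file can contain, here is a minimal sketch. The file name `hw_example.R`, the variable names, and the numbers are invented for this example and do not come from any course assignment. In RStudio, File > New File > R Script opens an empty script that can then be saved with the .R extension.

```r
# hw_example.R: hypothetical file name for a homework script.

# Store a few made-up observations in a numeric vector.
scores <- c(4, 7, 9, 10)

# Compute the sample mean and print it.
avg <- mean(scores)
print(avg)
```

Running the whole file with the Source button in RStudio executes the lines from top to bottom.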

Materials

{| class="wikitable"
! Date !! Topic !! Theory !! R !! Optional
|-
| 01 November || Data collection-1. Population and samples || lecture1 || r-intro || RMarkdown: official page, cheatsheet
|-
| 08 November || Data collection-2. Sampling. Sources of bias || lecture2 || r-types r-vectors ||
|-
| 15 November || Data types. Intro to exploratory analysis || lecture3 || r-dataload Titanic.csv || csv in R files
|-
| 22 November || Exploratory analysis. Data visualisation || lecture4 || Chile.csv codebook r-explore || sample quartiles
|-
| 29 November || Exploratory analysis || R only || r-tables Chile.csv r-rnorm || wordcloud code
|-
| 10 January || Statistical estimates. Statistical laws || lecture6 || r-loops r-laws ||
|-
| 17 January || Confidence intervals || lecture7 || r-conf-ints Chile.csv || visualization by K.Magnusson
|-
| 24 January || Hypotheses testing || lecture8 || t-test ||
|-
| 31 January || Data manipulation with dplyr. Correlation analysis || lecture9 || r-dplyr r-corr marketing.csv || more on dplyr
|-
| 07 February || Contingency tables and chi-squared test || lecture10 || Lab1 L1-solutions CPDS.csv socling.csv || stringi: library for text handling
|-
| 14 February || Visualising association between variables || R only || r-visualisation wgi_fh.csv Lab2 L2-solutions || more on scatterplots; guess correlation game
|-
| 21 February || Visualisation with ggplot2 || R only || r-ggplot2 wgi_fh.csv Lab3 L3-solutions demography.csv || types of visualisation, funny quiz on graphs; interactive bubble plot for inspiration
|-
| 28 February || Exporting output via stargazer || R only || || stargazer for non-LaTeX users
|-
| 7 March || Comparing multiple groups: ANOVA || [lecture11] || ||
|-
| 21 March || Midterm || || ||
|-
| 04 April || Simple linear regression. OLS || lecture12 || r-reg1 2011.csv Lab 4 L4-solutions ||
|-
| 18 April || Multiple linear regression || lecture13 || r-reg2 flats.csv Lab 5 L5-solutions Griliches.csv || jtools for regression
|-
| 25 April || Multiple linear regression. Model diagnostics || lecture14 || r-reg3 wgi_fh.csv ||
|-
| 23 May || Categorical predictors. Interaction effects || lecture15 || r-reg4 wgi-new.csv r-reg5 flats.csv ||
|-
| 30 May || Fixed and random effects || || firms.csv || Princeton handbook on FE & RE models
|-
| 06 June || Lab on regressions. Logistic regression || || Lab 6 L6-solutions logistic-reg spanish.csv || UCLA helper on logit models
|-
| 13 June || Principal component analysis || || PCA USJudges.csv || visualisation (text in Russian)
|}

R lectures in pdf

01 November: r-intro
08 November: r-types, r-vectors
15 November: r-dataload, csv-add
22 November: r-explore1
29 November: r-tables, r-rnorm
12 January: r-loops, r-laws
17 January: r-conf-ints
24 January: t-test
31 January: r-dplyr, r-corr
14 February: r-visualisation

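Home assignments

  • Homework 1 (deadline: 18 November, 23:59)
  • Homework 2 (deadline: 20 December, 23:59)
  • Homework 3 (deadline: 04 February, 23:59)
  • Homework 4 (deadline: 18 February, 23:59)
  • Homework 5 (deadline: 27 April, 23:59), link to submit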

Readings

Two books are compulsory for this course:

  • D. Diez et al. OpenIntro Statistics. 2015. (freely and legally available online)
  • Ch. Wheelan. Naked Statistics. 2013.