# Data Analysis in the Social Sciences

## Course data

• Instructor: Ilya V. Schurov.
• Modules: 2-4.

## Software

We use the statistical software R as our main computational tool. It is free software: you can download it here and install it on your computer. We also use RStudio, an integrated development environment for R. It is free software as well, and you can download it here.

## Lessons

### 1. November 14. Data types and descriptive statistics

Types of data in the social sciences. Nominal, ordinal, interval and ratio scales. Examples. Descriptive statistics: mean, median, mode, variance, standard deviation, quantiles.

See Naked Statistics, Chapter 2, and Statistics, Chapter 4.

The basics of R. R as a calculator. Vectors. Calculating basic descriptive statistics with R.
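The following is a minimal sketch of these first steps in R. The `ages` vector is made-up data used only for illustration. Note that R's built-in `mode()` function reports how an object is stored, not the statistical mode, so the sketch finds the most frequent value with `table()` instead.

```r
# Hypothetical sample: ages of ten survey respondents (made-up numbers).
ages <- c(23, 35, 31, 23, 47, 52, 23, 38, 41, 29)

# R as a calculator: arithmetic works on single numbers and on whole vectors.
2 + 3 * 4
ages / 10

# Basic descriptive statistics.
mean(ages)
median(ages)
var(ages)   # sample variance (divides by n - 1)
sd(ages)    # sample standard deviation
quantile(ages, probs = c(0.25, 0.5, 0.75))

# Statistical mode: tabulate the values and pick the most frequent one.
freq <- table(ages)
mode_value <- as.numeric(names(freq)[which.max(freq)])
mode_value
```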

### 2. November 21. Introduction to statistical thinking

Simple probabilistic models. Population and sample. A model of an opinion poll. Random variables. Expected value. The law of large numbers. The central limit theorem.

See Naked Statistics, Chapter 5 and Statistics, Chapters 16, 17 and 18.
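The opinion poll model, the law of large numbers and the central limit theorem can be explored by simulation. The sketch below assumes a hypothetical population in which a share `p = 0.4` answers "yes"; that value and the sample sizes are arbitrary choices made for illustration.

```r
set.seed(1)  # fixed seed so that the run is reproducible

p <- 0.4     # hypothetical true share of "yes" answers in the population

# Law of large numbers: the sample share approaches p as the sample grows.
answers <- rbinom(10000, size = 1, prob = p)   # 1 = "yes", 0 = "no"
running_share <- cumsum(answers) / seq_along(answers)
running_share[c(10, 100, 1000, 10000)]

# Central limit theorem: the shares observed in many polls of the same size
# are approximately normally distributed around p.
n <- 400
poll_shares <- replicate(5000, mean(rbinom(n, size = 1, prob = p)))
mean(poll_shares)        # close to p
sd(poll_shares)          # close to the theoretical value on the next line
sqrt(p * (1 - p) / n)
```

Calling `hist(poll_shares)` afterwards draws the bell-shaped distribution of the simulated poll results.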

### 3. November 28. Confidence intervals

Histograms as an approximation of a distribution: the area under the histogram and probability. Confidence intervals. Dependence of confidence intervals on sample size and confidence level.

See Naked Statistics, Chapter 10 and Statistics, Chapters 19, 20 and 21.
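The link between the area under a histogram and probability can be checked numerically. This is a sketch on simulated data: the normal sample and the bin width of 5 are assumptions made for illustration.

```r
set.seed(2)
x <- rnorm(1000, mean = 50, sd = 10)   # simulated data, illustration only

# Bins of width 5 that cover the whole sample.
breaks <- seq(floor(min(x) / 5) * 5, ceiling(max(x) / 5) * 5, by = 5)

# With plot = FALSE, hist() returns the bins instead of drawing them.
h <- hist(x, breaks = breaks, plot = FALSE)
bin_width <- diff(h$breaks)

# On the density scale the total area under the histogram equals 1.
total_area <- sum(h$density * bin_width)
total_area

# The area over a range of bins estimates the probability of that range.
# Here: the area of all bins that end at or below 50.
below_50 <- h$breaks[-1] <= 50
area_below_50 <- sum(h$density[below_50] * bin_width[below_50])
area_below_50

# The same quantity computed directly as a share of observations.
share_below_50 <- mean(x <= 50)
share_below_50
```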

Working with data frames in R.
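A minimal sketch of common data frame operations follows. The `survey` table and all its values are hypothetical.

```r
# A small hypothetical survey; all values are made up for illustration.
survey <- data.frame(
  respondent = 1:6,
  gender     = c("f", "m", "f", "f", "m", "m"),
  age        = c(23, 35, 31, 47, 52, 29),
  income     = c(40, 55, 48, 70, 65, 38)   # arbitrary units
)

str(survey)        # column names and types
head(survey, 3)    # first three rows
survey$age         # one column as a vector

# Rows that satisfy a condition.
older <- survey[survey$age > 30, ]
older

# Mean income within each gender group.
income_by_gender <- aggregate(income ~ gender, data = survey, FUN = mean)
income_by_gender
```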

### 4. December 5. Calculating confidence intervals

Standard deviation and standard error of the mean. Quantiles of the standard Gaussian distribution. The relation between the standard error and the confidence interval.

See Statistics, Chapters 17, 20, 21 and 23, and Naked Statistics, Chapters 8 and 10.

Calculating confidence intervals in R with the `t.test()` function. See here.
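The sketch below shows one way to relate the standard error, the quantiles and the interval reported by `t.test()`. The data are simulated, so the sample size, mean and standard deviation are assumptions made for illustration.

```r
set.seed(3)
x <- rnorm(100, mean = 170, sd = 8)   # simulated heights, illustration only

n  <- length(x)
se <- sd(x) / sqrt(n)                 # standard error of the mean

# 95% interval based on the standard Gaussian quantile (about 1.96).
z <- qnorm(0.975)
c(mean(x) - z * se, mean(x) + z * se)

# t.test() uses the quantile of Student's t distribution with n - 1 degrees
# of freedom instead, so its interval is slightly wider.
ci95 <- t.test(x, conf.level = 0.95)$conf.int
ci95

# The same interval computed by hand with qt().
t_q <- qt(0.975, df = n - 1)
manual95 <- c(mean(x) - t_q * se, mean(x) + t_q * se)
manual95

# A higher confidence level gives a wider interval.
ci99 <- t.test(x, conf.level = 0.99)$conf.int
ci99

# For a fixed standard deviation the standard error shrinks like 1 / sqrt(n).
8 / sqrt(c(25, 100, 400))
```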

### 5. January 16. Statistical hypotheses and the t-test

Hypothesis testing framework. Null and alternative hypotheses, p-value. Testing hypotheses about population means with the t-test.

See Statistics, Chapter 26 and Naked Statistics, Chapter 9.

Using the `t.test()` function in R to test hypotheses. See short example and more detailed explanation.
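As a rough illustration, the sketch below runs a one-sample and a two-sample t-test on simulated scores. The group names, means and the hypothesised value 65 are hypothetical.

```r
set.seed(4)
# Simulated test scores for two hypothetical groups.
group_a <- rnorm(40, mean = 70, sd = 10)
group_b <- rnorm(40, mean = 76, sd = 10)

# One-sample test. H0: the population mean of group_a equals 65.
one_sample <- t.test(group_a, mu = 65)
one_sample$p.value

# Two-sample test. H0: both populations have the same mean.
# By default t.test() runs Welch's version, which does not assume
# equal variances in the two groups.
two_sample <- t.test(group_a, group_b)
two_sample$statistic
two_sample$p.value

# Decision rule: reject H0 at the 5% significance level
# if the p-value is below 0.05.
two_sample$p.value < 0.05
```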

### 6. January 23. ANOVA

Error types in statistics. Significance level. Multiple comparison problem. ANOVA.

See Introductory Statistics for the Behavioral Sciences, by Joan Welkowitz, Barry H. Cohen and R. Brooke Lea, John Wiley & Sons, January 2012, Chapter 12 (available under HSE subscription).

Using the `aov()` function in R.
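A minimal sketch of a one-way ANOVA with `aov()` follows. The three groups and their means are simulated assumptions. The last lines illustrate the multiple comparison problem under the simplifying assumption that the tests are independent.

```r
set.seed(5)
# Simulated outcome for three hypothetical groups of 30 observations each.
scores <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 30)),
  value = c(rnorm(30, mean = 50, sd = 8),
            rnorm(30, mean = 54, sd = 8),
            rnorm(30, mean = 58, sd = 8))
)

# One-way ANOVA. H0: all three group means are equal.
fit <- aov(value ~ group, data = scores)
summary(fit)

# Pairwise comparisons with Tukey's procedure, which controls the
# family-wise error rate.
TukeyHSD(fit)

# With m independent tests at level 0.05 and all null hypotheses true,
# the probability of at least one false positive is:
m <- 3
familywise_error <- 1 - (1 - 0.05)^m
familywise_error
```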

### Regression models

Linear regression. Simple and multiple regression.

See Naked Statistics, Chapters 11 and 12; Introduction to Econometrics, by J. Stock and M. Watson, Addison Wesley, 2006, Chapters 4-6; and A Non-Technical Introduction to Regression, by J. Bakija, Williams College, 2013.

Using the `lm()` function in R.
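The sketch below fits a simple and a multiple regression with `lm()`. The data are simulated with known coefficients (2 for `education`, 0.5 for `experience`), so the estimates can be compared with the values used to generate them. All variable names are hypothetical.

```r
set.seed(6)
n <- 200
# Simulated data with known coefficients.
education  <- rnorm(n, mean = 12, sd = 3)   # years of schooling
experience <- rnorm(n, mean = 10, sd = 5)   # years of work
wage <- 5 + 2 * education + 0.5 * experience + rnorm(n, sd = 4)
workers <- data.frame(wage, education, experience)

# Simple regression: one explanatory variable.
simple_fit <- lm(wage ~ education, data = workers)
coef(simple_fit)

# Multiple regression: several explanatory variables.
multiple_fit <- lm(wage ~ education + experience, data = workers)
summary(multiple_fit)   # estimates, standard errors, R-squared
confint(multiple_fit)   # 95% confidence intervals for the coefficients
```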

Working with panel data. Models with fixed effects, models with random effects.

See Introduction to Econometrics, by J. Stock and M. Watson, Addison Wesley, 2006, Chapter 10.
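One way to see what a fixed effects model does is to add a dummy variable for every unit in base R, as sketched below on simulated panel data. This is only an illustration under stated assumptions: the true slope is 2, and the unobserved unit effect is correlated with `x`. Random effects models are not covered here, since they are usually fitted with add-on packages (for example `plm` or `lme4`) rather than base R.

```r
set.seed(7)
n_units <- 30
n_years <- 5

panel <- data.frame(
  unit = factor(rep(1:n_units, each = n_years)),
  year = rep(1:n_years, times = n_units)
)

# Unobserved characteristic of each unit, constant over time.
unit_effect <- rnorm(n_units, sd = 3)
u <- unit_effect[as.integer(panel$unit)]

# x is correlated with the unit effect, which biases pooled regression.
panel$x <- u + rnorm(nrow(panel))
panel$y <- 1 + 2 * panel$x + 4 * u + rnorm(nrow(panel))

# Pooled regression ignores the panel structure.
pooled_fit <- lm(y ~ x, data = panel)
coef(pooled_fit)["x"]

# Fixed effects: a separate intercept for every unit absorbs unit_effect.
fixed_fit <- lm(y ~ x + unit, data = panel)
coef(fixed_fit)["x"]
```

In this simulation the pooled slope is pulled well away from 2, while the fixed effects slope stays close to it.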

Logistic regression. Using the `glm()` function in R.
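A minimal sketch of logistic regression with `glm()` follows. The turnout data are simulated, and the coefficients used to generate them (-3 and 0.06) are arbitrary assumptions.

```r
set.seed(8)
n <- 500
age <- runif(n, min = 18, max = 80)

# Simulated turnout: the log-odds of voting rise linearly with age.
p_vote <- plogis(-3 + 0.06 * age)
voted  <- rbinom(n, size = 1, prob = p_vote)
voters <- data.frame(voted, age)

# family = binomial selects logistic regression (logit link by default).
fit <- glm(voted ~ age, data = voters, family = binomial)
summary(fit)

# Exponentiated coefficients are odds ratios.
exp(coef(fit))

# Predicted probabilities of voting for two hypothetical ages.
predicted <- predict(fit, newdata = data.frame(age = c(25, 60)),
                     type = "response")
predicted
```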

## Homework

• Homework 1: due Sunday, December 4, 23:00. Answers are to be submitted here.