Data Analysis in the Social Sciences
- Instructor: Ilya V. Schurov.
- Modules: 2-4.
We use the statistical software R as our main computing tool. It is free software: you can download it here and install it on your computer. We also use RStudio, an integrated development environment for R. It is free software too, and you can download it here.
1. November 14. Data types and descriptive statistics
Types of data in the social sciences. Nominal, ordinal, interval and ratio scales. Examples. Descriptive statistics: mean, median, mode, variance, standard deviation, quantiles.
See Naked Statistics, Chapter 2, and Statistics, Chapter 4.
The basics of R. R as calculator. Vectors. Calculating basic descriptive statistics with R.
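A minimal sketch of what this first R session might cover. The vector of ages is made up for illustration and is not course data.

```r
# Hypothetical sample: ages of eight survey respondents
ages <- c(23, 35, 35, 41, 29, 35, 52, 29)

2 + 3 * 4                     # R as a calculator: 14

mean_age   <- mean(ages)
median_age <- median(ages)
var_age    <- var(ages)       # sample variance (divides by n - 1)
sd_age     <- sd(ages)        # square root of the sample variance
quartiles  <- quantile(ages, probs = c(0.25, 0.5, 0.75))

# Base R has no function for the statistical mode (mode() returns the
# storage type of an object), so count the values with table() instead
counts   <- table(ages)
mode_age <- as.numeric(names(counts)[which.max(counts)])
```

Note that which.max() returns only the first of several equally frequent values, so this shortcut reports a single mode even when the data have more than one.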
2. November 21. Introduction to statistical thinking
Simple probabilistic models. Population and sample. A model for an opinion poll. Random variables. Expected value. The law of large numbers. The central limit theorem.
See Naked Statistics, Chapter 5 and Statistics, Chapters 16, 17 and 18.
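The opinion poll model can be explored by simulation. The sketch below assumes a population in which 40% of people support some proposal; the share, the poll size and the number of repetitions are arbitrary choices made for illustration.

```r
set.seed(1)                      # make the simulation reproducible
p_true <- 0.4                    # assumed share of supporters in the population

# Law of large numbers: the sample share approaches p_true as n grows
answers <- rbinom(10000, size = 1, prob = p_true)   # 1 = supports, 0 = does not
running_share <- cumsum(answers) / seq_along(answers)
running_share[c(10, 100, 1000, 10000)]

# Central limit theorem: the shares observed in many independent polls
# of size n are approximately normally distributed around p_true
n <- 400
poll_shares <- replicate(2000, mean(rbinom(n, size = 1, prob = p_true)))
mean(poll_shares)                        # close to p_true
sd(poll_shares)                          # close to the theoretical value below
sqrt(p_true * (1 - p_true) / n)
```

Plotting hist(poll_shares) should show the familiar bell shape.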
3. November 28. Confidence intervals
Histograms as an approximation of a distribution: the area under the histogram and probability. Confidence intervals. Dependence of the confidence interval on the sample size and the confidence level.
See Naked Statistics, Chapter 10 and Statistics, Chapters 19, 20 and 21.
Working with data frames in R.
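A small sketch of data frame basics, together with the link between histogram area and shares of observations. The survey data frame and its column names are invented for this example.

```r
# Hypothetical survey data; the column names are made up for this example
survey <- data.frame(
  age    = c(23, 35, 35, 41, 29, 35, 52, 29),
  gender = factor(c("f", "m", "f", "f", "m", "m", "f", "m")),
  income = c(30, 45, 50, 62, 38, 41, 70, 33)
)

str(survey)                          # structure: column names and types
summary(survey$income)               # descriptive statistics for one column
older <- survey[survey$age > 30, ]   # rows for respondents older than 30
mean_income_by_gender <- tapply(survey$income, survey$gender, mean)

# Histogram on the density scale: the area of each bar is the share of
# observations that fall into the corresponding bin
h <- hist(survey$income, breaks = seq(30, 70, by = 10), plot = FALSE)
bar_areas <- h$density * diff(h$breaks)
sum(bar_areas)                       # the total area is 1
```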
4. December 5. Calculating confidence intervals
Standard deviation and standard error of the mean. Quantiles of the standard Gaussian distribution. The relation between the standard error and the confidence interval.
See Statistics, chapters 17, 20, 21 and 23, and Naked Statistics, chapters 8 and 10.
Calculating confidence intervals in R with the t.test function. See here.
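The relation between the standard error and the confidence interval can be checked by hand and compared with t.test. This is a sketch on a made-up sample; the 95% level is an arbitrary choice.

```r
# Hypothetical sample of eight measurements
x <- c(23, 35, 35, 41, 29, 35, 52, 29)
n <- length(x)

se <- sd(x) / sqrt(n)             # standard error of the mean

# Large-sample 95% interval based on a standard Gaussian quantile
z <- qnorm(0.975)                 # approximately 1.96
ci_normal <- mean(x) + c(-1, 1) * z * se

# Small-sample version: Student's t quantile with n - 1 degrees of freedom
t_quant <- qt(0.975, df = n - 1)
ci_t <- mean(x) + c(-1, 1) * t_quant * se

# t.test() reports the same t-based interval
ci_builtin <- t.test(x, conf.level = 0.95)$conf.int
```

For a sample this small the t-based interval is noticeably wider than the Gaussian one; the two get closer as the sample size grows.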
5. January 16. Statistical hypotheses and the t-test
Hypothesis testing framework. Null and alternative hypotheses, p-value. Testing hypotheses about population means with the t-test.
See Statistics, Chapter 26 and Naked Statistics, Chapter 9.
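A sketch of one-sample and two-sample t-tests in R. The scores and the hypothesized mean of 75 are invented for illustration.

```r
# Hypothetical test scores for two independent groups
group_a <- c(72, 85, 78, 90, 66, 81, 77, 88)
group_b <- c(65, 70, 74, 68, 72, 61, 79, 69)

# One-sample test. H0: the population mean behind group_a equals 75
one_sample <- t.test(group_a, mu = 75)

# Two-sample test (Welch version, the default). H0: the population means are equal
two_sample <- t.test(group_a, group_b)

one_sample$p.value
two_sample$statistic     # t statistic
two_sample$p.value       # small values count as evidence against H0

# The one-sample statistic computed by hand, for comparison
t_manual <- (mean(group_a) - 75) / (sd(group_a) / sqrt(length(group_a)))
```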
6. January 23. ANOVA
Error types in statistics. Significance level. Multiple comparison problem. ANOVA.
See Introductory Statistics for the Behavioral Sciences, by Welkowitz, Joan, Cohen, Barry H., Lea, R. Brooke, John Wiley & Sons, Incorporated, January 2012, chapter 12 (available under HSE subscription).
aov() function in R.
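A sketch of a one-way ANOVA with aov(), followed by Bonferroni-adjusted pairwise t-tests as one simple answer to the multiple comparison problem. The data and the choice of the Bonferroni correction are assumptions made for this example.

```r
# Hypothetical scores for three teaching methods, five students each
scores <- data.frame(
  method = factor(rep(c("A", "B", "C"), each = 5)),
  score  = c(70, 75, 72, 68, 74,
             80, 85, 83, 79, 82,
             71, 69, 73, 70, 72)
)

# One-way ANOVA. H0: all three group means are equal
fit <- aov(score ~ method, data = scores)
anova_table <- summary(fit)[[1]]
p_value <- anova_table[["Pr(>F)"]][1]

# Running three separate t-tests at the 5% level raises the chance of at
# least one false positive; the Bonferroni adjustment compensates for that
pairwise <- pairwise.t.test(scores$score, scores$method,
                            p.adjust.method = "bonferroni")
pairwise$p.value
```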
Linear regression. Simple and multiple regression.
See Naked Statistics, chapters 11 and 12; Introduction to Econometrics, by J. Stock and M. Watson, Addison Wesley, 2006, chapters 4-6; and A Non-Technical Introduction to Regression, by J. Bakija, Williams College, 2013.
lm() function in R.
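A sketch of simple and multiple regression with lm(). It uses the swiss data set that ships with R (fertility and socio-economic indicators for Swiss provinces); the choice of data set and of predictors is only for illustration.

```r
# Simple regression: one predictor
simple <- lm(Fertility ~ Education, data = swiss)

# Multiple regression: several predictors at once
multiple <- lm(Fertility ~ Education + Catholic + Infant.Mortality,
               data = swiss)

summary(simple)$coefficients     # estimates, standard errors, t values, p-values
coef(multiple)                   # each slope holds the other predictors fixed
confint(multiple)                # 95% confidence intervals for the coefficients

r2_simple   <- summary(simple)$r.squared
r2_multiple <- summary(multiple)$r.squared
```

R-squared never decreases when predictors are added, so r2_multiple is at least as large as r2_simple; that alone does not show that the larger model is better.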
Working with panel data. Models with fixed effects, models with random effects.
See Introduction to Econometrics, by J. Stock and M. Watson, Addison Wesley, 2006, Chapter 10.
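A sketch of the fixed effects idea using only base R and simulated data. All numbers are assumptions chosen so that the unobserved unit effect is correlated with the regressor and the true slope is 1.5. Random effects models need an add-on package and are not shown here.

```r
set.seed(42)
n_units   <- 30
n_periods <- 5
unit  <- rep(1:n_units, each = n_periods)

alpha <- rnorm(n_units, sd = 2)[unit]        # unobserved unit-level effect
x <- alpha + rnorm(n_units * n_periods)      # regressor correlated with alpha
y <- 1.5 * x + 3 * alpha + rnorm(n_units * n_periods)   # true slope is 1.5
panel <- data.frame(unit = factor(unit), x = x, y = y)

# Pooled OLS ignores the unit effects, so its slope is biased here
pooled <- lm(y ~ x, data = panel)

# Fixed effects as a regression with one dummy variable per unit
lsdv <- lm(y ~ x + unit, data = panel)

# Equivalent "within" estimator: subtract unit means from x and y
panel$x_dm <- panel$x - ave(panel$x, panel$unit)
panel$y_dm <- panel$y - ave(panel$y, panel$unit)
within <- lm(y_dm ~ x_dm - 1, data = panel)

c(pooled = coef(pooled)[["x"]],
  lsdv   = coef(lsdv)[["x"]],
  within = coef(within)[["x_dm"]])
```

The dummy variable and within estimates coincide, and both should land near 1.5, while the pooled estimate is pulled upward by the omitted unit effect.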
Logistic regression. Using the glm() function in R.
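A sketch of logistic regression with glm() on simulated data. The turnout model, its coefficients and the variable names are assumptions made for this example.

```r
set.seed(7)
n   <- 500
age <- round(runif(n, 18, 80))

# Assumed true model: the log-odds of voting rise by 0.05 per year of age
p_vote <- plogis(-2 + 0.05 * age)
voted  <- rbinom(n, size = 1, prob = p_vote)
voters <- data.frame(age = age, voted = voted)

fit <- glm(voted ~ age, data = voters, family = binomial)

coef(fit)                        # coefficients on the log-odds scale
exp(coef(fit)[["age"]])          # odds ratio for one additional year of age

# Predicted probabilities of voting at ages 20, 40 and 60
predicted <- predict(fit, newdata = data.frame(age = c(20, 40, 60)),
                     type = "response")
```

With type = "response" the predictions are probabilities; the default for predict() on a glm fit is the log-odds scale.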
March 13. In-class work
March 20. In-class work
- Homework 1, due date is Sunday, December 4, 23:00. Answers to be submitted here.
- Homework 2 (copy), due date is Sunday, February 5, 23:00. Answers to be submitted here.
- Homework 3 (copy), due date is Sunday, March 5. Answers to be submitted here.
- Homework 4 (copy), due date is May 21. Answers to be submitted here.
- Homework 5, due date is June 16. Answers to be submitted here.
I also like Statistics by David Freedman, Robert Pisani and Roger Purves (4th edition, 1998) as an introductory textbook on mathematical statistics.
To proceed to more sophisticated econometrics, Introductory Econometrics: A Modern Approach by Jeffrey Wooldridge seems to be a good choice. It is currently freely available here.
You can also use this free course as a good starting point for learning R.