Data Analysis in the Social Sciences

Course data

  • Instructor: Ilya V. Schurov.
  • Modules: 2-4.

Software

We use the statistical software R as our main computing tool. It is free software: you can download it here and install it on your computer. We also use RStudio, an integrated development environment for R. It is free software as well, and you can download it here.

Lessons

1. November 14. Data types and descriptive statistics

Types of data in the social sciences. Nominal, ordinal, interval and ratio scales. Examples. Descriptive statistics: mean, median, mode, variance, standard deviation, quantiles.

See Naked Statistics, Chapter 2, and Statistics, Chapter 4.

The basics of R. R as a calculator. Vectors. Calculating basic descriptive statistics with R.
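As a minimal sketch of what this looks like in practice, the snippet below computes the descriptive statistics listed above for a small vector. The vector `ages` and the helper name `stat_mode` are made up for illustration and are not course data.

```r
# R as a calculator
2 + 3 * 4

# A vector: hypothetical ages of ten survey respondents (made-up numbers)
ages <- c(23, 35, 31, 23, 47, 52, 29, 23, 40, 35)

mean(ages)      # arithmetic mean
median(ages)    # middle value of the sorted sample
var(ages)       # sample variance (divides by n - 1)
sd(ages)        # sample standard deviation, equal to sqrt(var(ages))
quantile(ages, probs = c(0.25, 0.5, 0.75))   # quartiles

# Base R has no function for the statistical mode (mode() returns the
# storage type of an object), so we count the values and take the most
# frequent one.
counts <- table(ages)
stat_mode <- as.numeric(names(counts)[which.max(counts)])
stat_mode
```

Note that `var()` and `sd()` use the n - 1 denominator, which is the usual choice when the data are a sample rather than the whole population.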

2. November 21. Introduction to statistical thinking

Simple probabilistic models. Population, sample. Model for opinion poll. Random variable. Expected value. The law of large numbers. Central limit theorem.

See Naked Statistics, Chapter 5 and Statistics, Chapters 16, 17 and 18.
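The opinion poll model can be explored by simulation. The sketch below assumes a hypothetical population in which a share `p_true` of people support some proposal. The value 0.4, the sample sizes and the function name `poll` are all illustrative choices, not part of the course materials.

```r
set.seed(1)       # make the simulation reproducible
p_true <- 0.4     # assumed (hypothetical) share of supporters

# One poll of n people: each answer is 1 (supports) with probability p_true,
# and the poll result is the share of ones in the sample.
poll <- function(n) mean(rbinom(n, size = 1, prob = p_true))

# Law of large numbers: larger samples tend to give shares closer to p_true
sapply(c(10, 100, 10000), poll)

# Central limit theorem: results of many polls of size 400 are approximately
# normally distributed around p_true with standard deviation sqrt(p(1-p)/n)
shares <- replicate(5000, poll(400))
mean(shares)
sd(shares)
sqrt(p_true * (1 - p_true) / 400)   # theoretical value for comparison
```

Calling `hist(shares)` afterwards shows the bell shape of the distribution of poll results.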

3. November 28. Confidence intervals

Histograms as an approximation of a distribution: the area under the histogram and probability. Confidence intervals. Dependence of confidence intervals on the sample size and the confidence level.

See Naked Statistics, Chapter 10 and Statistics, Chapters 19, 20 and 21.
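To see how the width of a confidence interval depends on the sample size and the confidence level, one can try a small experiment like the one below. It is only a sketch: the sample is simulated from a hypothetical normal population, and `ci_width` is a helper name introduced here.

```r
set.seed(2)
x <- rnorm(400, mean = 50, sd = 10)   # hypothetical sample of 400 values

# Width of the confidence interval for the mean of a sample
ci_width <- function(sample, level = 0.95) {
  ci <- t.test(sample, conf.level = level)$conf.int
  ci[2] - ci[1]
}

# Fixed 95% level, growing sample size: the interval gets narrower
by_size <- sapply(c(25, 100, 400), function(n) ci_width(x[1:n]))
by_size

# Fixed sample of 100, growing confidence level: the interval gets wider
by_level <- sapply(c(0.80, 0.95, 0.99), function(l) ci_width(x[1:100], l))
by_level
```

Roughly speaking, taking four times as many observations halves the width, because the standard error is proportional to 1 / sqrt(n).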

Visualization of confidence intervals

Working with dataframes in R.
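A short illustration of typical data frame operations follows. The data frame `survey` and all its values are invented for this example.

```r
# Hypothetical survey data (all values are made up)
survey <- data.frame(
  id     = 1:6,
  gender = c("f", "m", "f", "f", "m", "m"),
  age    = c(23, 35, 31, 47, 52, 29),
  income = c(30, 45, 38, 60, 52, 41)   # arbitrary units
)

str(survey)            # column types and a preview of the values
summary(survey$age)    # descriptive statistics for one column

# Rows that satisfy a condition (note the comma: we select rows)
older <- survey[survey$age > 30, ]
older

# Mean income within each gender group
by_gender <- aggregate(income ~ gender, data = survey, FUN = mean)
by_gender
```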

4. December 5. Calculating confidence intervals

Standard deviation and standard error of the mean. Quantiles of the standard Gaussian distribution. The relation between standard error and confidence interval.

See Statistics, Chapters 17, 20, 21 and 23, and Naked Statistics, Chapters 8 and 10.

Calculation of confidence intervals in R with the t.test() function. See here.
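The relation between the standard error and the confidence interval can be checked by hand. The sketch below uses a made-up sample `x` and compares the interval "mean plus or minus quantile times standard error" with the output of `t.test()`.

```r
# Hypothetical sample of ten measurements (made-up numbers)
x <- c(4.1, 5.3, 4.8, 5.9, 5.0, 4.4, 5.6, 5.1, 4.7, 5.2)
n <- length(x)

se <- sd(x) / sqrt(n)     # standard error of the mean

# Approximate 95% interval based on the standard Gaussian quantile
z <- qnorm(0.975)         # about 1.96
ci_normal <- mean(x) + c(-1, 1) * z * se

# For small samples the quantile of the t distribution with n - 1 degrees
# of freedom is used instead, which makes the interval somewhat wider
t_q <- qt(0.975, df = n - 1)
ci_t <- mean(x) + c(-1, 1) * t_q * se

ci_normal
ci_t
t.test(x)$conf.int        # the same interval as ci_t
```

For large samples the two quantiles are almost equal, so the Gaussian approximation and `t.test()` give nearly the same interval.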

5. January 16. Statistical hypotheses and the t-test

Hypothesis testing framework. Null and alternative hypotheses, p-value. Testing hypotheses about population means with the t-test.

See Statistics, Chapter 26 and Naked Statistics, Chapter 9.

Using the t.test() function in R to test hypotheses. See a short example and a more detailed explanation.
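Independently of the linked examples, here is a minimal sketch of both uses of `t.test()`. The two groups are simulated, and the means 100 and 110 are assumptions made only for the illustration.

```r
set.seed(3)
# Two hypothetical groups of 30 observations each
group_a <- rnorm(30, mean = 100, sd = 15)
group_b <- rnorm(30, mean = 110, sd = 15)

# One-sample test. Null hypothesis: the population mean of group_a is 100
one <- t.test(group_a, mu = 100)
one$p.value

# Two-sample test (Welch's version, the default in R).
# Null hypothesis: the two population means are equal
two <- t.test(group_a, group_b)
two$statistic        # value of the t statistic
two$p.value

# Decision at the 5% significance level
two$p.value < 0.05
```

A small p-value means that data like ours would be unlikely if the null hypothesis were true. It is not the probability that the null hypothesis is true.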

6. January 23. ANOVA

Error types in statistics. Significance level. Multiple comparison problem. ANOVA.

See Introductory Statistics for the Behavioral Sciences by Joan Welkowitz, Barry H. Cohen and R. Brooke Lea (John Wiley & Sons, 2012), Chapter 12 (available under the HSE subscription).

Using the aov() function in R.
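The sketch below runs a one-way ANOVA on simulated scores in three groups and also shows why multiple comparisons are a problem. The data frame `scores` and the group means are invented, and the false rejection calculation assumes independent tests, which is a simplification.

```r
set.seed(4)
# Hypothetical scores in three groups of 20 observations each
scores <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = 20)),
  score = c(rnorm(20, 50, 10), rnorm(20, 55, 10), rnorm(20, 60, 10))
)

# One-way ANOVA. Null hypothesis: all group means are equal
fit <- aov(score ~ group, data = scores)
summary(fit)          # F statistic and p-value

# Multiple comparison problem: with 3 pairwise tests at level 0.05 the
# probability of at least one false rejection (for independent tests) is
fwer <- 1 - (1 - 0.05)^3
fwer                  # about 0.14, well above 0.05

# Pairwise comparisons with an adjustment for multiple testing
TukeyHSD(fit)
```

ANOVA replaces several pairwise tests with one overall test, so the significance level applies to the whole comparison at once.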

Regression models

Linear regression. Simple and multiple regression.

See Naked Statistics, Chapters 11 and 12; Introduction to Econometrics by J. Stock and M. Watson (Addison Wesley, 2006), Chapters 4-6; and A Non-Technical Introduction to Regression by J. Bakija (Williams College, 2013).

Using the lm() function in R.
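Here is a minimal sketch of simple and multiple regression with `lm()`. The variables `wage`, `educ` and `exper` and the coefficients used to generate them are hypothetical. The point is only to show that `lm()` recovers the coefficients we put in.

```r
set.seed(5)
n <- 200
# Hypothetical data generated from a known linear model
educ  <- runif(n, 8, 20)     # years of education
exper <- runif(n, 0, 30)     # years of experience
wage  <- 5 + 2 * educ + 0.5 * exper + rnorm(n, sd = 3)
d <- data.frame(wage, educ, exper)

# Simple regression: one explanatory variable
simple <- lm(wage ~ educ, data = d)
coef(simple)

# Multiple regression: several explanatory variables
multiple <- lm(wage ~ educ + exper, data = d)
summary(multiple)     # estimates, standard errors, t-tests, R-squared
confint(multiple)     # 95% confidence intervals for the coefficients
```

Each coefficient in the multiple regression is read as the expected change in `wage` when that variable grows by one unit and the other variables are held fixed.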

Working with panel data. Models with fixed effects, models with random effects.

See Introduction to Econometrics by J. Stock and M. Watson (Addison Wesley, 2006), Chapter 10.
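A fixed effects model can be sketched in base R by adding a dummy variable for every unit. The simulated panel below is hypothetical: 30 units are observed over 5 periods, and the unit effect is deliberately correlated with the regressor. Random effects models need an add-on package and are not shown here.

```r
set.seed(6)
n_units <- 30
n_periods <- 5
unit <- rep(1:n_units, each = n_periods)

alpha <- rnorm(n_units, sd = 5)[unit]            # unobserved unit effect
x <- 0.5 * alpha + rnorm(n_units * n_periods)    # regressor correlated with it
y <- 1.5 * x + alpha + rnorm(n_units * n_periods)
panel <- data.frame(unit = factor(unit), x, y)

# Pooled regression ignores the unit effects and is biased in this setup
pooled <- lm(y ~ x, data = panel)
coef(pooled)["x"]

# Fixed effects: one dummy variable per unit
fe <- lm(y ~ x + unit, data = panel)
coef(fe)["x"]         # should be close to the value 1.5 used above

# Equivalent "within" estimator: subtract unit means, then regress
x_dm <- panel$x - ave(panel$x, panel$unit)
y_dm <- panel$y - ave(panel$y, panel$unit)
within <- lm(y_dm ~ x_dm - 1)
coef(within)          # the same slope as in fe
```

The dummy variables absorb everything that is constant within a unit, which is why the fixed effects estimate is not distorted by the unobserved unit effect.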

Logistic regression. Using the glm() function in R.
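As an illustration, the sketch below fits a logistic regression to a simulated binary outcome. The variables `voted` and `age` and the coefficients -2 and 0.05 are assumptions made for the example.

```r
set.seed(7)
n <- 500
age <- runif(n, 18, 80)

# Hypothetical model: the probability of voting grows with age
p <- plogis(-2 + 0.05 * age)            # inverse logit
voted <- rbinom(n, size = 1, prob = p)  # 1 = voted, 0 = did not vote
d <- data.frame(voted, age)

# Logistic regression is a generalized linear model with binomial family
fit <- glm(voted ~ age, data = d, family = binomial)
summary(fit)

# The exponent of a coefficient is an odds ratio: the factor by which
# the odds of voting change for one additional year of age
exp(coef(fit)["age"])

# Predicted probabilities for new observations
pred <- predict(fit, newdata = data.frame(age = c(20, 60)), type = "response")
pred
```

Without `type = "response"`, `predict()` returns values on the log-odds scale rather than probabilities.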

March 13. In-class work

March 20. In-class work

Data

Homework

  • Homework 1: due Sunday, December 4, 23:00. Submit your answers here.

Exam

References

As a very informal introduction to the basic notions of mathematical statistics, I recommend the book Naked Statistics by Charles Wheelan (available as an e-book in both English and Russian).

I also like Statistics by David Freedman, Robert Pisani and Roger Purves (4th edition, 1998) as an introductory textbook on mathematical statistics.

To proceed with more sophisticated econometrics, Introductory Econometrics: A Modern Approach by Jeffrey Wooldridge seems to be a good choice. It is currently freely available here.

You can also use this free course as a good starting point for learning R.