Chapter 10 More statistics…

You will no doubt be pleased to learn that the topics covered on this course have not quite exhausted the list of available statistical methods. In this chapter we outline some of the most important further areas of statistics, so that you are at least aware of their existence and titles. For some of them, codes of LSE courses which cover these methods are given in parentheses.

A very large part of advanced statistics is devoted to further types of regression models. Their basic idea is the same as that of multiple linear regression, i.e. modelling expected values of response variables given several explanatory variables. The issues involved in the form and treatment of explanatory variables are usually almost exactly the same as for linear models. Different classes of regression models are needed mainly to accommodate different types of response variables:

  • Models for categorical response variables. These exist for situations where the response variable is dichotomous (binary regression, especially logistic models), has more than two unordered (multinomial logistic models) or ordered (ordinal regression models) categories, or is a count, for example in a contingency table (Poisson regression, loglinear models). Despite the many different titles, all of these models are closely connected (MY452).

  • Models for cases where the response is a length of time to some event, such as a spell of unemployment, interval between births of children or survival of a patient in a medical study. These techniques are known as event history analysis, survival analysis or lifetime data analysis. Despite the different terms, all refer to the same statistical models.

Techniques for the analysis of dependent data, which do not require the assumption of statistically independent observations used by almost all the methods on this course:

  • Time series analysis for one or more long sequences of observations of the same quantity over time. For example, each of the five temperature sequences in Figure 8.2 is a time series of this kind.

  • Regression models for hierarchical data, where some sets of observations are not independent of each other. There are two main types of such data: longitudinal or panel data which consist of short time series for many units (e.g. answers by respondents in successive waves of a panel survey), and nested or multilevel data where basic units are grouped in natural groups or clusters (e.g. pupils in classes and schools in an educational study). Both of these can be analysed using the same general classes of models, which in turn are generalisations of linear and other regression models used for independent data (ST416 for models for multilevel data and ST442 for models for longitudinal data).

Methods for multivariate data. Roughly speaking, this means data with several variables for comparable quantities treated on an equal footing, so that none of them is obviously a response to the others. For example, results for the ten events in the decathlon data of the week 7 computer class or, more seriously, the responses to a series of related attitude items in a survey are multivariate data of this kind.

  • Various methods of descriptive multivariate analysis for jointly summarising and presenting information on the many variables, e.g. cluster analysis, multidimensional scaling and principal components analysis (MY455 for principal components analysis).

  • Model-based methods for multivariate data. These are typically latent variable models, which also involve variables which can never be directly observed. The simplest latent variable technique is exploratory factor analysis, and others include confirmatory factor analysis, structural equation models, and latent trait and latent class models (MY455).

Some types of research design may also involve particular statistical considerations:

  • Sampling theory for the design of probability samples, e.g. for surveys (part of MY456, which also covers methodology of surveys in general).

  • Design of experiments for more complex randomized experiments.

Finally, some areas of statistics are concerned with broader and more fundamental aspects of statistical analysis, such as alternative forms of model specification and inference (e.g. nonparametric methods) or the basic ideas of inference itself (e.g. Bayesian statistics). These and the more specific tools further build on the foundations of all statistical methods, which are the subject of probability theory and mathematical statistics. However, you are welcome, if you wish, to leave the details of these fields to professional statisticians, if only to keep them too in employment.

Statistical tables

Explanation of the “Table of standard normal tail probabilities” below:

  • The table shows, for values of \(Z\) between 0 and 3.5, the probability that a value from the standard normal distribution is larger than \(Z\) (i.e. the “right-hand” tail probabilities).

    • For example, the probability of values larger than 0.50 is 0.3085.
  • For negative values of \(Z\), the probability of values smaller than \(Z\) (the “left-hand” tail probability) is equal to the right-hand tail probability for the corresponding positive value of \(Z\).

    • For example, the probability of values smaller than \(-0.50\) is also 0.3085.
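The tail probabilities described above can also be computed directly rather than looked up. The following is a minimal sketch in Python using only the standard library. The function names `right_tail` and `left_tail` are illustrative and are not part of the course materials. The sketch relies on the identity \(P(Z > z) = \frac{1}{2}\,\mathrm{erfc}(z/\sqrt{2})\) for the standard normal distribution.

```python
import math


def right_tail(z: float) -> float:
    """Right-hand tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def left_tail(z: float) -> float:
    """Left-hand tail probability P(Z < z).

    By the symmetry of the standard normal distribution this equals
    the right-hand tail probability at -z.
    """
    return right_tail(-z)


# The two examples from the explanation of the table.
print(f"{right_tail(0.50):.4f}")   # 0.3085
print(f"{left_tail(-0.50):.4f}")   # 0.3085
```

Rounded to four decimal places, these values should agree with the corresponding entries of the table.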

Table of standard normal tail probabilities

\(z\) Prob. \(z\) Prob. \(z\) Prob. \(z\) Prob. \(z\) Prob. \(z\) Prob.
0.00 0.5000 0.50 0.3085 1.00 0.1587 1.50 0.0668 2.00 0.0228 2.50 0.0062
0.01 0.4960 0.51 0.3050 1.01 0.1562 1.51 0.0655 2.01 0.0222 2.52 0.0059
0.02 0.4920 0.52 0.3015 1.02 0.1539 1.52 0.0643 2.02 0.0217 2.54 0.0055
0.03 0.4880 0.53 0.2981 1.03 0.1515 1.53 0.0630 2.03 0.0212 2.56 0.0052
0.04 0.4840 0.54 0.2946 1.04 0.1492 1.54 0.0618 2.04 0.0207 2.58 0.0049
0.05 0.4801 0.55 0.2912 1.05 0.1469 1.55 0.0606 2.05 0.0202 2.60 0.0047
0.06 0.4761 0.56 0.2877 1.06 0.1446 1.56 0.0594 2.06 0.0197 2.62 0.0044
0.07 0.4721 0.57 0.2843 1.07 0.1423 1.57 0.0582 2.07 0.0192 2.64 0.0041
0.08 0.4681 0.58 0.2810 1.08 0.1401 1.58 0.0571 2.08 0.0188 2.66 0.0039
0.09 0.4641 0.59 0.2776 1.09 0.1379 1.59 0.0559 2.09 0.0183 2.68 0.0037
0.10 0.4602 0.60 0.2743 1.10 0.1357 1.60 0.0548 2.10 0.0179 2.70 0.0035
0.11 0.4562 0.61 0.2709 1.11 0.1335 1.61 0.0537 2.11 0.0174 2.72 0.0033
0.12 0.4522 0.62 0.2676 1.12 0.1314 1.62 0.0526 2.12 0.0170 2.74 0.0031
0.13 0.4483 0.63 0.2643 1.13 0.1292 1.63 0.0516 2.13 0.0166 2.76 0.0029
0.14 0.4443 0.64 0.2611 1.14 0.1271 1.64 0.0505 2.14 0.0162 2.78 0.0027
0.15 0.4404 0.65 0.2578 1.15 0.1251 1.65 0.0495 2.15 0.0158 2.80 0.0026
0.16 0.4364 0.66 0.2546 1.16 0.1230 1.66 0.0485 2.16 0.0154 2.82 0.0024
0.17 0.4325 0.67 0.2514 1.17 0.1210 1.67 0.0475 2.17 0.0150 2.84 0.0023
0.18 0.4286 0.68 0.2483 1.18 0.1190 1.68 0.0465 2.18 0.0146 2.86 0.0021
0.19 0.4247 0.69 0.2451 1.19 0.1170 1.69 0.0455 2.19 0.0143 2.88 0.0020
0.20 0.4207 0.70 0.2420 1.20 0.1151 1.70 0.0446 2.20 0.0139 2.90 0.0019
0.21 0.4168 0.71 0.2389 1.21 0.1131 1.71 0.0436 2.21 0.0136 2.92 0.0018
0.22 0.4129 0.72 0.2358 1.22 0.1112 1.72 0.0427 2.22 0.0132 2.94 0.0016
0.23 0.4090 0.73 0.2327 1.23 0.1093 1.73 0.0418 2.23 0.0129 2.96 0.0015
0.24 0.4052 0.74 0.2296 1.24 0.1075 1.74 0.0409 2.24 0.0125 2.98 0.0014
0.25 0.4013 0.75 0.2266 1.25 0.1056 1.75 0.0401 2.25 0.0122 3.00 0.0013
0.26 0.3974 0.76 0.2236 1.26 0.1038 1.76 0.0392 2.26 0.0119 3.02 0.0013
0.27 0.3936 0.77 0.2206 1.27 0.1020 1.77 0.0384 2.27 0.0116 3.04 0.0012
0.28 0.3897 0.78 0.2177 1.28 0.1003 1.78 0.0375 2.28 0.0113 3.06 0.0011
0.29 0.3859 0.79 0.2148 1.29 0.0985 1.79 0.0367 2.29 0.0110 3.08 0.0010
0.30 0.3821 0.80 0.2119 1.30 0.0968 1.80 0.0359 2.30 0.0107 3.10 0.0010
0.31 0.3783 0.81 0.2090 1.31 0.0951 1.81 0.0351 2.31 0.0104 3.12 0.0009
0.32 0.3745 0.82 0.2061 1.32 0.0934 1.82 0.0344 2.32 0.0102 3.14 0.0008
0.33 0.3707 0.83 0.2033 1.33 0.0918 1.83 0.0336 2.33 0.0099 3.16 0.0008
0.34 0.3669 0.84 0.2005 1.34 0.0901 1.84 0.0329 2.34 0.0096 3.18 0.0007
0.35 0.3632 0.85 0.1977 1.35 0.0885 1.85 0.0322 2.35 0.0094 3.20 0.0007
0.36 0.3594 0.86 0.1949 1.36 0.0869 1.86 0.0314 2.36 0.0091 3.22 0.0006
0.37 0.3557 0.87 0.1922 1.37 0.0853 1.87 0.0307 2.37 0.0089 3.24 0.0006
0.38 0.3520 0.88 0.1894 1.38 0.0838 1.88 0.0301 2.38 0.0087 3.26 0.0006
0.39 0.3483 0.89 0.1867 1.39 0.0823 1.89 0.0294 2.39 0.0084 3.28 0.0005
0.40 0.3446 0.90 0.1841 1.40 0.0808 1.90 0.0287 2.40 0.0082 3.30 0.0005
0.41 0.3409 0.91 0.1814 1.41 0.0793 1.91 0.0281 2.41 0.0080 3.32 0.0005
0.42 0.3372 0.92 0.1788 1.42 0.0778 1.92 0.0274 2.42 0.0078 3.34 0.0004
0.43 0.3336 0.93 0.1762 1.43 0.0764 1.93 0.0268 2.43 0.0075 3.36 0.0004
0.44 0.3300 0.94 0.1736 1.44 0.0749 1.94 0.0262 2.44 0.0073 3.38 0.0004
0.45 0.3264 0.95 0.1711 1.45 0.0735 1.95 0.0256 2.45 0.0071 3.40 0.0003
0.46 0.3228 0.96 0.1685 1.46 0.0721 1.96 0.0250 2.46 0.0069 3.42 0.0003
0.47 0.3192 0.97 0.1660 1.47 0.0708 1.97 0.0244 2.47 0.0068 3.44 0.0003
0.48 0.3156 0.98 0.1635 1.48 0.0694 1.98 0.0239 2.48 0.0066 3.46 0.0003
0.49 0.3121 0.99 0.1611 1.49 0.0681 1.99 0.0233 2.49 0.0064 3.48 0.0003

Table of critical values for t-distributions

df 0.100 0.050 0.025 0.010 0.005 0.001 0.0005
1 3.078 6.314 12.706 31.821 63.657 318.309 636.619
2 1.886 2.920 4.303 6.965 9.925 22.327 31.599
3 1.638 2.353 3.182 4.541 5.841 10.215 12.924
4 1.533 2.132 2.776 3.747 4.604 7.173 8.610
5 1.476 2.015 2.571 3.365 4.032 5.893 6.869
6 1.440 1.943 2.447 3.143 3.707 5.208 5.959
7 1.415 1.895 2.365 2.998 3.499 4.785 5.408
8 1.397 1.860 2.306 2.896 3.355 4.501 5.041
9 1.383 1.833 2.262 2.821 3.250 4.297 4.781
10 1.372 1.812 2.228 2.764 3.169 4.144 4.587
11 1.363 1.796 2.201 2.718 3.106 4.025 4.437
12 1.356 1.782 2.179 2.681 3.055 3.930 4.318
13 1.350 1.771 2.160 2.650 3.012 3.852 4.221
14 1.345 1.761 2.145 2.624 2.977 3.787 4.140
15 1.341 1.753 2.131 2.602 2.947 3.733 4.073
16 1.337 1.746 2.120 2.583 2.921 3.686 4.015
17 1.333 1.740 2.110 2.567 2.898 3.646 3.965
18 1.330 1.734 2.101 2.552 2.878 3.610 3.922
19 1.328 1.729 2.093 2.539 2.861 3.579 3.883
20 1.325 1.725 2.086 2.528 2.845 3.552 3.850
21 1.323 1.721 2.080 2.518 2.831 3.527 3.819
22 1.321 1.717 2.074 2.508 2.819 3.505 3.792
23 1.319 1.714 2.069 2.500 2.807 3.485 3.768
24 1.318 1.711 2.064 2.492 2.797 3.467 3.745
25 1.316 1.708 2.060 2.485 2.787 3.450 3.725
26 1.315 1.706 2.056 2.479 2.779 3.435 3.707
27 1.314 1.703 2.052 2.473 2.771 3.421 3.690
28 1.313 1.701 2.048 2.467 2.763 3.408 3.674
29 1.311 1.699 2.045 2.462 2.756 3.396 3.659
30 1.310 1.697 2.042 2.457 2.750 3.385 3.646
40 1.303 1.684 2.021 2.423 2.704 3.307 3.551
60 1.296 1.671 2.000 2.390 2.660 3.232 3.460
120 1.289 1.658 1.980 2.358 2.617 3.160 3.373
\(\infty\) 1.282 1.645 1.960 2.326 2.576 3.090 3.291

Explanation: As an example, consider the value 3.078 in the top left corner. This indicates that for a \(t\)-distribution with 1 degree of freedom the probability of values greater than 3.078 is 0.100. The last row shows critical values for the standard normal distribution.
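The first and last rows of the table can be checked with a short calculation. This is only a sketch under stated assumptions: it uses the facts that a \(t\)-distribution with 1 degree of freedom is the standard Cauchy distribution, whose quantiles have a closed form, and that the \(\infty\) row is the standard normal distribution. The function names are hypothetical. The other rows require a general \(t\) quantile function from a statistical package, which is not shown here.

```python
import math
from statistics import NormalDist


def t_critical_df1(p: float) -> float:
    """Value exceeded with probability p under a t-distribution with 1 df.

    With 1 df the t-distribution is the standard Cauchy distribution,
    whose cdf is 1/2 + arctan(x)/pi, so the critical value is 1/tan(pi*p).
    """
    return 1.0 / math.tan(math.pi * p)


def normal_critical(p: float) -> float:
    """Value exceeded with probability p under the standard normal."""
    return NormalDist().inv_cdf(1.0 - p)


# Compare with the first (df = 1) and last (infinity) rows of the table.
for p in (0.100, 0.050, 0.025):
    print(p, round(t_critical_df1(p), 3), round(normal_critical(p), 3))
```

For a right-hand tail probability of 0.100, for example, this should reproduce the values 3.078 and 1.282 from the first column of the table.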

Table of critical values for chi-square distributions

df 0.100 0.050 0.010 0.001
1 2.71 3.84 6.63 10.828
2 4.61 5.99 9.21 13.816
3 6.25 7.81 11.34 16.266
4 7.78 9.49 13.28 18.467
5 9.24 11.07 15.09 20.515
6 10.64 12.59 16.81 22.458
7 12.02 14.07 18.48 24.322
8 13.36 15.51 20.09 26.124
9 14.68 16.92 21.67 27.877
10 15.99 18.31 23.21 29.588
11 17.28 19.68 24.72 31.264
12 18.55 21.03 26.22 32.909
13 19.81 22.36 27.69 34.528
14 21.06 23.68 29.14 36.123
15 22.31 25.00 30.58 37.697
16 23.54 26.30 32.00 39.252
17 24.77 27.59 33.41 40.790
18 25.99 28.87 34.81 42.312
19 27.20 30.14 36.19 43.820
20 28.41 31.41 37.57 45.315
25 34.38 37.65 44.31 52.620
30 40.26 43.77 50.89 59.703
40 51.81 55.76 63.69 73.402
50 63.17 67.50 76.15 86.661
60 74.40 79.08 88.38 99.607
70 85.53 90.53 100.43 112.317
80 96.58 101.88 112.33 124.839
90 107.57 113.15 124.12 137.208
100 118.50 124.34 135.81 149.449

Explanation: For example, the value 2.71 in the top left corner indicates that for a \(\chi^{2}\) distribution with 1 degree of freedom the probability of values greater than 2.71 is 0.100.
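Some entries of the chi-square table can likewise be verified by hand. The sketch below, with hypothetical function names, covers only two special cases where the right-hand tail probability has a simple closed form: 1 degree of freedom, where the variable is the square of a standard normal variable, and even degrees of freedom, where the tail probability is a finite sum. Odd degrees of freedom above 1 are not handled and would normally be left to a statistical package.

```python
import math


def chi2_right_tail_df1(x: float) -> float:
    """P(X > x) for a chi-square distribution with 1 degree of freedom.

    X is the square of a standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


def chi2_right_tail_even_df(x: float, df: int) -> float:
    """P(X > x) for a chi-square distribution with even df = 2k.

    Uses exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0    # (x/2)^j / j!, starting at j = 0
    total = 0.0
    for j in range(df // 2):
        total += term
        term *= half / (j + 1)
    return math.exp(-half) * total


# Critical values taken from the 0.100 and 0.050 columns of the table.
print(round(chi2_right_tail_df1(2.71), 3))
print(round(chi2_right_tail_even_df(5.99, 2), 3))
print(round(chi2_right_tail_even_df(18.31, 10), 3))
```

Because the table entries are rounded, the computed probabilities match the column headings only approximately.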