6 Inference
Statistical inference is the process of estimating population parameters from samples. In many ways, making inferences from a sample is the main point of statistical analysis.
6.1 Estimation
6.1.1 Definitions
Estimation involves inferences about the value of population parameters.
Point Estimate : A single-value estimate of a population parameter.
Confidence Interval : An interval, calculated from a sample, that is constructed so that it contains the population parameter in a given proportion (typically 95%) of repeated samples.
6.1.2 Example
Returning to the tree height example, the average tree height calculated from the sample of 1000 trees (17.5m) is a point estimate for the population mean tree height. The forestry scientists would typically also calculate a 95% confidence interval (CI) from their sample. The confidence interval depends on the mean, the standard deviation and the size of their sample. In this case, they determined the 95% CI to be [16.9m, 18.1m]. This means that if the sampling were repeated many times, about 95% of the intervals calculated in this way would contain the true population mean height.
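As an illustration of how such an interval can be calculated, the following is a minimal sketch using the normal approximation, which is reasonable for a sample as large as 1000. The function name `mean_confidence_interval` and the simulated heights (including the standard deviation of 4.0 m) are assumptions made for the example; they are not the forestry scientists' data or method.

```python
import math
import random
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Return (mean, lower, upper) for the population mean.

    Uses the normal approximation: mean +/- z * (sd / sqrt(n)).
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)            # sample standard deviation (n - 1)
    se = sd / math.sqrt(n)                   # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean, mean - z * se, mean + z * se


# Hypothetical data: 1000 simulated tree heights in metres.
# The spread (4.0 m) is an assumption, not a value from the example.
rng = random.Random(42)
heights = [rng.gauss(17.5, 4.0) for _ in range(1000)]

mean, lower, upper = mean_confidence_interval(heights)
print(f"point estimate = {mean:.2f} m, 95% CI = [{lower:.2f} m, {upper:.2f} m]")
```

For small samples, the critical value would usually come from a t-distribution rather than the normal distribution, which gives a slightly wider interval.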
6.2 Hypothesis Testing
Hypothesis testing involves testing a claim (hypothesis) about a population (Library 2023).
6.2.1 Definitions
Null Hypothesis, \(H_0\) : The hypothesis that an observed effect is simply due to the randomness of sampling.
Alternative Hypothesis, \(H_a\) : The hypothesis that an observed effect is a real feature of the population being studied.
Statistical Test : A test that determines if a sample provides enough evidence to reject the null hypothesis.
Test Assumptions : Statistical tests are based on assumptions about the data that need to be satisfied in order for the test to be valid. The assumptions vary between tests.
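p-value : The probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. A small p-value indicates that the observed data would be unlikely if the null hypothesis were true.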
Effect Size : A measure of the size of the phenomenon in question.
Statistically Significant : If a statistical test has a p-value less than the specified threshold - typically 5% (0.05) - the result is said to be statistically significant.
Power of a Test : The probability that a test rejects the null hypothesis when the effect is real. Power increases with sample size and with the size of the effect to be detected.
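The relationship between power, sample size and effect size can be sketched numerically. The example below assumes a simplified setting: a two-sided z-test comparing two group means, with equal group sizes and a known common standard deviation. The function name `power_two_sample_z` and the values of `delta` and `sigma` are hypothetical choices for illustration only.

```python
import math
from statistics import NormalDist


def power_two_sample_z(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    delta: true difference between the two population means
    sigma: common standard deviation (assumed known)
    n_per_group: number of observations in each group
    alpha: significance threshold
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # True difference expressed in standard errors of the difference in means.
    shift = delta / (sigma * math.sqrt(2 / n_per_group))
    # Probability that the test statistic falls in either rejection region.
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)


# Power grows with sample size for a fixed (hypothetical) effect.
for n in (10, 20, 50, 100):
    power = power_two_sample_z(delta=5.0, sigma=10.0, n_per_group=n)
    print(f"n per group = {n:3d}, power = {power:.3f}")
```

When the true difference is zero, the function returns the significance threshold itself, which is the rate at which a true null hypothesis is rejected.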
6.2.2 Example
Based on the observational study, the medical researchers in the eczema treatment experiment developed the hypothesis that once or twice daily treatment with 10mL of the new medication would have a statistically significant effect in reducing the extent of eczema. Stated formally, their hypotheses for the statistical test were:
\(H_0\) : There is no difference in the mean area of eczema between the treatment groups.
\(H_a\) : There is a difference in the mean area of eczema between at least two of the treatment groups.
They estimated the size of the difference in means (effect size) they expected between the treatment groups and used this information to determine the sample size for each group. After checking that their data satisfied the assumptions for the chosen statistical test (One-way ANOVA), the test was applied and returned a significant p-value at the 5% level (p = 0.014). Because a difference at least this large would be expected only 1.4% of the time if the null hypothesis were true, they rejected the null hypothesis and concluded that at least one of the treatments gave a statistically significant reduction in eczema when compared with the non-treatment (control) group. Further statistical tests indicated that there was no statistically significant difference between the once-daily and twice-daily groups (p = 0.73). Based on their experiment, they reported that 10mL once daily of the new medication was an effective treatment for eczema.
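To make the meaning of the p-value concrete, the sketch below calculates the one-way ANOVA F statistic for three groups and then estimates a p-value with a permutation test: the group labels are shuffled many times, and the p-value is the proportion of shuffles that give an F statistic at least as large as the observed one. This is a swapped-in technique for illustration; a standard One-way ANOVA obtains its p-value from the F distribution instead. The eczema areas are invented values, and the function names are hypothetical.

```python
import random
import statistics


def f_statistic(groups):
    """One-way ANOVA F statistic: between-group over within-group variance."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_values)
    means = [statistics.fmean(g) for g in groups]
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(groups, n_permutations=5000, seed=0):
    """Estimate P(F >= observed F) when group labels are assigned at random."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly 0.
    return observed, (extreme + 1) / (n_permutations + 1)


# Hypothetical eczema areas (cm^2); these are NOT the researchers' data.
control = [24.1, 26.3, 22.8, 27.5, 25.0, 23.9, 26.8, 24.6]
once_daily = [19.2, 21.5, 18.7, 22.0, 20.3, 19.8, 21.1, 20.6]
twice_daily = [19.9, 20.8, 18.5, 21.7, 20.1, 19.4, 21.3, 20.0]

f_obs, p_value = permutation_p_value([control, once_daily, twice_daily])
print(f"F = {f_obs:.2f}, permutation p-value = {p_value:.4f}")
```

A small p-value here says only that the data would be unusual if the group labels did not matter; it does not identify which groups differ, which is why further pairwise tests are needed.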
6.2.3 Common Statistical Tests
Table 6.1 lists some of the common statistical tests for both normally distributed and non-normally distributed data. Tests are chosen depending on their purpose.
| Purpose | Dependent Variable | Independent Variable | Parametric Test (normal) | Non-parametric Test (skewed or ordinal) |
|---|---|---|---|---|
| Comparing means of two independent groups | Discrete or continuous | Nominal (two groups) | Independent T-test | Mann-Whitney Test; Wilcoxon Rank Sum |
| Comparing means of two paired (before / after) groups | Discrete or continuous | Ordinal (two groups) | Paired T-test | Wilcoxon Signed Rank Test |
| Comparing means of 3+ independent groups | Discrete or continuous | Nominal (three or more groups) | One-way ANOVA | Kruskal-Wallis Test |
| Relationship between two continuous variables | Continuous | Continuous | Pearson’s Correlation Coefficient | Spearman’s Correlation Coefficient |
| Expected counts for one qualitative variable | Qualitative | None | Not applicable | Chi-squared Test |
| Relationship between two qualitative variables | Qualitative | Qualitative | Not applicable | Chi-squared Test |