Research & Evaluation Core
Glossary of Statistical Terms
Analysis of Variance (ANOVA): A one-way analysis of variance is
a statistical method used for comparing several population means.
The ANOVA is used to test the null hypothesis that the population
means are all equal. With the ANOVA we can assess whether the
observed differences among sample means are statistically significant
by examining the F probability (the P-value for the F statistic). This is
done by comparing the variation among the means of several groups
with the variation within groups. If we reject the null hypothesis
that all population means are equal, we need to perform further
analysis to draw conclusions about which population means differ
from which others. A multiple comparison procedure is used to
identify differences between pairs of means. A multiple comparison
procedure protects you from calling differences significant
when they really aren't. This is accomplished by adjusting the
observed significance level for the number of comparisons that
you are making. The Bonferroni test for multiple comparisons
was used in identifying differences between pairs of means.
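The comparison of between-group and within-group variation can be
sketched in a few lines of Python. This is only an illustration: the
scores are invented, the function names are hypothetical, and the
sketch stops at the F statistic (turning F into a P-value requires
the F distribution, which a statistics package would supply). The
Bonferroni helper shows the simplest form of the adjustment, dividing
the significance level by the number of pairwise comparisons.

```python
from itertools import combinations
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k = len(groups)          # number of groups
    n = len(all_scores)      # total number of scores
    # Variation among the group means, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def bonferroni_alpha(alpha, n_groups):
    """Significance level to use for each pairwise comparison."""
    n_pairs = len(list(combinations(range(n_groups), 2)))
    return alpha / n_pairs


# Invented scores for three groups (illustration only).
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
f_stat, df_between, df_within = one_way_anova_f(groups)
per_pair_alpha = bonferroni_alpha(0.05, len(groups))
print(round(f_stat, 2), df_between, df_within, round(per_pair_alpha, 4))
```

A large F means the group means differ by more than the spread within
groups would lead us to expect.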
A bivariate analysis is a statistical method designed to detect and
describe the relationship between two variables. Cross-tabulation
is one technique used for analyzing the relationship between
two variables, and the Chi-Square statistic indicates the significance
of the relationship. It is reported as a P-value. If the P-value is less
than .05, then the association between the two variables is considered
to be statistically significant. An example might look like
this: Higher income is positively associated with higher educational
level. The direction of the relationship is not always implied.
In other words, it is not always possible to know which came
first: higher income or higher education. However, when independent
variables such as sex or ethnicity are used, it is assumed that
changes in the dependent variable are affected by the independent
variable. For instance, if sex is associated with income, then
it is assumed that sex affects income rather than income affecting
sex.
Given a significant P-value, we say
that two variables have a positive bivariate relationship if they
vary in the same direction (a positive change in one variable
is associated with a positive change in the other variable, or a
negative change in one variable is associated with a negative
change in the other variable). We say that two variables have a
negative bivariate relationship if they vary in opposite directions
(i.e., a positive change in one variable is associated with a
negative change in the other variable or vice-versa). Sometimes
independent variables found to be significantly associated with
dependent variables are called correlates. Usually, the results
of tests found not to be significant (P-value is .05 or greater)
are not reported.
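As a rough illustration of a cross-tabulation and its Chi-Square
test, the sketch below computes the Pearson Chi-Square statistic for
a two-by-two table using only the Python standard library. The counts
are invented and the function name is hypothetical. The P-value
formula used here applies only to a two-by-two table (one degree of
freedom), and no continuity correction is applied.

```python
import math


def chi_square_2x2(table):
    """table = [[a, b], [c, d]] of observed counts; returns (chi2, p_value)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented counts. Rows: higher / lower education.
# Columns: higher / lower income.
table = [[30, 20], [10, 40]]
chi2, p_value = chi_square_2x2(table)
print(round(chi2, 2), p_value < 0.05)
```

Here the P-value falls below .05, so the association in this made-up
table would be called statistically significant.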
Confidence Interval (CI): Research studies typically obtain a sample
of a few hundred people and use data from these people to draw
conclusions about the entire population. However, if by chance
different people had been selected, the results of a statistical
analysis would have come out slightly different. The confidence
interval shows the range of odds ratios that is consistent with the
data once this sampling variation is taken into account. The
confidence interval is reported as two numbers in parentheses. The
first number is the lower bound and the second number is the upper
bound. If the same study were repeated over and over with different
samples of people, about 95% of the intervals calculated this way
would contain the true odds ratio for the population. Sometimes both
the upper bound and the lower bound are greater than 1. This is the
statistician's criterion for concluding that there is an elevated
risk (likelihood) of experiencing the outcome. Sometimes both the
upper and lower bounds are less than 1. This is the statistician's
criterion for concluding that there is a decreased risk (likelihood)
of experiencing the outcome. Sometimes, however, the lower bound is
less than 1 and the upper bound is greater than 1. In this case, an
odds ratio of 1 (no effect) is among the plausible values. Therefore,
if the lower bound is less than 1 and the upper bound is greater
than 1, we cannot conclude that the risk of the outcome is either
elevated or decreased.
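One common way to obtain such an interval from a two-by-two table is
the log (Woolf) method, sketched below. This is an assumption made
for illustration, not necessarily the method used in any particular
report: the counts are invented, the function names are hypothetical,
and 1.96 is the usual multiplier for a 95% interval.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """95% CI for an odds ratio by the log (Woolf) method.

    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of the natural log of the odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def interpret(lower, upper):
    """Apply the rule described above to the two bounds."""
    if lower > 1:
        return "elevated"
    if upper < 1:
        return "decreased"
    return "inconclusive"


# Invented counts (illustration only).
or_1, lo_1, hi_1 = odds_ratio_ci(40, 60, 20, 80)
or_2, lo_2, hi_2 = odds_ratio_ci(22, 78, 20, 80)
print(round(or_1, 2), (round(lo_1, 2), round(hi_1, 2)), interpret(lo_1, hi_1))
print(round(or_2, 2), (round(lo_2, 2), round(hi_2, 2)), interpret(lo_2, hi_2))
```

In the first table both bounds are above 1, so the risk is called
elevated. In the second the interval straddles 1, so no conclusion
can be drawn.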
Measures of Central Tendency: Numbers that describe
what is average or typical of the distribution.
Mode: The category or score with the largest
frequency (the most common value). Median: The
score that divides the distribution into two
equal parts, so that half the cases are above it
and half below it when all scores are lined up
in order from lowest to highest. Mean: The
arithmetic average, obtained by adding up all
the scores and dividing by the total number
of scores.
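The three measures can be computed with Python's standard
statistics module. The scores below are invented for illustration.

```python
from statistics import mean, median, mode

# Invented scores (illustration only).
scores = [2, 3, 3, 5, 7]

most_common = mode(scores)    # the value that occurs most often
middle = median(scores)       # the middle value once scores are sorted
average = mean(scores)        # sum of scores divided by how many there are

print(most_common, middle, average)
```

The three measures need not agree: here the mode and median are both
3, while the mean is pulled up to 4 by the larger scores.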
P-value: The p-value indicates the probability
of obtaining a result at least as strong as
the one observed in a statistical test if
chance alone were at work, that is, if there
were no true relationship between measures.
Small p-values indicate that results like
these would be very unlikely to occur by
chance alone. Therefore, if the p-value is
small, statisticians would be confident that
the result obtained is "real." When p is less
than .05 (P<.05), meaning that a result this
strong would occur less than 5% of the time
by chance alone, statisticians usually
conclude that the relationship is strong
enough that it is probably not just due to
chance. A p-value of less than .05 is
the commonly used standard to determine
that a relationship between variables
is significant.
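One way to see what "due to chance" means is an exact permutation
test, sketched below. It is offered only as an illustration of the
idea, not as the test used in any particular analysis. The scores are
invented and the function name is hypothetical. The code reshuffles
the scores into two groups in every possible way and counts how often
the difference between group means is at least as large as the one
actually observed.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Share of all regroupings whose mean difference is at least as
    large as the observed one (two-sided, exact)."""
    pooled = group_a + group_b
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    as_extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point rounding.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / total


# Invented scores for two groups (illustration only).
group_a = [1, 2, 3, 4]
group_b = [5, 6, 7, 8]
p_value = permutation_p_value(group_a, group_b)
print(round(p_value, 4), p_value < 0.05)
```

Only 2 of the 70 possible regroupings separate the scores this
cleanly, so the p-value is 2/70, which is below .05.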
Multivariate, multiple logistic regression: The key point is knowing
the difference between a bivariate
analysis such as a cross tabulation
and a multivariate, multiple regression
analysis. A bivariate analysis does
not control for the effects of other
variables. A multivariate, multiple
logistic regression analysis tests
for the relationships between variables
controlling for the effects of other
variables. A significant association
in a multivariate, multiple logistic
regression analysis means that a particular
independent variable is still significantly
associated with a dependent variable
when the effects of many other independent
variables (such as ethnicity, age,
income, etc.) are controlled for in
a statistical test or model involving
one dependent variable and more than
one independent variable.
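Fitting a logistic regression requires a statistics package, so the
sketch below swaps in a simpler technique, stratification with a
Mantel-Haenszel pooled odds ratio, purely to illustrate what
"controlling for" another variable does. It is not logistic
regression. The counts, the age strata, and the function names are
all invented for the example.

```python
def odds_ratio(a, b, c, d):
    """a, b = exposed with/without outcome; c, d = unexposed with/without."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata of a third variable."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Invented counts, one (a, b, c, d) table per age group.
strata = [
    (10, 90, 20, 180),   # younger respondents
    (120, 80, 30, 20),   # older respondents
]

# Bivariate view: ignore age and add the tables together.
crude_table = tuple(sum(col) for col in zip(*strata))
crude_or = odds_ratio(*crude_table)

# Adjusted view: compare like with like inside each age group.
adjusted_or = mantel_haenszel_or(strata)

print(round(crude_or, 2), round(adjusted_or, 2))
```

In this made-up data the crude odds ratio is about 3, but it drops to
1 once age is held constant, so the bivariate association was entirely
attributable to age. An association that survives this kind of
adjustment is what a multivariate analysis reports as significant.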
Predictors/Odds ratio (OR): Predictors
are independent variables that significantly
predict dependent variables in multiple
regression analysis. Sometimes, predictors
are reported as factors that are significantly
associated with or correlated with the dependent
variable. When we say that an independent
variable such as age predicts a dependent
variable such as income level, we mean
that, regardless of the effects caused
by other independent variables such
as ethnicity, age can predict income.
For instance, if we found a significant
relationship between older age and high
income in a multiple logistic regression
analysis, then we can make the claim
that older age predicts higher income.
In other words, we can say with more
statistical authority (compared to a
bivariate analysis) that older people,
on average, are more likely to have
higher incomes compared to younger people.
Such tests also produce the odds ratio
statistic.
The odds ratio describes a person's
risk (likelihood) of experiencing a
certain outcome based on the person's
behaviors or membership in a group as
compared to another group with different
exposures. The odds ratio indicates
how many times higher the odds of the
outcome are for people in one group relative
to another group (the reference group).
For example, if the odds ratio for lung
cancer is 3.5 for smokers compared with
nonsmokers, this means that the odds of
getting lung cancer are 3.5 times as high
for smokers. When the outcome is rare, this
is roughly the same as saying that smokers
are 3.5 times as likely to get lung
cancer. An odds ratio of 1 would indicate
that the smokers have the same risk
of cancer as nonsmokers, and an
odds ratio of less than 1 would indicate
that smokers are less likely to get
cancer than nonsmokers. Odds ratios
are generally only reported when they
indicate a statistically significant
relationship (P<.05).
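The arithmetic behind an odds ratio is short enough to show directly.
The counts below are invented and the function names are hypothetical.
The sketch also computes the ratio of the two risks (proportions) for
comparison, since the odds ratio and the risk ratio are close only
when the outcome is uncommon.

```python
def odds_ratio_from_counts(cases_group, noncases_group, cases_ref, noncases_ref):
    """Odds of the outcome in one group divided by odds in the reference group."""
    return (cases_group / noncases_group) / (cases_ref / noncases_ref)


def risk_ratio_from_counts(cases_group, noncases_group, cases_ref, noncases_ref):
    """Proportion with the outcome in one group divided by the proportion
    in the reference group."""
    risk_group = cases_group / (cases_group + noncases_group)
    risk_ref = cases_ref / (cases_ref + noncases_ref)
    return risk_group / risk_ref


def describe(or_value):
    """Direction implied by an odds ratio (always a positive number)."""
    if or_value > 1:
        return "more likely than the reference group"
    if or_value < 1:
        return "less likely than the reference group"
    return "same as the reference group"


# Invented counts: 35 of 1,000 smokers and 10 of 1,000 nonsmokers
# develop the outcome.
smoker_or = odds_ratio_from_counts(35, 965, 10, 990)
smoker_rr = risk_ratio_from_counts(35, 965, 10, 990)
print(round(smoker_or, 2), round(smoker_rr, 2), describe(smoker_or))
```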
Significant Associations (Correlates is also used) is a term commonly
used to represent a statistically significant (P<.05) relationship
between two or more variables.
Variables: A variable is a
property of people or objects that takes on two or more values.
A variable is composed of values that vary. There must be at least
two values attributed to a variable. For instance, 'male' and
'female' are the standard values attributed to the variable 'sex',
even though organizations such as AIDS Project Los Angeles
frequently recognize additional values such as 'transgender'.
Dependent variables are the ones to be explained by the researcher;
they are thought to be dependent on other independent variables.
Independent variables may account for changes in the value of the
dependent variable. Independent variables, recognized generally
as demographic information such as sex or ethnicity, are those
variables that are generally not influenced or caused by other
variables. For example, a researcher may be interested to test
the hypothesis that one's sex is associated with adherence to
one's HIV medications. It is unlikely that taking HIV medications
predicts one's sex (though other medications may). It may make
sense (based on the literature or a hunch) to hypothesize that
men are less likely to be adherent to their HIV medications
compared to women. Adherence is the dependent variable. It is
a variable because the values (possible response categories
on a questionnaire) vary: One may never miss their medications,
sometimes miss their medications, or always miss their medications.
It is used as a dependent variable because an individual's response
may depend upon their sex, ethnicity, age, or a number of other
independent variables.
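The adherence example can be laid out as a small cross-tabulation,
with the independent variable (sex) defining the rows and the
dependent variable (adherence) defining the columns. The responses
below are invented and the names are hypothetical; the sketch only
shows how the two kinds of variables are arranged, not a real
finding.

```python
from collections import Counter

# Each record is (sex, how often medications are missed). Invented data.
records = [
    ("male", "never"), ("male", "sometimes"), ("male", "sometimes"),
    ("male", "always"),
    ("female", "never"), ("female", "never"), ("female", "sometimes"),
]

sex_values = ["male", "female"]                      # independent variable
adherence_values = ["never", "sometimes", "always"]  # dependent variable

# Count how many respondents fall in each combination of values.
counts = Counter(records)
crosstab = {
    sex: {level: counts[(sex, level)] for level in adherence_values}
    for sex in sex_values
}

for sex in sex_values:
    print(sex, crosstab[sex])
```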
Matt G. Mutchler, Ph.D.
APLA Research & Evaluation Specialist
323.993.1522 or mmutchler@apla.org