Glossary of Statistical Terms

Analysis of Variance (ANOVA): A one-way analysis of variance is a statistical method for comparing several population means. The ANOVA tests the null hypothesis that the population means are all equal. It assesses whether the observed differences among sample means are statistically significant by comparing the variation among the means of several groups with the variation within groups; the resulting F statistic has an associated probability (a P-value). If we reject the null hypothesis that all population means are equal, we need further analysis to draw conclusions about which population means differ from which others. A multiple comparison procedure is used to identify differences between pairs of means. Such a procedure protects you from calling differences significant when they really aren't, by adjusting the significance level for the number of comparisons that you are making. The Bonferroni test for multiple comparisons was used to identify differences between pairs of means.
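The comparison of between-group and within-group variation can be sketched in a few lines of Python. This is only an illustration: the function names and the three small groups of scores are made up for the example, and the code stops at the F statistic because converting it to a P-value requires the F distribution, which is not in the Python standard library. The Bonferroni helper shows the simplest form of the adjustment, dividing the significance level by the number of pairwise comparisons.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F statistic, df between, df within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation among the group means, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def bonferroni_alpha(alpha, n_groups):
    """Significance level per comparison when every pair of groups is compared."""
    n_pairs = n_groups * (n_groups - 1) // 2
    return alpha / n_pairs


# Hypothetical scores for three groups (not real data).
groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]
f_stat, df_between, df_within = one_way_anova_f(groups)
per_comparison_alpha = bonferroni_alpha(0.05, len(groups))

print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
print(f"Bonferroni-adjusted significance level: {per_comparison_alpha:.4f}")
```

A large F means the group means differ by more than the spread within groups would suggest. With three groups there are three pairs to compare, so each pairwise test is held to .05 / 3 rather than .05.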

Bivariate Analysis: A bivariate analysis is a statistical method designed to detect and describe the relationship between two variables. Cross-tabulation is one technique for analyzing the relationship between two variables, and the Chi-Square statistic indicates the significance of the relationship. It is reported as a P-value. If the P-value is less than .05, then the association between the two variables is considered to be statistically significant. An example might look like this: Higher income is positively associated with higher educational level. The direction of the relationship is not always implied. In other words, it is not always possible to know which came first: higher income or higher education. However, when independent variables such as sex or ethnicity are used, it is assumed that the independent variable affects the dependent variable. For instance, if sex is associated with income, then it is assumed that sex affects income rather than income affecting sex.

Given a significant P-value, we say that two variables have a positive bivariate relationship if they vary in the same direction (i.e., a positive change in one variable is associated with a positive change in the other variable, or a negative change in one variable is associated with a negative change in the other variable). We say that two variables have a negative bivariate relationship if they vary in opposite directions (i.e., a positive change in one variable is associated with a negative change in the other variable, or vice versa). Sometimes independent variables found to be significantly associated with dependent variables are called correlates. Usually, the results of tests found not to be significant (P-value of .05 or greater) are not reported.
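As a rough illustration of a cross-tabulation and its Chi-Square test, the sketch below works through a 2x2 table in Python. The counts are hypothetical (education level by income level), the function name is made up for this example, and no continuity correction is applied. The P-value formula used here is valid only for a 2x2 table, which has one degree of freedom.

```python
import math


def chi_square_2x2(table):
    """Return (chi-square statistic, P-value) for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # With 1 degree of freedom, the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts. Rows: higher / lower education.
# Columns: higher / lower income.
table = [[30, 20],
         [15, 35]]
chi2, p_value = chi_square_2x2(table)

print(f"Chi-square = {chi2:.2f}, P = {p_value:.4f}")
print("significant" if p_value < 0.05 else "not significant")
```

In this made-up table, people with higher education fall more often in the higher-income column than would be expected if the two variables were unrelated, so the relationship is positive and the P-value is below .05.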

Confidence Interval (CI): Research studies typically obtain a sample of a few hundred people and use data from these people to draw conclusions about the entire population. However, if by chance different people had been selected, the results of a statistical analysis would have come out slightly different. The confidence interval gives a range of plausible values for the true (population) odds ratio, given this sampling variability. The confidence interval is reported as two numbers in parentheses: the first number is the lower bound and the second number is the upper bound. If the same study were repeated over and over with different samples of people, and a 95% confidence interval were calculated each time, about 95% of those intervals would be expected to contain the true odds ratio. Sometimes both the lower bound and the upper bound are greater than 1. This is the statistician's criterion for concluding that there is an elevated risk (likelihood) of experiencing the outcome. Sometimes both the lower and upper bounds are less than 1. This is the statistician's criterion for concluding that there is a decreased risk (likelihood) of experiencing the outcome. Sometimes, however, the lower bound is less than 1 and the upper bound is greater than 1. In this case, an odds ratio of 1 (no effect) is among the plausible values. Therefore, if the lower bound is less than 1 and the upper bound is greater than 1, we cannot conclude that the risk of the outcome is either elevated or decreased.
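One common way to calculate a 95% confidence interval for an odds ratio from a 2x2 table is the log (Woolf) method, sketched below. This is an assumption made for illustration: the glossary does not say which method was used in any particular analysis, and regression software calculates the interval from the fitted model instead. The counts and function names are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI (log method) for a 2x2 table.

    a, b: people in the group of interest with / without the outcome
    c, d: people in the reference group with / without the outcome
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of the natural log of the odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def interpret(lower, upper):
    """Apply the rule described above to the two bounds."""
    if lower > 1:
        return "elevated"
    if upper < 1:
        return "decreased"
    return "inconclusive"


# Hypothetical counts (not real data).
odds_ratio, lower, upper = odds_ratio_ci(30, 20, 15, 35)
conclusion = interpret(lower, upper)

print(f"OR = {odds_ratio:.2f} ({lower:.2f}, {upper:.2f}): {conclusion}")
```

Here both bounds are above 1, so the rule described above would call the risk elevated. With smaller samples the interval widens and is more likely to include 1.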

Measures of Central Tendency: Numbers that describe what is average or typical of a distribution. Mode: The category or score with the largest frequency (the most common value). Median: The score that divides the distribution into two equal parts, so that when all scores are lined up in order from lowest to highest, half the cases are above it and half are below it. Mean: The arithmetic average, obtained by adding up all the scores and dividing by the total number of scores.
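The three measures can be calculated with Python's built-in statistics module. The scores below are hypothetical and chosen so that the three measures differ.

```python
from statistics import mean, median, mode

# Hypothetical scores, already lined up from lowest to highest.
scores = [2, 3, 3, 5, 7, 10]

score_mode = mode(scores)      # most frequent score
score_median = median(scores)  # midpoint; averages the two middle scores here
score_mean = mean(scores)      # sum of the scores divided by their number

print(f"mode = {score_mode}, median = {score_median}, mean = {score_mean}")
```

Because there is an even number of scores, the median is the average of the two middle scores (3 and 5). The single high score of 10 pulls the mean above the median.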

P-value: The p-value indicates the probability of obtaining a result at least as strong as the one observed in a statistical test if chance alone were at work, that is, if there were no true relationship between the measures. Small p-values indicate that a result like this would be very unlikely to arise by chance alone. Therefore, if the p-value is small, statisticians would be confident that the result obtained is "real." When p is less than .05 (P<.05), meaning that a result this strong would occur less than 5% of the time by chance alone, statisticians usually conclude that the relationship is probably not just due to chance. A p-value of less than .05 is the commonly used standard for determining that a relationship between variables is significant.
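One way to see what a p-value measures is an exact permutation test, sketched below. This technique is not mentioned elsewhere in this glossary and is used here only as an illustration. It asks: if group membership had nothing to do with the scores, how often would a difference between the two group means at least as large as the observed one arise? The scores and the function name are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = group_a + group_b
    n_a = len(group_a)
    n_b = len(group_b)
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))

    n_splits = 0
    n_extreme = 0
    # Try every possible way of assigning n_a of the pooled scores to group A.
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        n_splits += 1
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / n_splits


# Hypothetical scores for two small groups (not real data).
group_a = [1, 2, 3, 4]
group_b = [5, 6, 7, 8]
p_value = permutation_p_value(group_a, group_b)

print(f"P = {p_value:.4f}")
```

Of the 70 possible ways to split these eight scores into two groups of four, only 2 give a difference as large as the one observed, so the p-value is 2/70, which is below .05.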

Multivariate, multiple logistic regression: The key point is knowing the difference between a bivariate analysis such as a cross tabulation and a multivariate, multiple regression analysis. A bivariate analysis does not control for the effects of other variables. A multivariate, multiple logistic regression analysis tests for the relationships between variables controlling for the effects of other variables. A significant association in a multivariate, multiple logistic regression analysis means that a particular independent variable is still significantly associated with a dependent variable when the effects of many other independent variables (such as ethnicity, age, income, etc.) are controlled for in a statistical test or model involving one dependent variable and more than one independent variable.

Predictors/Odds ratio (OR): Predictors are independent variables that significantly predict dependent variables in a multiple regression analysis. Sometimes, predictors are reported as factors that are significantly associated with or correlated with the dependent variable. When we say that an independent variable such as age predicts a dependent variable such as income level, we mean that, regardless of the effects of other independent variables such as ethnicity, age can predict income. For instance, if we found a significant relationship between older age and high income in a multiple logistic regression analysis, then we can make the claim that older age predicts higher income. In other words, we can say with more statistical authority (compared to a bivariate analysis) that older people, on average, are more likely to have higher incomes than younger people. Such tests also produce the odds ratio statistic.

The odds ratio describes a person's risk (likelihood) of experiencing a certain outcome based on the person's behaviors or membership in a group, as compared to another group with different exposures. The odds ratio indicates how many times higher the odds of the outcome are for people in one group relative to another group (the reference group). For example, if the odds ratio for lung cancer is 3.5 for smokers compared with nonsmokers, this means that the odds of getting lung cancer are 3.5 times higher for smokers, which is often read loosely as smokers being about 3.5 times more likely to get lung cancer. An odds ratio of 1 would indicate that smokers have the same risk of cancer as nonsmokers, and an odds ratio of less than 1 would indicate that smokers are less likely to get cancer than nonsmokers. (An odds ratio is never negative.) Odds ratios are generally only reported when they indicate a statistically significant relationship (P<.05).
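The sketch below calculates an odds ratio from raw counts and, for comparison, the ratio of the two risks (the relative risk). The counts are hypothetical and the function names are made up for this example. It shows that the odds ratio and "how many times more likely" are close, but not identical, when the outcome is uncommon.

```python
def odds_ratio(exposed_cases, exposed_noncases,
               reference_cases, reference_noncases):
    """Odds of the outcome in the exposed group relative to the reference group."""
    odds_exposed = exposed_cases / exposed_noncases
    odds_reference = reference_cases / reference_noncases
    return odds_exposed / odds_reference


def relative_risk(exposed_cases, exposed_noncases,
                  reference_cases, reference_noncases):
    """Probability of the outcome in the exposed group relative to the reference group."""
    risk_exposed = exposed_cases / (exposed_cases + exposed_noncases)
    risk_reference = reference_cases / (reference_cases + reference_noncases)
    return risk_exposed / risk_reference


# Hypothetical counts: 70 of 1,000 exposed people and 20 of 1,000
# reference-group people experience the outcome.
counts = (70, 930, 20, 980)
or_value = odds_ratio(*counts)
rr_value = relative_risk(*counts)

print(f"odds ratio = {or_value:.2f}, relative risk = {rr_value:.2f}")
```

In this made-up example the exposed group is 3.5 times as likely to experience the outcome, while the odds ratio is slightly higher, at about 3.69. The gap between the two grows as the outcome becomes more common.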

Significant Associations: A term commonly used to represent a statistically significant (P<.05) relationship between two or more variables. The term "correlates" is also used.

Variables: A variable is a property of people or objects that takes on two or more values. There must be at least two values attributed to a variable. For instance, 'male' and 'female' are the standard values attributed to the variable 'sex', although organizations such as AIDS Project Los Angeles frequently recognize additional values such as 'transgender'.

Dependent variables are the ones to be explained by the researcher; they are thought to be dependent on other independent variables.

Independent variables may account for changes in the value of the dependent variable. Independent variables, recognized generally as demographic information such as sex or ethnicity, are those variables that are generally not influenced or caused by other variables. For example, a researcher may be interested to test the hypothesis that one's sex is associated with adherence to one's HIV medications. It is unlikely that taking HIV medications predicts one's sex (though other medications may). It may make sense (based on the literature or a hunch) to hypothesize that men are less likely to be adherent to their HIV medications compared to women. Adherence is the dependent variable. It is a variable because the values (possible response categories on a questionnaire) vary: One may never miss their medications, sometimes miss their medications, or always miss their medications. It is used as a dependent variable because an individual's response may depend upon their sex, ethnicity, age, or a number of other independent variables.

 

Matt G. Mutchler, Ph.D.
APLA Research & Evaluation Specialist
323.993.1522 or mmutchler@apla.org

 

© 2006 AIDS Project Los Angeles