MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1, 130-149.

False similarities plague the findings where differences in fact exist. This masking of differences is one of the reasons that studies with small sample sizes, lacking the proper preliminary power and sample size analysis, are treated with suspicion by statistical cognoscenti. If medical researchers are engaged in clinical trials of a drug, insufficient sample size that undermines assessment is criminal.

Moreover, this is the principal reason for which persons applying for large grants usually have to perform power and sample size analyzes for the statistical tests they are planning and show that they can detect small or medium effects with sufficient power (usually of 0.8). With a power of 0.8, the user has a 20% chance of making a type II error -- obtaining a false negative result (in other words, the failure to detect a real difference when it exists) from insufficient sample size. For these reasons, it is important for researchers to understand the concept of statistical power and to know how and when to apply it.

A Type I error, rejection of a null hypothesis where there should have been no rejection, is in inverse proportion to the power. Too large a sample size means the waste of valuable resources in the process of the data collection. The key questions are how much sample size is enough, how much safety margin is needed, and how much is too much. In short, what is the optimal sample size?

Similarly, Sample Power calculates power and sample size for a variety of basic t, proportions, and crosstabulation tests. For a number of t-tests, it tests whether the t=0 or t=specific value. It does this for one sample t-tests, two sample t-tests with the same variance, two sample t-tests with different variances, and paired t-tests. The program computes power for a number of tests of proportions as well. The program will test whether a proportion equals 0.5 or a specific value. It will test 2x2 independent samples chi-square or Fisher's exact tests, as will Jerry Hintze's Power Analysis and Sample Size Program (PASS 6.0), distributed by NCSS. Paired proportions power is tested for McNemar's test. The sign test power can be calculated as well. A table comes up for each of the tests allowing the user to input specific sample sizes for different cells and then to request a graph of power as a function of number of cases for the given effect size, alpha level, and number of tails for the statistical test under consideration.

For correlation and regression analysis, Sample Power assesses power for different sample sizes. For correlation analysis, the power analysis program will test the power for specific sample sizes, alpha levels, one- or two-tailed tests for a correlation that is equal to zero, equal to a specific value, or equal to each other. For regression analysis, StatPower has almost the least capability. It can handle multiple correlation with and without partialling. PASS 6.0 is slightly better and can handle one or two set regression analysis, with the first set consisting of covariates. Sample Power, which has even greater flexibility, will handle one set of independent variables; a model with a set of covariates and a set of independent predictors; a model with a set of covariates, a set of independent variables, plus a set of interactions; a polynomial regression; and a model with covariates and dummy variables. The user merely has to indicate the number of independent variables, the r square of the respective set of independent variables, and the sample size, after which the program will compute the power for each set.

For ANOVA designs, Sample Power and PASS 6.0 are very good for basic cross-sectional analysis. These two programs as well as StatPower, the Statistical Design Analysis System, developed by James L. Bavry, Ph.D and distributed by Scientific Software, perform the one-, two-, and three-way fixed effects anova. Given the number of levels of each factor and the effect size tested, Sample Power will perform power and sample sizes for one-, two-, and three-way fixed effects ANOVA and ANCOVA models. PASS 6.0 can perform the ancovas as multiple regressions, but does not have procedures dedicated to them. It can perform power and sample size analysis on randomized block anova and a repeated measures design with one between and one within subjects effect. The advanced psychology researcher might prefer StatPower which performs power and sample size analyzes for one- and two-way fixed effects, one- and two-way random effects, and general fixed effects designs. What is more, StatPower handles one and two way repeated univariate and multivariate repeated measures (mixed model) designs as well. Sometimes, the biostatistician will have special need for PASS 6.0, which excels in its power and sample size analysis for logistic regression and in its power analysis for the log-rank test performed with Life-Tables survival analysis. It also provides for plotting of power for different proportions resulting from matched case/control studies or from bioequivalence of proportions resulting from clinical research.

As for distributional tests, Statpower by far surpasses the other two in variety and range. PASS 6.0 cannot handle any wide variety of distributions at all, while Sample Power can handle a few basic distributional tests for t, F, and Chi-squares. By comparison, StatPower yields probabilities for whatever level of cumulative, inverse, and noncentral z, t, Chi-square, and F distributions may be found. Moreover, it produces probabilities for binomial and beta cumulative and inverse distributions, as well as logistic and Poisson cumulative distributions.

In directions for future research and development, there will be a growing need for analysis of power and sample size of time series data with time series tests and sparse data with Exact tests. While Exact tests are robust to errors of the alpha level, they need a power and sample size analysis to indicate the magnitude of the problem of the Type II error. There needs to be more theoretical development and statistical package implementation in both of these areas. It is hoped that future research and development in this area will fill these gaps of knowledge and capability. For assistance with these matters please e-mail me at robert.yaffee@nyu.edu or phone me at the ACF Statistics and Social Science Group, 998-3402.

* Further Reading: Researchers interested in pursuing the
study of power analysis may refer to R. Goldstein, "Power and sample size
via MS/PC-DOS computers" in The American Statistician, 43:253-260, 1989.
For a comprehensive list of power analysis programs refer to Len Thomas and
Charles Kreb, "A Review of Statistical Power Analysis Software," in the
forthcoming Bulletin of the Ecological Society of America, 78 (2). A copy of
this article is available at http://www.interchg.ubc.ca/cacb/power.
*

http://home.clara.net/sisa/index.htm http://trochim.human.cornell.edu/OJtrial/ojhome.htm