(From Ledakis, G. (1999). Factor Analytic Models of the Mattis Dementia Rating Scale in Dementia of the Alzheimer's Type and Vascular Dementia Patients. Doctoral Dissertation, Drexel University.)

STATISTICAL ANALYSIS

The data will be analyzed using the Apple Power Macintosh version of the Statistical Package for the Social Sciences - X (SPSS-X). Descriptive statistics will be calculated for all demographic variables as well as the DRS variables. In addition, t-tests for independent means will be used to compare the DAT and VaD groups on the demographic variables of age and education, as well as on performance on the DRS Total score and its individual subtests.
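
As a minimal illustrative sketch of these group comparisons (the original analyses were conducted in SPSS-X; the data file, DataFrame, and column names below are hypothetical), the independent-samples t-tests could be computed as follows:

```python
# Illustrative sketch only: the original analyses were run in SPSS-X.
# The file name and column names ('group', 'age', 'education',
# 'drs_total') are hypothetical.
import pandas as pd
from scipy import stats

df = pd.read_csv("drs_data.csv")
dat = df[df["group"] == "DAT"]
vad = df[df["group"] == "VaD"]

# Independent-samples t-tests comparing the two diagnostic groups
for col in ["age", "education", "drs_total"]:
    t, p = stats.ttest_ind(dat[col], vad[col], nan_policy="omit")
    print(f"{col}: t = {t:.2f}, p = {p:.4f}")
```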

Chi-square analysis will be conducted in order to determine whether the two groups differ in the number of males and females. Means, skewness, and kurtosis will be determined for each of the 15 DRS variables. (For the purpose of satisfying the assumptions of factor analysis, the original 36 DRS items were collapsed into 15 variables for these analyses. Specifically, the collapsing of the variables serves to assure adequate base rates of responses and data that are interval in nature. See Table 3 for each of the variables to be entered into the factor analysis.) Finally, a 15 x 15 correlation matrix of the DRS variables will be calculated separately for the DAT and VaD groups.
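
A comparable sketch, under the same hypothetical column names, covers the chi-square test of the sex distribution and the group-wise 15 x 15 correlation matrices:

```python
# Illustrative sketch only (hypothetical column names: 'group', 'sex',
# and the 15 collapsed DRS variables 'var1' ... 'var15').
import pandas as pd
from scipy import stats

df = pd.read_csv("drs_data.csv")
drs_cols = [f"var{i}" for i in range(1, 16)]

# Chi-square test of the group-by-sex contingency table
table = pd.crosstab(df["group"], df["sex"])
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.4f}")

# Separate 15 x 15 correlation matrices for the DAT and VaD groups
for name, grp in df.groupby("group"):
    print(name)
    print(grp[drs_cols].corr().round(2))
```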

Before the data are submitted for analysis, the data set needs to be inspected to confirm that it meets the assumptions of factor analysis. Given that the investigation consists of a separate factor analysis for each of the two patient groups (i.e., DAT and VaD patients), the following discussion applies to both analyses.

Sample Size
Several criteria, differing in their degree of conservatism, have been proposed regarding the sample size adequate for factor analysis (Zillmer & Vuz, 1995). One of the most conservative approaches has been proposed by Boomsma (1982; as cited in Zillmer & Vuz, 1995), who recommended a sample size of at least 200 before attempting any factor analysis. A more liberal estimate of appropriate sample size uses the formula N - n - 1 ≥ 50, where N is the sample size and n is the number of variables. Finally, Gorsuch (1983) and Bentler (1985; both cited in Zillmer & Vuz, 1995) have suggested an absolute minimum ratio of five participants per variable, but no fewer than 100 participants per analysis.

With respect to the current investigation, 15 variables are to be considered. Thus, according to the latter, more conservative rule (five participants per variable, but no fewer than 100), a minimum of 100 participants is necessary per analysis to ensure valid interpretation of the emerging factors. According to the more liberal rule (N - 15 - 1 ≥ 50, i.e., N ≥ 66), a minimum of 66 participants is necessary per analysis.

Variable Distribution: Normal Distribution and Base Rates
Factor analysis requires that all data be at least interval in nature, that is, measured on a scale that need not have an absolute zero point but whose intervals are equal (Zillmer & Vuz, 1995). The DRS scoring per item, as well as the Total score (see Table 3), is interval in nature.

Next, the normality of the data set needs to be determined by inspecting the means, skewness, and kurtosis of the 15 individual DRS variables. Values of skewness and kurtosis for variables drawn from normal distributions will fluctuate around zero (Zillmer & Vuz, 1995). Furthermore, to better ensure a normal distribution of the data, minimizing the inclusion of dichotomous variables is recommended.
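
The normality screen described above could be sketched as follows (again with hypothetical variable names); skewness and excess kurtosis are expected to fluctuate around zero for approximately normal variables:

```python
# Sketch of the normality screen (hypothetical variable names).
import pandas as pd
from scipy.stats import kurtosis, skew

df = pd.read_csv("drs_data.csv")
for col in [f"var{i}" for i in range(1, 16)]:
    x = df[col].dropna()
    # scipy's kurtosis() reports excess kurtosis, so values near zero
    # are expected for approximately normal variables.
    print(f"{col}: mean = {x.mean():.2f}, "
          f"skewness = {skew(x):.2f}, kurtosis = {kurtosis(x):.2f}")
```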

As described earlier, in its standard format the DRS consists of 36 variables. However, the scoring of several of these variables is dichotomous in nature (i.e., assigned either 0 or 1 point depending on whether the item was performed correctly). To minimize the number of dichotomous variables, several of the DRS items were combined into a final set of 15 variables to be utilized in the analyses (see Table 3). By combining related variables (e.g., Construction items) to eliminate dichotomous variables, the data more closely approximate a normal distribution. The combining of variables not only minimizes dichotomous variables, but also serves as a means by which to appropriately influence base rates.

Base rates are the frequency of occurrence of an observation within the sample (Zillmer & Vuz, 1995). Base rates and the normality of a distribution are intimately related: moderate base rates accompany normally distributed data. Base rates, as well as the normality of the distribution, can be determined by examining the frequency distribution and kurtosis of each DRS variable. Although a normal distribution of the data set is ideally expected for valid interpretation of the emerging factors, more recent literature on factor analysis indicates that non-normal distribution of the data does not always significantly affect the results (Zillmer & Vuz, 1995).

Variable Distribution: Linearity
Factor analysis assumes that variables are linearly related. "Because the correlational matrix is the only input procedure for any type of factor analysis, whether exploratory or confirmatory, the cautious researcher should ascertain that the variables to be submitted to factor analysis procedures are linearly related by examining their interrelationship" (Zillmer & Vuz, 1995, p. 266). Variables that are linearly related to one another produce (i.e., "fall on") a straight line when plotted against each other. Variables that are jointly normally distributed are, by definition, in a linear relationship with each other. Examination of the initial correlation matrix and, if needed, individual correlational plots of any two DRS variables will allow one to determine whether this criterion is met. However, with respect to linearity, variables in the behavioral sciences very rarely adhere to a linear relationship over their full range. Nevertheless, one needs to minimize the inclusion of curvilinear or otherwise non-linear relationships in factor analysis. Thus, the inclusion of items that are non-linear will be minimized here.
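
A simple way to sketch this screening step (assuming the same hypothetical variable names) is to plot pairs of DRS variables and look for a roughly straight-line trend:

```python
# Sketch of a pairwise linearity check for two DRS variables
# (hypothetical names); a roughly straight-line trend is expected.
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("drs_data.csv")
plt.scatter(df["var1"], df["var2"], alpha=0.6)
plt.xlabel("DRS variable 1")
plt.ylabel("DRS variable 2")
plt.title("Bivariate plot used to screen for non-linearity")
plt.show()
```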

Independence and Collinearity
To further assure adequate interpretability of the factor structure that emerges from the analysis, the issues of independence and collinearity must be addressed. When variables are said to be "independent," there is no identifiable mathematical relationship between them. Thus, if such variables are submitted to factor analysis, they will not "hang together" on any one factor. The greater the independence among variables, the more difficult it is to extract any common factors, because the variables share little common variance. An extreme example of independence would be a factor structure in which the number of factors equals the number of variables.

Collinearity, in contrast, refers to a very high degree of interrelationship among variables. Such variables display a great amount of shared variance, so that knowledge of one variable's value allows for greater accuracy in the prediction of the other variable. In data sets with high collinearity, it becomes increasingly difficult to extract discrete factors because all variables "hang together" and share the same variance (Zillmer & Vuz, 1995). Pragmatically speaking, absolute collinearity would result in a single-factor model. Thus, moderate relationships between variables are most appropriate for factor analysis.

The average degree of interrelationship between variables in a data set can be estimated by summing the correlation coefficients above the diagonal of the correlation matrix and dividing that sum by the number of such correlations (Zillmer & Vuz, 1995).
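
For illustration, the average above-diagonal correlation could be computed as follows (hypothetical variable names assumed):

```python
# Sketch: average of the correlation coefficients above the diagonal
# (hypothetical variable names).
import numpy as np
import pandas as pd

df = pd.read_csv("drs_data.csv")
R = df[[f"var{i}" for i in range(1, 16)]].corr().to_numpy()
upper = R[np.triu_indices_from(R, k=1)]   # above-diagonal coefficients
print(f"mean inter-correlation = {upper.mean():.3f}")
```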

Given the problems associated with both independence and collinearity as they relate to the extraction of interpretable factors, the researcher needs to minimize both. In an attempt to do so from the start, the current investigator selected only participants whose DRS Total scores fell between 100 and 135. Scores greater than 135 not only fall within the normal range and preclude a diagnosis of dementia via the DRS alone, but such scores would also approach collinearity as neuropsychologically intact patients approach perfect performances. Similarly, DRS Total scores less than 100, attained by severely impaired patients, would also approach collinearity, resulting in a predictable single-factor model that could be labeled "general dementia."

Examining the Correlation Matrix for its Suitability for Factor Analysis
Factor analysis begins with a matrix of intercorrelations among all variables involved in the analysis. The goal of factor analysis is to determine a smaller number of common factors that help explain these correlations. Thus, the variables must be related to one another, preferably to a moderate degree, in order for factor analysis to be appropriate (Zillmer & Vuz, 1995). Given the overwhelming number of correlations to consider by "visual inspection" alone, factor analysis serves as a statistical means by which to make these comparisons and draw conclusions regarding the relatedness of the variables. However, before performing the factor analysis, one needs to first determine the suitability of the correlation matrix for factor analysis. This may be done using a variety of methods, including: (1) Bartlett's test of sphericity, (2) the Kaiser-Meyer-Olkin index, and (3) examination of the number of off-diagonal elements in the anti-image covariance (AIC) matrix greater than .09. Each of these methods is discussed in turn below.

Bartlett's Test of Sphericity
In essence, Bartlett's test of sphericity tests the null hypothesis that the variables in the correlation matrix are unrelated (i.e., that the matrix is an identity matrix). As the value of the test increases and the associated significance level decreases, the likelihood increases that the null hypothesis can be rejected and the alternative hypothesis accepted (i.e., the variables that constitute the correlation matrix are related). In contrast, as the value of the test decreases and the associated significance level increases, it becomes less likely that the null hypothesis can be rejected, and, in turn, the alternative hypothesis must be rejected. The use of Bartlett's test of sphericity is recommended for analyses where the sample size is relatively small (e.g., n = 100).
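
The following sketch computes Bartlett's test from the correlation matrix of the 15 variables using the standard chi-square approximation (hypothetical data file and variable names):

```python
# Sketch of Bartlett's test of sphericity using the standard chi-square
# approximation (hypothetical data file and variable names).
import numpy as np
import pandas as pd
from scipy.stats import chi2

df = pd.read_csv("drs_data.csv")
X = df[[f"var{i}" for i in range(1, 16)]].dropna()
n, p = X.shape
R = np.corrcoef(X.T)

statistic = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
dof = p * (p - 1) // 2
p_value = chi2.sf(statistic, dof)
print(f"Bartlett chi-square = {statistic:.2f}, df = {dof}, p = {p_value:.4f}")
```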

Kaiser-Meyer-Olkin Index
Another procedure for determining the suitability of the correlation matrix for factor analysis involves the computation of the Kaiser-Meyer-Olkin measure of sampling adequacy (KMO). The KMO is an index for comparing the magnitude of the observed correlation coefficients to the size of the partial correlation coefficients (Zillmer & Vuz, 1995). A partial correlation between two variables is the correlation that remains when the linear effects of the other variables have been eliminated (Zillmer & Vuz, 1995). When the KMO approaches 1.0, the sum of the squared partial correlation coefficients between all pairs of variables is small compared to the sum of the squared correlation coefficients (Zillmer & Vuz, 1995). A KMO index < .50 indicates that the correlation matrix (i.e., the data set) is not suitable for factor analysis.
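
A sketch of the overall KMO index follows, with the partial correlations obtained from the inverse of the correlation matrix (hypothetical variable names assumed):

```python
# Sketch of the overall KMO index: squared observed correlations versus
# squared partial correlations, the latter taken from the inverse of R
# (hypothetical variable names).
import numpy as np
import pandas as pd

df = pd.read_csv("drs_data.csv")
X = df[[f"var{i}" for i in range(1, 16)]].dropna()
R = np.corrcoef(X.T)

S = np.linalg.inv(R)
partial = -S / np.sqrt(np.outer(np.diag(S), np.diag(S)))  # partial correlations

off = ~np.eye(R.shape[0], dtype=bool)                     # off-diagonal mask
kmo = (R[off] ** 2).sum() / ((R[off] ** 2).sum() + (partial[off] ** 2).sum())
print(f"KMO = {kmo:.3f}")   # values below .50 suggest an unsuitable matrix
```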

Off-diagonal Elements in the Anti-Image Covariance (AIC) Matrix > .09
The third procedure for determining the suitability of the correlation matrix for factor analysis involves examining the number of off-diagonal elements in the anti-image covariance (AIC) matrix greater than .09 (Zillmer & Vuz, 1995). "If the variables share common factors, the anti-image correlation (i.e., the negative of the partial correlation coefficient) between pairs of variables should be small or close to zero, because the linear effects of the other variables have been eliminated" (Zillmer & Vuz, 1995, p. 276). Thus, the proportion of off-diagonal elements in the anti-image covariance matrix exceeding .09 should be less than 30% (Zillmer & Vuz, 1995) in order to consider the data set suitable for factor analysis. If more than 30% of the anti-image correlations are greater than .09 in absolute value, factor analysis should be reconsidered, because a large number of substantial correlations remain (Zillmer & Vuz, 1995).
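
The same partial correlations can be reused to sketch this third check, counting the proportion of off-diagonal anti-image correlations that exceed .09 in absolute value (hypothetical variable names):

```python
# Sketch of the anti-image check: proportion of off-diagonal anti-image
# (negative partial) correlations exceeding .09 in absolute value
# (hypothetical variable names); values above 30% argue against factoring.
import numpy as np
import pandas as pd

df = pd.read_csv("drs_data.csv")
X = df[[f"var{i}" for i in range(1, 16)]].dropna()
R = np.corrcoef(X.T)

S = np.linalg.inv(R)
anti_image = S / np.sqrt(np.outer(np.diag(S), np.diag(S)))  # negative of partial r

off = ~np.eye(R.shape[0], dtype=bool)
proportion = np.mean(np.abs(anti_image[off]) > 0.09)
print(f"{proportion:.1%} of off-diagonal anti-image correlations exceed .09")
```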

Methods of Factor Extraction
Once the correlation matrix has been determined suitable for factor analysis, a method of factor extraction needs to be selected. Maximum Likelihood (ML) and Principal Components Factor Analysis (PCFA) are two widely used methods of factor extraction. Each is discussed in turn.

Maximum Likelihood (ML) Factor Analysis
Consistent with the overriding goal of factor analysis, which is to reduce the original data set to a smaller number of hypothetical variables or constructs, the objective of ML is to find the factor structure that maximizes (in terms of best fit) the likelihood of the observed correlation matrix by estimating the underlying population parameters expressed in the common factors (Zillmer & Vuz, 1995). Factor loadings are calculated to optimize the "goodness of fit" of the model to the set of correlations.

Principal Components Factor Analysis (PCFA)
In contrast to the ML method, which estimates factors through the use of a mathematical model, PCFA transforms the correlation matrix into a new, smaller set of linear combinations of independent (i.e., uncorrelated) principal components (Zillmer & Vuz, 1995). PCFA is a technique distinct from the ML method because it partitions the variance of the correlation matrix into new principal components (Zillmer & Vuz, 1995). Specifically, PCFA partitions the total variance of all original variables by finding the first linear combination of variables that accounts for the maximum variance. Next, a second linear combination is extracted that is uncorrelated (i.e., orthogonal) with the first. The second principal component accounts for the second largest amount of unique variance after the first principal component has been extracted (Zillmer & Vuz, 1995). This process continues until no more principal components can be extracted that would account for a significant amount of the original variance. This process of orthogonal extraction of factors (and automatic partitioning of variance) is one of the many advantages PCFA has over the ML method, and is why it was selected for the analysis of this investigation's data.
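
A minimal sketch of principal components extraction, obtained by eigen-decomposing the 15 x 15 correlation matrix (hypothetical variable names; loadings are each eigenvector scaled by the square root of its eigenvalue):

```python
# Sketch of principal components extraction from the 15 x 15 correlation
# matrix (hypothetical variable names). Each eigenvector scaled by the
# square root of its eigenvalue gives the unrotated component loadings.
import numpy as np
import pandas as pd

df = pd.read_csv("drs_data.csv")
X = df[[f"var{i}" for i in range(1, 16)]].dropna()
R = np.corrcoef(X.T)

eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]           # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
loadings = eigvecs * np.sqrt(eigvals)       # unrotated loading matrix
print(np.round(eigvals, 2))
```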

Another advantage of PCFA is its hierarchical ordering of the principal components in terms of the percentage of unique variance accounted for. Thus, PCFA provides a better means by which to test the second and third hypotheses of the current investigation. Once again, hypotheses two and three stated that the first, most heavily loaded factor to emerge in the factor model for each patient group will consist of DRS items measuring the most salient deficit of each disorder (i.e., memory degradation in DAT and executive dysfunction in VaD) and will, in turn, determine the nature of deficits in other areas of cognition (as evidenced by the shared variance between the variables loading on the same factor).

A third advantage PCFA has over the ML method relates to the high degree of collinearity that many DRS items may have. PCFA is less affected by problems of collinearity than the ML method and will therefore produce principal components from data sets for which ML factor analysis may not converge on separate common factors (Zillmer & Vuz, 1995).

Determining the Factor Structure By Examining Communalities
Communalities represent the amount of systematic variation in each variable that is accounted for by the set of factors (Zillmer & Vuz, 1995). Communalities can range in value from 0 to 1.0, with 0 indicating that the common factors do not explain any of the variance of that particular variable and 1.0 indicating that all of the variance of that particular variable is explained by the common factors (Zillmer & Vuz, 1995). Thus, if the majority of the communalities are high (e.g., > .70), a more parsimonious factor structure is likely. Conversely, many low communalities (e.g., < .30) suggest that few variables are associated and thus a suitable factor model may not emerge. In essence, the value of the communalities influences how quickly and efficiently convergence (i.e., the "coming together" of the factor solution) occurs (Zillmer & Vuz, 1995).
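
Given a set of retained components, the communalities are the row-wise sums of squared loadings, as in the following sketch (hypothetical variable names; the number of retained components is chosen here by the Kaiser rule purely for illustration):

```python
# Sketch of communalities: the sum of squared loadings for each variable
# across the retained components (hypothetical variable names; the number
# retained is chosen by the Kaiser rule purely for illustration).
import numpy as np
import pandas as pd

df = pd.read_csv("drs_data.csv")
X = df[[f"var{i}" for i in range(1, 16)]].dropna()
R = np.corrcoef(X.T)

eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = int(np.sum(eigvals > 1.0))                     # retained components
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])
communalities = (loadings ** 2).sum(axis=1)        # one value per variable
print(np.round(communalities, 2))
```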

Methods for Determining the Number of Factors to be Retained
Using the aforementioned communalities, the goal is to determine whether a smaller number of factors can account for the covariation among all of the identified variables (Zillmer & Vuz, 1995). Subsequently, the goal is to determine the number of factors that are to be retained in the factor model. There are four methods by which to achieve this latter goal. Each of these will be discussed briefly, in turn.

Eigenvalues > 1 Retained (Kaiser Rule)
Each eigenvalue represents the amount of variance accounted for by its associated factor. Thus, the procedure of retaining all factors with eigenvalues > 1 indicates that only factors that individually account for more variance than a single standardized variable (i.e., a variance greater than 1.0) should be retained. Factors with a variance less than 1.0 are no better than a single variable and are therefore not retained in the model. One criticism of the eigenvalue > 1 criterion (also known as the Kaiser rule) is that additional factors lacking practical significance (regarding the percent of variance explained) may also be retained. The accuracy of the eigenvalue > 1 criterion is best when the number of variables is small (e.g., 10 to 30) and the communalities are high (e.g., > .70; Zillmer & Vuz, 1995).
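
Applied to a vector of eigenvalues, the Kaiser rule reduces to a simple count, as in this sketch (the eigenvalues shown are hypothetical):

```python
# Sketch of the Kaiser rule applied to a vector of eigenvalues
# (the values shown are hypothetical).
import numpy as np

eigvals = np.array([4.8, 2.1, 1.3, 0.9, 0.7, 0.6])
retained = int(np.sum(eigvals > 1.0))
print(f"factors retained by the Kaiser rule: {retained}")
```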

The Scree Test
The scree test is another procedure for determining the number of factors to be retained in the factor model. This procedure involves examining the plot of the eigenvalues associated with each factor and identifying the point at which the eigenvalues begin to level off following the steep slope formed by the largest eigenvalues (Zillmer & Vuz, 1995). In accordance with the scree test, only the factors whose eigenvalues lie along the steep slope should be retained. The scree test is most effective with large samples and high communalities (Zillmer & Vuz, 1995).
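
A scree plot can be sketched directly from the eigenvalues (hypothetical values shown); the factor count is read off at the elbow where the curve levels off:

```python
# Sketch of a scree plot (hypothetical eigenvalues); factors whose
# eigenvalues lie on the steep initial slope, before the elbow, are kept.
import matplotlib.pyplot as plt
import numpy as np

eigvals = np.array([4.8, 2.1, 1.3, 0.9, 0.7, 0.6, 0.5, 0.4])
plt.plot(np.arange(1, len(eigvals) + 1), eigvals, marker="o")
plt.xlabel("Component number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()
```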

Total Percent of Variance Explained
In the behavioral sciences it is desirable to cumulatively account for at least 70% of the variance of the correlation matrix. The higher the total variance accounted for, the better the factor model will represent the data. However, the researcher should be cautious in attempting to meet this criterion, so as not to add meaningless or single-loading variables to the factor solution (Zillmer & Vuz, 1995).
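
Because each eigenvalue of a correlation matrix is expressed in units of a single standardized variable, the cumulative percentage of variance explained is the running sum of the eigenvalues divided by the number of variables (15 here), as in this sketch with hypothetical values:

```python
# Sketch of the cumulative percentage of variance explained: each
# eigenvalue divided by the number of variables (15), accumulated
# (hypothetical eigenvalues shown).
import numpy as np

eigvals = np.array([4.8, 2.1, 1.3, 0.9, 0.7])
cumulative_pct = np.cumsum(eigvals) / 15 * 100
print(np.round(cumulative_pct, 1))   # look for the point reaching 70%
```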

Chi-Square Goodness-of-Fit Test
A fourth and final procedure for determining the number of factors to retain in the factor model, the chi-square goodness-of-fit test, is available only with the ML method of factor analysis and not with the PCFA method used here. This index indicates the probability that the correlation matrix was generated by the proposed model, and thus how well the model represents the data (Zillmer & Vuz, 1995).

Rotational Techniques: Improving the Interpretability of Retained Factors
The goal of factor analysis is "to attain scientific parsimony or economy of description of observable behavior" (Zillmer & Vuz, 1995, p. 285). Thus, the statistical technique of factor analysis should be conceptualized as a means of providing a simpler interpretation of the original data set, the measure of the observed behavior (Zillmer & Vuz, 1995). The choice of a final factor model should therefore be based on a balance between psychological meaningfulness and statistical simplicity, and more parsimonious models should be preferred over more complex ones when both explain the data equally well from a statistical point of view (Zillmer & Vuz, 1995).

Despite this general consideration, factors often emerge on which many variables have low to moderate loadings, or variables may load on more than one factor, complicating the interpretability of such factors and of the factor model as a whole. One way to improve the interpretability of the factor model is to "rotate" it.

The method of rotation does not change or improve the degree of fit between the data and the factor structure, but rather improves the model's interpretability by way of rearranging the variables and their loadings in the factor structure, thus redistributing the variance for the individual factors within the model (Zillmer & Vuz, 1995).

For the purpose of conceptualizing each of the two types of rotation, one must visualize the factors as axes on a plot. The more strongly a variable loads on a particular factor, the closer it plots to the respective axis. However, variables will often plot between two axes, indicating high loadings on both factors. Orthogonal rotations (the most popular of which is the Varimax rotation) maintain rigid 90-degree angles between axes while establishing new reference axes onto which the majority of variables "load." In turn, orthogonal rotations simplify the interpretability of the factor model by minimizing the number of variables that load on multiple factors and are represented by "off-axis" plot points. The maintenance of 90-degree angles between axes assumes that the factors are unrelated (Zillmer & Vuz, 1995).
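
The following is a compact sketch of a Varimax rotation (Kaiser's criterion, implemented with a standard SVD-based update) applied to an unrotated loading matrix such as the one produced in the extraction sketch above; the example input is hypothetical:

```python
# Minimal Varimax rotation sketch (Kaiser's criterion via an SVD-based
# update); `loadings` is an unrotated p x k loading matrix.
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    p, k = loadings.shape
    rotation = np.eye(k)
    variance = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated ** 3
                          - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p))
        rotation = u @ vt
        new_variance = s.sum()
        if new_variance < variance * (1 + tol):   # converged
            break
        variance = new_variance
    return loadings @ rotation

# Hypothetical 15 x 3 unrotated loading matrix used to demonstrate the call
rng = np.random.default_rng(0)
rotated_loadings = varimax(rng.normal(size=(15, 3)))
```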

Conversely, oblique rotations are guided by the assumption that the factors are somehow related and thus typically involve axes placed at less than 90-degree angles. Since very few psychological constructs are completely unrelated, oblique rotations are often more appropriate (Zillmer & Vuz, 1995). Thus, at least within the behavioral sciences, although an orthogonal rotation provides a simpler and clearer picture of the relationship of the variables to each other, an oblique rotation may fit the data better because most meaningful categories need not be uncorrelated (Zillmer & Vuz, 1995).

Simple Structure: Naming and Interpreting Extracted Factors
For the sake of factor model interpretation, it is desirable that simple structure be attained when the factor rotation is complete. Thurstone (1947; as cited in Bryant & Yarnold, 1997) described simple structure in terms of the following properties. First, each variable should have at least one loading approaching zero on at least one of the factors. In factor models with four or more factors, most of the variables should have loadings that are near zero for most of the factors. Second, for each factor, there should be at least as many variables with loadings that approach zero as there are factors. Third, for every pair of factors, there should be several variables that load on only one factor. In general, variables should load highly on one, and only one factor. This results in what is defined as "simple structure". When simple structure is attained, interpretation of the factors is relatively straightforward (Bryant & Yarnold, 1997).