Exploratory factor analysis

Assumed knowledge

 * Correlation

Purposes of factor analysis
There are two main purposes or applications of factor analysis:

 * 1. Data reduction: Reduce data to a smaller set of underlying summary variables. For example, psychological questionnaires often aim to measure several psychological constructs, with each construct measured by responses to several items. Responses to several related items are combined to create a single score for the construct. A measure based on several related items is generally considered to be more reliable and valid than one based on a single item.
 * 2. Exploring theoretical structure: Theoretical questions about the underlying structure of psychological phenomena can be explored and empirically tested using factor analysis. For example, is intelligence better understood as a single, general factor, or as consisting of multiple, independent dimensions? Or, how many personality factors are there, and what are they?

History
Factor analysis was initially developed by Charles Spearman in 1904. For more information, see factor analysis history.

Types (methods of extraction)
The researcher will need to choose between two main types of extraction:
 * 1) Principal components (PC): Analyses all variance in the items. This method is usually preferred when the goal is data reduction (i.e., to reduce a set of variables down to a smaller number of factors and to create composite scores for these factors for use in subsequent analysis).
 * 2) Principal axis factoring (PAF): Analyses shared variance amongst the items. This method is usually preferred when the goal is to undertake theoretical exploration of the underlying factor structure.
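To make the mechanics of extraction concrete, here is a minimal numpy sketch of principal components extraction. The function name `pc_loadings` is a hypothetical helper, not part of any package; in practice SPSS or a statistics library would be used. It eigen-decomposes the correlation matrix and scales each eigenvector by the square root of its eigenvalue to obtain the loading matrix:

```python
import numpy as np

def pc_loadings(data, n_factors):
    """Principal components extraction (illustrative sketch).

    Eigen-decomposes the correlation matrix of the items and returns
    the first n_factors columns of loadings, where each loading column
    is an eigenvector scaled by the square root of its eigenvalue.
    """
    corr = np.corrcoef(data, rowvar=False)      # items are columns
    eigvals, eigvecs = np.linalg.eigh(corr)     # returned in ascending order
    order = np.argsort(eigvals)[::-1]           # re-sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # loadings = eigenvector * sqrt(eigenvalue); keep the first n_factors
    return eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
```

Because principal components analyses all variance, the squared loadings for each item (its communality) can approach, but never exceed, 1 when factors are dropped.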

Rotation
The researcher will need to choose between two main types of factor matrix rotation:
 * 1) Orthogonal (Varimax in SPSS): Factors are independent (i.e., correlations between factors are less than ~.3)
 * 2) Oblique (Oblimin in SPSS): Factors are related (i.e., at least some correlations between factors are greater than ~.3). The extent of correlation between factors can be controlled using delta:
    * Negative values "decrease" factor correlations (towards full orthogonality)
    * "0" is the default
    * Positive values (don't go over .8) "permit" higher factor correlations

If the researcher hypothesises uncorrelated factors, use orthogonal rotation. If the researcher hypothesises correlated factors, use oblique rotation.

In practice, researchers will usually try different types of rotation, then choose the one which produces the "cleanest" model (i.e., with the lowest cross-loadings).
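Varimax rotation itself can be sketched in a few lines of numpy using Kaiser's iterative SVD algorithm. This is an illustrative implementation only (SPSS performs this via the Varimax option); the function name `varimax` is assumed:

```python
import numpy as np

def varimax(loadings, tol=1e-6, max_iter=100):
    """Orthogonal varimax rotation of a loading matrix (Kaiser's algorithm).

    Iteratively finds the orthogonal rotation that maximises the variance
    of the squared loadings within each factor, which tends to produce a
    'cleaner' simple structure (fewer cross-loadings).
    """
    L = loadings.copy()
    n_items, n_factors = L.shape
    R = np.eye(n_factors)         # accumulated rotation matrix
    var_old = 0.0
    for _ in range(max_iter):
        rotated = L @ R
        # SVD of the gradient of the varimax criterion
        u, s, vt = np.linalg.svd(
            L.T @ (rotated**3 - rotated * (rotated**2).mean(axis=0)))
        R = u @ vt                # nearest orthogonal rotation
        var_new = s.sum()
        if var_new - var_old < tol:
            break
        var_old = var_new
    return L @ R
```

Because the rotation matrix is orthogonal, communalities (row sums of squared loadings) are unchanged by the rotation; only how the variance is distributed across factors changes.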

Determining the number of factors
There is no definitive, simple way to determine the number of factors: it is ultimately a subjective decision made by the researcher. Mistakes in extraction consist of retaining either too few or too many factors. A comprehensive review of the state of the art, together with proposed criteria for choosing the number of factors, is presented in [3]. The researcher should be guided by several considerations, including:
 * 1) Theory: e.g., How many factors were expected? Do the extracted factors make theoretical sense?
 * 2) Eigenvalues:
 * 3) Kaiser's criterion: How many factors have eigenvalues over 1? Note, however, that this cut-off is arbitrary, so it is only a general guide and other considerations are also important.
 * 4) Scree plot: Plots the eigenvalues. Look for the 'elbow' (i.e., where there is a notable drop) and extract one fewer factor than its position; everything after it is 'scree'. In other words, extract the factors that make up the 'cliff' (i.e., which explain most of the variance).
 * 5) Total variance explained: Ideally, aim to explain approximately 50 to 75% of the variance using the smallest number of factors.
 * 6) Interpretability: Are all factors interpretable? (especially the last one?) In other words, can you reasonably name and describe each set of items as being indicative of an underlying factor?
 * 7) Alternative models: Try several models with different numbers of factors before deciding on a final model. Depending on the eigenvalues and the scree plot, examine, say, 2, 3, 4, 5, 6, and 7 factor models before deciding.
 * 8) Remove items that don't belong: Having decided on the number of factors, items which don't seem to belong should be removed, because this can change and clarify the structure/number of factors. Remove items one at a time and then re-run. After removing all items which don't seem to belong, re-check whether you still have a clear factor structure for the targeted number of factors. It may be that a different number of factors (probably one or two fewer) is now more appropriate. For more information, see criteria for selecting items.
 * 9) Number of items per factor: The more items per factor, the greater the reliability of the factor, although the law of diminishing returns applies. Nevertheless, a factor could, in theory, be indicated by as few as a single item.
 * 10) Factor correlations - What are the correlations between the factors? If they are too high (e.g., over ~.7), then some of the factors may be too similar (and therefore redundant). Consider merging the two related factors (i.e., run an EFA with one fewer factor).
 * 11) Check the factor structure across sub-samples - For example, is the factor structure consistent for males and females? (In SPSS: Data - Split File - Compare Groups or Organise Output by Groups - select a categorical variable to split the analyses by (e.g., Gender) - Paste/Run or OK - then re-run the EFA syntax.)
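The eigenvalue-based guides above (Kaiser's criterion and total variance explained) can be computed directly from the correlation matrix. The function below is a hypothetical helper for illustration, assuming items are the columns of a data array:

```python
import numpy as np

def factor_retention_guides(data):
    """Compute eigenvalue-based guides for the number of factors.

    Returns the sorted eigenvalues of the item correlation matrix,
    the number of factors suggested by Kaiser's criterion
    (eigenvalues > 1), and the cumulative % of variance explained.
    """
    corr = np.corrcoef(data, rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]   # descending
    kaiser_k = int((eigvals > 1).sum())                 # Kaiser's criterion
    cum_pct = np.cumsum(eigvals) / eigvals.sum() * 100  # variance explained
    return eigvals, kaiser_k, cum_pct
```

Plotting the returned eigenvalues against their rank gives the scree plot; remember that Kaiser's cut-off is only a rough guide and should be weighed against theory and interpretability.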

Criteria for selecting items
In general, aim for a simple factor structure (unless there is a particular reason why a complex structure would be preferable). In a simple factor structure each item has a relatively strong loading on one factor (target loading; e.g., > |.5|) and relatively small loadings on other factors (cross-loadings; e.g., < |.3|).

Consider the following criteria to help decide whether to include or remove each item. Remember that these are rules of thumb only – avoid over-reliance on any single indicator. The overarching goal is to include items which contribute to a meaningful measure of an underlying factor and to remove items that weaken measurement of the underlying factor(s). In making these decisions, consider:
 * 1) Communality - indicates the variance in each item explained by the extracted factors; ideally, above .5 for each item.
 * 2) Primary (target) factor loading - indicates how strongly each item loads on each factor; should generally be above |.5| for each item; preferably above |.6|.
 * 3) Cross-loadings - indicate how strongly each item loads on the other (non-target) factors. There should be a gap of at least ~.2 between the primary target loadings and each of the cross-loadings. Cross-loadings above .3 are worrisome.
 * 4) Meaningful and useful contribution to a factor - read the wording of each item and consider the extent to which it makes a meaningful and useful (non-redundant) contribution to the underlying target factor (i.e., assess its face validity).
 * 5) Reliability - check the internal consistency of the items included for each factor using Cronbach's alpha, and check the "Alpha if item removed" output to determine whether removal of any additional items would improve reliability.
 * 6) See also: How do I eliminate items? (lecture notes)
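The reliability check in point 5 can be illustrated with a short sketch of Cronbach's alpha, including a helper that mirrors SPSS's "Alpha if item removed" output. Both function names are assumptions for this example:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x n_items) array:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - sum_item_vars / total_var)

def alpha_if_item_removed(items):
    """Alpha recomputed with each item dropped in turn; if any value
    exceeds the full-scale alpha, dropping that item would improve
    internal consistency."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    return [cronbach_alpha(np.delete(items, j, axis=1)) for j in range(k)]
```

If an item's "alpha if removed" value is higher than the alpha for the full item set, that is one signal (alongside loadings and face validity) that the item may weaken the factor's measurement.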

Name and describe the factors
Once the number of factors has been decided and any items which don't belong have been removed, then:
 * 1) Give each extracted factor a name
    * Be guided by the items with the highest primary loadings on the factor – what underlying factor do they represent?
    * If unsure, emphasise the top-loading items in naming the factor
 * 2) Describe each factor
    * Develop a one-sentence definition or description of each factor

Data analysis exercises

 * Data analysis tutorial

Pros and cons

 * Advantages (Wikipedia)
 * Disadvantages (Wikipedia)