- x = ice cream sales; y = violent crime; z = heat waves

- x = ice cream sales; y = drownings; z = heat waves

- x = number of electrical appliances; y = decreased birth rates; z = industrialization

- x = smoking; y = lung cancer; z = tissue damage

- x = age; y = reading ability; z = education

**Correlation versus Regression (Variance)**

- r_{x,y} = correlation between x and y (ex: x = SAT scores, y = first-year first-semester (FYFS) GPA; r = .30 to .40)
- r^2_{x,y} = variance in y explained by x (ex: x = SAT scores, y = FYFS GPA; r^2 = 10 to 20%)
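A minimal numpy sketch of the r vs. r^2 distinction, using simulated SAT/GPA data built to fall near the .30-.40 correlation range the notes cite (all numbers here are illustrative, not real admissions data):

```python
import numpy as np

# Simulate SAT scores and FYFS GPA with a weak linear relationship
# (coefficients are assumptions chosen to yield r near .30-.40).
rng = np.random.default_rng(0)
n = 500
sat = rng.normal(1050, 150, n)
gpa = 2.0 + 0.001 * sat + rng.normal(0, 0.45, n)

r = np.corrcoef(sat, gpa)[0, 1]   # correlation between x and y
r_squared = r ** 2                # proportion of variance in y explained by x

print(f"r = {r:.2f}, r^2 = {r_squared:.1%}")
```

Note that squaring a "respectable" correlation of ~.35 leaves only ~12% of GPA variance explained, which is exactly the contrast the notes are drawing.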

**Purposes/Goals:** (from Hoyt, Imel, & Chan, 2008)

- description: to provide a statistical summary of the relationship of the Xs to the Y;
- prediction: to provide an equation that generates predicted scores on some future outcome (Y, e.g., job performance) based on the observed Xs;
- explanation or theory testing: the direction and magnitude of predicted relationships of Xs to Y can be tested using the actual observed data.
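The "prediction" goal above can be sketched in a few lines of numpy: fit a least-squares equation on observed Xs and Y, then generate a predicted score for a new case. All data, coefficients, and the two-predictor setup are simulated for illustration only:

```python
import numpy as np

# Simulated sample: two predictors (e.g. selection-test scores) and an outcome.
rng = np.random.default_rng(3)
n = 120
X = rng.normal(size=(n, 2))
y = 3.0 + 0.8 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(size=n)

A = np.column_stack([np.ones(n), X])       # design matrix with intercept column
b, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares coefficients

# Predicted outcome (e.g. future job performance) for one new case.
x_new = np.array([1.0, 0.5, -0.2])         # [intercept, x1, x2]
y_hat = x_new @ b
```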

**Assumptions:**

- The DV is continuous and free from outliers (EXPLORE vs. DESCRIPTIVES)
- The IVs (predictors) are continuous and free from intercorrelation/multicollinearity (CORRELATE; Tolerance/VIF)
- Intercorrelations among predictors should be low. There is no specific cut-off: some say .60, others .70 (Tabachnick & Fidell, 2001), others .80, others .90, others are not sure, and some say none of these.
- Considering that r = .70 implies about 50% shared variance (.70^2 = .49), we can use this as a cut-off.

- If tolerance is less than .20, a problem with multicollinearity is indicated.
- When VIF is high, there is high multicollinearity and instability of the b and beta coefficients.
- VIF >= 4 is an arbitrary but common cut-off criterion for deciding when a given independent variable displays "too much" multicollinearity; values above 4 suggest a multicollinearity problem.
- Some researchers use the more lenient cut-off of 5.0 or even 10.0 to signal when multicollinearity is a problem.

- When "Condition Index" in "Collinearity Diagnostics" Table is above 15 (possible problem) or above 30 (serious problem)
- We will use the combination of tolerance, VIF, and the Collinearity Diagnostics table to determine whether multicollinearity is a problem.

- The subject-to-predictor ratio is not below 10:1 (15:1 is ideal)
- The IVs are free from outliers
- Z-residuals (absolute value less than 3)
- Mahalanobis' distance (p>.05; see Stevens p. 108 or table) and Cook's distance (less than 1.0)
- Leverage is not > 3k/n
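The tolerance/VIF relationship above (tolerance = 1 - R^2 of a predictor regressed on the other predictors; VIF = 1/tolerance) can be computed directly. A numpy-only sketch with simulated predictors, where x3 is deliberately built to overlap with x1 so it trips the VIF >= 4 rule from the notes:

```python
import numpy as np

def tolerance_and_vif(X):
    """For each predictor, regress it on the remaining predictors and
    return tolerance (1 - R^2) and VIF (1 / tolerance)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    tol = np.empty(k)
    for j in range(k):
        xj = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # auxiliary regression with intercept
        beta, *_ = np.linalg.lstsq(A, xj, rcond=None)
        resid = xj - A @ beta
        tol[j] = resid.var() / xj.var()            # = 1 - R^2 of the auxiliary regression
    return tol, 1 / tol

# Simulated predictors: x3 is constructed to be collinear with x1.
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = 0.9 * x1 + 0.3 * rng.normal(size=200)

tol, vif = tolerance_and_vif(np.column_stack([x1, x2, x3]))
flagged = vif >= 4   # the common cut-off from the notes
```

Here x1 and x3 get flagged (each is well predicted by the other, so tolerance drops below .20), while the independent x2 does not.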

Example:

- Z-residuals (Std Residuals): not < -3 or > 3
- Mahalanobis distance: p < .01 flags an outlier
- Cook's distance: > 1 flags an influential case
- Leverage: not > 3k/n = 3(3)/51 = 9/51 = .176 (actual value is .059)

In order to control for "shrinkage" (reduction in the predictive power of the regression equations):

- Correlations between predictor variables should be inspected. When pairs of variables have correlations higher than .70, you should consider:
  - removing one of the correlated predictors (usually the one with the lowest r with the DV), or
  - combining the correlated predictors (average, sum, etc.)

- The ratios of subjects-to-predictors in the main regression analyses should be at least 15:1.
- Adjusted R^2 coefficients should be used as a conservative estimate of explained variance (in all regression analyses).
- Default values for the probabilities of "F-to-enter" (.05) and "F-to-remove" (.10) should remain constant for all of the regression analyses.
- Dependent measures should be analyzed for outliers by inspecting Z-scores of residuals. No significant effects of outliers, as measured by Z-scores greater than three standard deviations from the mean, should be noted on the dependent variable. Outliers on predictor variables should be identified using Mahalanobis' distance and analyzed for their influence on the regression equations using Cook's distance.
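The case-level screening described above (standardized residuals, leverage, Cook's distance) can be computed from the hat matrix in plain numpy. This sketch uses simulated data sized like the notes' example (n = 51 cases, k = 3 predictors); it is a textbook-formula illustration, not the SPSS procedure:

```python
import numpy as np

def case_diagnostics(X, y):
    """OLS case diagnostics: standardized (Z) residuals, leverage (h_ii),
    and Cook's distance, computed from the hat matrix."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])   # design matrix with intercept
    H = A @ np.linalg.inv(A.T @ A) @ A.T   # hat matrix
    h = np.diag(H)                         # leverage of each case
    resid = y - H @ y                      # raw residuals
    p = k + 1                              # number of estimated parameters
    s2 = resid @ resid / (n - p)           # residual variance estimate
    z = resid / np.sqrt(s2 * (1 - h))      # standardized residuals
    cooks = z**2 * h / (p * (1 - h))       # Cook's distance
    return z, h, cooks

# Simulated data matching the example's dimensions.
rng = np.random.default_rng(2)
n, k = 51, 3
X = rng.normal(size=(n, k))
y = X @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=n)

z, h, cooks = case_diagnostics(X, y)
leverage_cutoff = 3 * k / n                        # 3k/n = 3(3)/51 = .176
flagged = (np.abs(z) > 3) | (cooks > 1) | (h > leverage_cutoff)
```

A useful sanity check on the leverage values: they always sum to the number of estimated parameters (k + 1), so their average is (k + 1)/n, which is why cut-offs are expressed as multiples of k/n.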