Survey research and design in psychology/Tutorials/Correlation/Types of correlations - Exercises

Nominal by nominal
What is the relationship between two nominal variables?


 * 1) Univariate frequencies and bar charts
 * 2) Check that there is sufficient data in each category (> ~20)
 * 3) Consider whether recoding may be necessary
 * 4) Crosstabs (also known as contingency tables)
 * 5) Examine contingency table cell frequencies, with marginal totals, and consider also whether it might be useful to provide row and/or column percentages.
 * 6) Clustered bar chart
 * 7) Consider whether to use frequencies or percentages on the Y-axis and whether to include data labels.
 * 8) Inferential test of association: Pearson chi-square test
 * 9) Effect size: Phi (Φ) or Cramer's V, which are correlation-like effect sizes for the degree of relationship (or dependence) between two nominal variables.
 * 10) Phi (Φ) is used for 2 x 2, 2 x 3 or 3 x 2 tables (whenever one variable has only two categories, Φ and Cramer's V have the same magnitude).
 * 11) Cramer's V is used for 3 x 3 or larger tables.
 * 12) Chi-square, phi and Cramer's V are *non-parametric statistics*, which make few assumptions about the distribution of the data.
 * 13) However, you should check that the minimum expected frequency is at least 5 per cell. Obtain the expected frequencies via Analyze → Descriptives → Crosstabs → Cells → Expected. If any cell has an expected frequency below 5, consider recoding the data into fewer categories.
 * 14) The sign (+ or -) of Phi (Φ) needs careful interpretation because nominal codes have no meaningful order (the codes are arbitrary), so reversing the coding of either variable reverses the sign. Cramer's V is always non-negative.
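The calculations behind steps 8-13 can be sketched in a few lines of Python. This is a minimal, illustrative sketch, not SPSS's implementation. The function name `chi_square_association` and the example counts are hypothetical. Every row and column total is assumed to be non-zero. Φ is returned unsigned; SPSS attaches a sign for 2 x 2 tables, and that sign depends on the coding.

```python
from math import sqrt


def chi_square_association(table):
    """Pearson chi-square, phi and Cramer's V for an r x c table of observed counts.

    `table` is a list of rows of counts (hypothetical helper; assumes no zero
    row or column totals).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count per cell under independence: row total * column total / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    n_rows, n_cols = len(table), len(table[0])
    k = min(n_rows, n_cols)  # the smaller table dimension
    return {
        "chi2": chi2,
        "df": (n_rows - 1) * (n_cols - 1),
        "phi": sqrt(chi2 / n),  # unsigned magnitude
        "cramers_v": sqrt(chi2 / (n * (k - 1))),
        "min_expected": min(min(row) for row in expected),  # check this is >= 5
    }
```

For the hypothetical 2 x 2 table `[[30, 20], [10, 40]]`, the sketch gives χ2 = 16.67 with df = 1, Φ = Cramer's V ≈ .41, and a minimum expected frequency of 20. The expected-frequency check is therefore met.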

What is the relationship between Gender and Belief in God?
Data file: qfsall_3.sav


 * 1) What is the relationship between Gender and Belief in God? (nominal by nominal)
 * 2) Check univariate frequencies and bar charts.
 * 3) Analyze → Descriptives → Frequencies (place Gender and God into the frequencies box). On the right-hand side click on Charts and tick bar charts. Optional: Choose frequencies or percentages.
 * 4) You should notice in the output that Belief in God has an extra category (3s), which needs to be removed (recoded as system-missing). Also, for this exercise we will focus only on the clear-cut "Yes" and "No" responses, so the 2s (Sort of) will also be recoded as missing.
 * 5) Recode God into a new variable GodR for which the mis-entered data (the 3s) and the "sort of" responses (the 2s) are missing. Switch the decimal places to 0, and add variable and value labels to GodR, plus drag it next to the original God variable. Check univariate frequencies and bar chart for the recoded variable.
 * 6) Transform → Recode into Different Variables. Move God across and, in the Name box, rename it "GodR"; you can also give it a label (e.g., God (recoded)). Then click the Change button to enter the new variable name. Next, click "Old and New Values" to recode: keep 0 (No) and 1 (Yes) the same and set the additional values to system-missing. This can be done in a number of ways (see example in box below).
 * 7) Click Continue and Paste. Then go to the syntax screen and run the Recode syntax. Check at the right-hand end of the data file in Data View that a new variable, GodR, has appeared.
 * 8) Go to Variable View and add Value Labels (0 = No, 1 = Yes, or copy the value labels from God). Change the decimal places to 0.
 * 9) Go to Analyze → Descriptives → Frequencies and add the new GodR variable (God (recoded)) to check the frequencies of the recoded variable.
 * 10) Crosstabs (Gender by GodR). To test whether Gender and Belief in God are related (i.e., not independent), calculate chi-square as the test statistic and Phi or Cramer's V as the correlation (effect size) statistic. To do this go to:
 * 11) Analyze → Descriptives → Crosstabs and enter Gender and GodR. Also tick the box to display clustered bar charts.
 * 12) Statistics (Chi square - Phi and Cramer's V): In the Statistics box, select Chi-Square and Phi/Cramer's V.
 * 13) Cells (Expected - Row and Column %s): To get only the observed cell counts, there is no need to change the options in the Cells box. However, in the Cells box you can also request Expected counts and/or Row or Column percentages. Adding both %s to one analysis can make the output confusing to interpret, so try adding row %s to one analysis and column %s to another to break it down, and see which is most useful/interpretable.
 * 14) Continue and Paste, and then run the test from the syntax file using the big green play button.
 * 15) The result should be: χ2(1, N = 127) = .10, p = .76; Φ = -.03. This is a negligible relationship, such that females were slightly more likely to believe in God than males; however, it is not statistically significant. In other words, the observed relationship in the data could easily have occurred by chance.
 * 16) Tip: Clustered bar graphs can also be generated directly (without using Crosstabs) via Graphs → Legacy Dialogs → Bar → Clustered. Try putting Gender on the Category Axis and Bars Clustered by GodR, with % on the Y-axis; the chart could also be drawn the other way, with GodR on the Category Axis and Bars Clustered by Gender. Which chart is easier to interpret?
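The effect size in step 15 can be cross-checked by hand. For a 2 x 2 table, the magnitude of Φ follows directly from the chi-square value and N, and Cramer's V reduces to the same value. The minus sign is not given by this formula; it comes from how Gender and GodR are coded. Because the reported chi-square is rounded, the check is only approximate.

```latex
\begin{aligned}
|\Phi| &= \sqrt{\frac{\chi^2}{N}} = \sqrt{\frac{0.10}{127}} \approx 0.028 \approx .03 \\
V &= \sqrt{\frac{\chi^2}{N\,(k-1)}}, \qquad k = \min(\text{rows}, \text{columns}) = 2
\;\Rightarrow\; V = |\Phi|
\end{aligned}
```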

Recoding (Old → New):
0 → 0
1 → 1
2 → System-missing
3 → System-missing
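The recoding rules listed just above, and the crosstab that follows them, can be mimicked in Python to make the logic explicit. This is a minimal sketch. `None` plays the role of SPSS system-missing, and the names `GOD_RECODE`, `recode_god` and `crosstab` are hypothetical. In this sketch, any code other than 0-3 also becomes missing.

```python
from collections import Counter

# Keep 0 (No) and 1 (Yes); 2 ("Sort of") and 3 (mis-entered) become missing.
GOD_RECODE = {0: 0, 1: 1, 2: None, 3: None}


def recode_god(values):
    """Return GodR values for a list of raw God codes (unlisted codes -> None)."""
    return [GOD_RECODE.get(v) for v in values]


def crosstab(row_var, col_var):
    """Count (row, column) pairs, dropping cases missing on either variable."""
    return Counter(
        (r, c) for r, c in zip(row_var, col_var) if r is not None and c is not None
    )
```

For example, raw God codes `[0, 1, 2, 3, 1]` become `[0, 1, None, None, 1]`, so only three of the five cases reach the crosstab.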


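* Gender by Belief in God - Cross-tabs, chi-square, phi and clustered bar graph.

FREQUENCIES VARIABLES=Gender God /BARCHART PERCENT /ORDER=ANALYSIS.

* Recode 2 (Sort of) and 3 (mis-entered) to system-missing, then label the new variable.
RECODE God (0=0) (1=1) (2=SYSMIS) (3=SYSMIS) INTO GodR.
VARIABLE LABELS GodR 'God (recoded)'.
VALUE LABELS GodR 0 'No' 1 'Yes'.
FORMATS GodR (F1.0).
EXECUTE.

FREQUENCIES VARIABLES=GodR /BARCHART PERCENT /ORDER=ANALYSIS.

* EXPECTED adds expected counts so the minimum expected frequency can be checked.
CROSSTABS /TABLES=Gender BY GodR /FORMAT=AVALUE TABLES /STATISTICS=CHISQ PHI /CELLS=COUNT EXPECTED /COUNT ROUND CELL.

GRAPH /BAR(GROUPED)=PCT BY Gender BY GodR.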

What is the relationship between Smoking and Snoring?
Data file: qfsall_3.sav


 * 1) What is the relationship between Smoking and Snoring?
 * 2) Univariate frequencies, bar chart (Snoring and Smoking): Analyse → Descriptives → Frequencies → enter Smoking and Snoring and request bar charts.
 * 3) The bar chart for Smoking shows a strongly positively skewed distribution, so it is better to recode it into a dichotomous variable that indicates whether each respondent is a non-smoker or a smoker.
 * 4) To recode Smoking: Transform → Recode into Different Variables. Enter Smoking and name the new variable SmokingR. Give it a label (e.g., Smoking status) and then click Change. Then click Old and New Values and remove any rules left over from the previous exercise.
 * 5) Add 0 as old value and 0 as new value (i.e., non-smokers will keep the same coding).
 * 6) For the second rule, select Range, value through HIGHEST and enter .01 (this includes anyone who reports smoking, even less than one cigarette per day), and make the new value 1. This means that all values of .01 and above will become 1 (Smoker).
 * 7) Click Continue, Paste, and run from the syntax file.
 * 8) Add Value Labels via Variable View to the new SmokingR variable (e.g., 0 = Non-smoker, 1 = Smoker) and change the number of decimal places to 0. Run and check the frequencies and bar chart for the new variable (SmokingR).
 * 9) Crosstabs, with Statistics (Chi-square and Phi/Cramer's V) and a clustered bar chart: Analyze → Descriptives → Crosstabs and enter Snoring and SmokingR. Chi-square tests the significance of the relationship. The options chosen in the previous exercise should have been retained, so you can just click Paste and run the syntax.
 * 10) Examination of the clustered bar chart(s) and contingency table (with row and/or column cell percentages) should reveal that smokers are almost twice as likely to report snoring as non-smokers, χ2(1, N = 188) = 8.07, p = .004; Φ = -.21.
 * 11) However, be careful when interpreting causality. For example, the observed relationship could be because:
 * 12) Smoking increases the likelihood of snoring (e.g., by restricting the airway)
 * 13) Snoring may cause smoking (e.g., snoring might make people tired and therefore more likely to smoke)
 * 14) The relationship may be bidirectional (i.e., each causes the other)
 * 15) The relationship may be due to a third variable (e.g., age or weight)
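The statement that smokers are "almost twice as likely" to snore is a ratio of two row percentages. The sketch below shows that reasoning on made-up data. It assumes Snoring is coded 0 = No, 1 = Yes; this coding is an assumption, not taken from the data file. The helper names are hypothetical. As in the RECODE rule, values between 0 and .01 are left missing.

```python
def dichotomise_smoking(cigarettes_per_day):
    """Mirror of RECODE Smoking (0=0) (.01 thru Highest=1); None means missing."""
    if cigarettes_per_day is None:
        return None
    if cigarettes_per_day == 0:
        return 0  # non-smoker
    if cigarettes_per_day >= 0.01:
        return 1  # smoker
    return None  # not covered by either rule (e.g., 0.005 or negative values)


def snoring_rate_by_group(smoking_r, snoring):
    """Proportion of snorers (Snoring assumed coded 1 = Yes) in each SmokingR group."""
    totals, snorers = {}, {}
    for group, snores in zip(smoking_r, snoring):
        if group is None or snores is None:
            continue  # skip cases missing on either variable
        totals[group] = totals.get(group, 0) + 1
        snorers[group] = snorers.get(group, 0) + snores
    return {group: snorers[group] / totals[group] for group in totals}


# Made-up data: cigarettes per day, and snoring (0/1)
smoking = [0, 0, 0, 0, 5, 10, 0, 0, 20, 0.5]
snoring = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0]
smoking_r = [dichotomise_smoking(v) for v in smoking]
rates = snoring_rate_by_group(smoking_r, snoring)
risk_ratio = rates[1] / rates[0]  # how many times more likely smokers are to snore
```

With these toy numbers the ratio is 4.5. In the qfsall_3 data, the corresponding ratio is the "almost twice" reported in step 10.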


* Smoking and Snoring - Cross-tabs, chi-square, phi and clustered bar graph.

GRAPH /BAR(SIMPLE)=PCT BY Snoring.

GRAPH /HISTOGRAM(NORMAL)=Smoking.

FREQUENCIES VARIABLES=Snoring Smoking /ORDER=ANALYSIS.

* Dichotomise Smoking: 0 = non-smoker, 0.01 and above = smoker.
RECODE Smoking (0=0) (.01 thru Highest=1) INTO SmokingR.
VARIABLE LABELS SmokingR 'Smoking status'.
VALUE LABELS SmokingR 0 'Non-smoker' 1 'Smoker'.
FORMATS SmokingR (F1.0).
EXECUTE.

GRAPH /BAR(SIMPLE)=PCT BY SmokingR.

FREQUENCIES VARIABLES=SmokingR /ORDER=ANALYSIS.

CROSSTABS /TABLES=Snoring BY SmokingR /FORMAT=AVALUE TABLES /STATISTICS=CHISQ PHI /CELLS=COUNT /COUNT ROUND CELL /BARCHART.

See Extra exercises

Dichotomous by interval/ratio
What is the relationship between a dichotomous variable and an interval/ratio variable?


 * 1) The point-biserial correlation (rpb) is used to analyse the relationship between a dichotomous variable and an interval/ratio variable.
 * 2) Possible graphical depictions of such a relationship include: a scatterplot (with point bins and a line of best fit), a bar chart (showing the mean interval/ratio value for each dichotomous value), or an error-bar chart (as for the bar chart, but with confidence intervals also shown).
 * 3) Compute the product-moment correlation (r). The point-biserial correlation is simply Pearson's r with the dichotomous variable coded numerically (e.g., 0 and 1).
 * 4) Interpret the r taking into account the direction of coding for the dichotomous scale.
 * 5) Note that the significance test for rpb is equivalent to an independent-samples t-test (with equal variances assumed).

What is the relationship between Gender and Australianness?
Data file: qfsall_3.sav


 * 1) What is the relationship between Gender (dichotomous) and Australianness (interval)?
 * 2) Examine univariate frequencies and bar graphs. Analyse → Descriptives → Frequencies - Enter Gender and Australianness.
 * 3) Three different types of graphs could be drawn to depict this bivariate relationship:
 * 4) Scatterplot - Gender and Australianness: Because one variable is interval, a scatterplot is informative. To create one, go to Graphs → Legacy Dialogs → Scatter/Dot → Simple Scatter → Define. Then add the predictor variable, Gender, to the X axis and the dependent variable, Australianness, to the Y axis; click Paste and then run the syntax.
 * 5) To edit the chart:
 * 6) Double-click the chart to open the Chart Editor
 * 7) Double-click a data point and change to point bins
 * 8) Add a line of best fit ("Add Fit Line at Total")
 * 9) Alternative ways of graphing this data include:
 * 10) Bar graph - with mean of Australianness on the Y-axis and Gender on the X-axis (category axis)
 * 11) Error-bar graph - with Australianness as the (dependent) variable and Gender as the X-axis (category axis)
 * 12) To obtain the correlation and its significance, go to Analyze → Correlate → Bivariate - Add Gender and Australianness (order doesn't matter) - Paste and Run
 * 13) The result should be rpb = -.04, p = .62, N = 189. This indicates a very small, negative and non-significant linear relationship. Males (coded as 0) in the sample perceive themselves as very slightly more Australian than females (coded as 1), but this difference is not statistically significant; it could easily have occurred by chance.
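Step 4 of the previous list notes that the sign of rpb depends on the coding of the dichotomous variable. This is easier to see when rpb is written in terms of the two group means. The identity below is a standard algebraic rearrangement of Pearson's r for 0/1 coding; SPSS obtains the same value via the usual product-moment formula. Here M1 and M0 are the mean Australianness scores of the groups coded 1 and 0, p1 and p0 are the proportions of cases in each group, and s_n is the standard deviation of all scores computed with N (not N - 1) in the denominator. A negative rpb therefore means that the group coded 1 (here, females) has the lower mean.

```latex
r_{pb} = \frac{M_1 - M_0}{s_n}\,\sqrt{p_0\, p_1}
```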

* Gender and Australianness - Frequencies, scatterplot and point-biserial correlation.

FREQUENCIES VARIABLES=Gender Australianness /BARCHART FREQ /ORDER=ANALYSIS.

GRAPH /SCATTERPLOT(BIVAR)=Gender WITH Australianness /MISSING=LISTWISE.

CORRELATIONS /VARIABLES=Australianness Gender /PRINT=TWOTAIL NOSIG /MISSING=PAIRWISE.

See Extra exercises

Interval/ratio by Interval/ratio
What is the relationship between two interval/ratio variables?


 * Statistical tasks/techniques
 * 1) Scatterplot
 * 2) Scatterplot with markers to consider the relationship for sub-groups
 * 3) Product-moment correlation (Pearson's correlation) for analysing the linear relationship between two continuous (or near-continuous, e.g., interval data with more than ~5 categories) variables
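The product-moment correlation in step 3 and the shared variance (r²) used later in this exercise can be written out directly from their definitions. This is a minimal sketch; `pearson_r` and `shared_variance` are hypothetical names, and the data are made up.

```python
from math import sqrt


def pearson_r(x, y):
    """Sum of cross-products of deviations divided by the square root of the
    product of the two sums of squared deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, with at least 2 cases")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def shared_variance(r):
    """Coefficient of determination: proportion of variance shared by x and y."""
    return r ** 2
```

For instance, `shared_variance(0.12)` is about .014 (roughly 1%), and `shared_variance(0.50)` is .25. These match the percentages quoted in the Gender Role exercise.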

What is the relationship between Gender Role and Australianness?
Data file: qfsall_3.sav


 * 1) What is the relationship between Gender Role and Australianness? (interval by interval)
 * 2) Scatterplot for Australianness and Gender_role
 * 3) Graphs → Legacy Dialogs → Scatter/Dot → Simple scatter → Define. Add the predictor variable (Gender_role) to the x axis, and the dependent variable (Australianness) to the y axis and paste and run.
 * 4) Double-click the chart to enter the chart editor.
 * 5) Add a line of best fit ("Add Fit Line at Total").
 * 6) Obtain the product-moment correlation and its statistical significance - Analyse → Correlate → Bivariate - enter Australianness and Gender_role (order is arbitrary)
 * 7) r = .12, p = .10, N = 185
 * 8) This is larger than the rpb between Gender and Australianness, but is still very small (r² ≈ .01, i.e., about 1% shared variance) and non-significant.
 * 9) The sample may be heterogeneous (e.g., the relationship may differ for males and females). So, to further consider the role of Gender:
 * 10) Create a new scatterplot, adding Gender to the "Set Markers by" box. Paste and Run. The scatterplot will then use green for predominantly female points and blue for predominantly male points. Double-click the chart and add the overall line of best fit. Then add lines of best fit for the subgroups (the button next to "Add Fit Line at Total"). This should show that females who score high on femininity also score high on Australianness, and males who score high on masculinity also score high on Australianness. Note: There is a bug in SPSS v.23. To fit linear lines of best fit at the subgroup level: (a) add a total line of best fit, then undo it (CTRL-Z) so that it disappears, then (b) add fit lines for the subgroups.
 * 11) To test for significance of correlations separately for males and females, split the file by gender, then re-run the correlation command.
 * 12) Data → Split File → Compare groups, with Gender in the Groups Based on box. Then click OK, or Paste and Run.
 * 13) Then re-run the correlations, either from the existing syntax or via Analyze → Correlate → Bivariate (Australianness and Gender_role). The output will provide correlations for males and females separately. Afterwards, turn the split off (Data → Split File → Analyze all cases, do not create groups) before running further analyses.
 * 14) Interesting results! The correlation for males is moderately positive (r = .50; 25% shared variance, i.e., the coefficient of determination, r²), and the correlation for females is small and negative (r = -.23; 5% shared variance). Both correlations are statistically significant (p < .05), even though the overall correlation (.12) was non-significant. Therefore:
 * 15) gender role is related to Australianness, but in different directions for males and females (being more masculine for males is associated with higher Australianness, whereas being more feminine for females is associated with higher Australianness), and
 * 16) Australian men’s national identity is more strongly tied to their gender identity (25% shared variance) than Australian women’s is (5% shared variance).


* Gender role (femininity-masculinity) and Australianness - Scatterplots and product-moment correlations.

GRAPH /BAR(SIMPLE)=PCT BY Gender_role.

FREQUENCIES VARIABLES=Australianness Gender_role /BARCHART PERCENT /ORDER=ANALYSIS.

GRAPH /SCATTERPLOT(BIVAR)=Gender_role WITH Australianness /MISSING=LISTWISE.

CORRELATIONS /VARIABLES=Australianness Gender_role /PRINT=TWOTAIL NOSIG /MISSING=PAIRWISE.

GRAPH /SCATTERPLOT(BIVAR)=Gender_role WITH Australianness BY Gender /MISSING=LISTWISE.

* Correlations separately for each gender (Compare groups), then turn the split off.
SORT CASES BY Gender.
SPLIT FILE LAYERED BY Gender.

CORRELATIONS /VARIABLES=Australianness Gender_role /PRINT=TWOTAIL NOSIG /MISSING=PAIRWISE.

SPLIT FILE OFF.