Rodger's Method

Anova
Statistical analysis, especially in the sciences, social sciences and related disciplines, usually involves the examination of multiple sets of observations (say, J such sets). These are often random samples of observations, or random assignments of patients, people, pigeons, plants, field plots, etc., to 'treatment' conditions, the 'treatments' being drugs, optical illusions, learning conditions, fertilizers, time of planting, etc. The object of analyzing such data is to determine whether (and where) the sets differ from one another by more than 'random error' can explain away. Typically, each of the J sets is characterized by a single 'statistic'. That statistic is often the mean of the observations in the set, but sometimes it is another statistic, such as the proportion of successes in the set. Whether such sample means (mj) or proportions (pj) differ from one another by more than random error can explain, given the amount of variation among the observations within sets (or samples), is assessed by the procedure known as Analysis of Variance (Anova), by its various forms, or by analogous methods such as the analysis of contingency tables using the chi-square approximation to the multinomial distribution.

Although the ideas incorporated in Anova have a long history, a paper by Sir Ronald Aylmer Fisher (1918) set the modern development in motion. If the true population mean of set j is μj, then the hypothesis of interest seemed to be:

$$ H_0:\ \mu_1 = \mu_2 = \cdots = \mu_J \qquad \{1\} $$

This null hypothesis says that the sets (at least as characterized by their means) do not differ. It seemed natural to accept that hypothesis if the variation among the sample mj was no more than was reasonably likely in the light of the observed variation within samples.

Type 1 Error Rate
At this point, a controversial issue has reared its head. The Fisherian doctrine, as it developed over the next 40 years, refused to accept null hypotheses when they could not be rejected. After all, they might really be false, but by an amount so small that only huge samples would be large enough to reject them! Some fifteen years later, Jerzy Neyman and Egon Sharpe Pearson (1933) formalized the criteria for accepting and rejecting null hypotheses. An excellent discussion of some of these matters can be found at the mathnstats website (via the external links section below).

It had been suggested that if the probability of observing variation at least as large as that in the data was 0.05 or less (given that H0 is true), that might be grounds for suspecting that the null hypothesis at {1} was false. Neyman and Pearson formalized and significantly extended this thinking. The criterion for 'deciding' whether H0 is true should be set before the data are collected (say, at a probability α, which might be 0.05 or some similarly small value). If the observed probability of the data variation turns out to be α or less, then one rejects H0. That might, of course, be a mistake (called a type 1 error); when H0 is really true, the probability of such an error is α. 'Deciding' that H0 is true is not set in stone: such decisions can, for example, be revised in light of later evidence.
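The meaning of α can be checked by simulation. The following Python sketch (an illustration, not part of Rodger's method) repeatedly draws J = 4 samples of N = 11 observations from a single normal population, so that H0 at {1} is true, computes the usual one-way Anova variance ratio, and counts how often it reaches the critical value. It assumes F0.05;3,40 ≈ 2.839 from standard F tables; the function names, seed and number of replications are arbitrary, hypothetical choices.

```python
import random
import statistics

def anova_f(groups):
    """One-way Anova variance ratio Fm for J groups of equal size N."""
    J, N = len(groups), len(groups[0])
    means = [statistics.fmean(g) for g in groups]
    grand = statistics.fmean(means)
    # Pooled within-groups variance s^2, with nu2 = J(N-1) degrees of freedom
    s2 = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (J * (N - 1))
    return N * sum((m - grand) ** 2 for m in means) / ((J - 1) * s2)

def simulated_type1_rate(J=4, N=11, f_crit=2.839, reps=4000, seed=1):
    """Fraction of simulated experiments, with H0 true, in which Fm >= f_crit."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Every group is drawn from the same normal population, so H0 holds.
        groups = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(J)]
        if anova_f(groups) >= f_crit:
            rejections += 1
    return rejections / reps

rate = simulated_type1_rate()
print(f"estimated type 1 error rate: {rate:.3f}")  # should be near alpha = 0.05
```

With 4000 replications the estimate should typically fall within about 0.01 of α = 0.05; more replications narrow that band.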

Power
Neyman and Pearson also pointed out that there is another important side to this matter. H0 may indeed be false, so we should try to arrange the size of our statistical investigation (by choice of sample size N) to yield a decent probability (say, β) of detecting that falsity. (In this article β denotes that probability of detection, i.e. the power, rather than the type 2 error probability it denotes in many texts.) The procedure is to state how far from equal the μj might be (or, equivalently, how small a true difference must be for us to discount it at this stage of the investigation). In Anova, that involves calculating a noncentrality parameter and then using it in Fisher's Noncentral Variance Ratio Distribution. It is rather poetic that, although Fisher did not accept the Neyman-Pearson methodology, his (1928) distribution plays a crucial part in the method for setting the 'power' β.

Rodger's Approach
Although the classical H0 plays an important part in the theory of Anova, that part is more theoretical than practical. This is reflected in the fact that there are at least three equivalent formulae for the variance ratio (Fm) from Anova that can be used to decide whether to accept or reject H0. The first, {2}, is the familiar variance-ratio form; the second, {3}, shows that it is equivalent to evaluating whether a single (maximizing) null contrast should be accepted or rejected; and the third, {4}, shows that Fm is equivalent to the simultaneous evaluation of any (J-1) linearly independent contrasts. A constant sample size N per group is used throughout here to keep the formulae simple. Unequal Nj can easily be handled, though in real applications unequal Nj raise the risk of misleading results when the true variances (σj²) are unequal.

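$$ F_m = \frac{N \sum_j (m_j - m_{\cdot})^2}{\nu_1 s^2} \qquad \{2\} $$

$$ F_m = \frac{N \left( \sum_j c_{m_j} m_j \right)^2}{\nu_1 s^2 \sum_j c_{m_j}^2} \qquad \{3\} $$

$$ F_m = \frac{N\ {}_1v_H \left( {}_HC_J\ {}_JC^T_H \right)^{-1} {}_Hv^T_1}{\nu_1 s^2} \qquad \{4\} $$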

in which ν1 = J-1 is the numerator degrees of freedom for Fm, s² is the within-groups (error) mean square, with ν2 = J(N-1) degrees of freedom, and (mathematically) there can be no more than H = J-1 linearly independent contrasts across J means.

Any contrast across the μj takes the form:

$$ K_h = c_1\mu_1 + c_2\mu_2 + \cdots + c_J\mu_J = 0 \qquad \{5\} $$

in which the cj are not all zero and Σcj = 0. The contrast in {3} is the maximizing contrast (the one that yields the largest possible Fh), whose coefficients are:

$$ c_{m_j} = m_j - m_{\cdot} \qquad \{6\} $$

where m. is the average of the J values of mj. The matrix HCJ in {4} holds the contrast coefficients (the cj) of any H = J-1 linearly independent contrasts, one contrast per row; JCTH is its transpose; and the row vector 1vH holds the sample values of the H contrasts.
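The equivalence of {2}, {3} and {4} can be checked numerically. The sketch below uses Python's fractions module so that the three results agree exactly rather than up to rounding. Because Table 2 is not reproduced here, the four means are hypothetical values chosen to have the same deviations from m. (-4, -3, 2, 5) as the illustration later in this article, with N = 11 and s² = 72; Fm does not depend on the absolute level of the means. The contrast matrix is the decision set used in {22}, and the helper names (including the small exact linear solver) are illustrative only.

```python
from fractions import Fraction as Fr

def f_variance_ratio(means, N, s2):
    """Equation {2}: Fm = N * sum (m_j - m.)^2 / (nu1 * s^2)."""
    J = len(means)
    grand = sum(means) / J
    return N * sum((m - grand) ** 2 for m in means) / ((J - 1) * s2)

def f_contrast(c, means, N, s2, nu1):
    """Form of {3}: N * (sum c_j m_j)^2 / (nu1 * s^2 * sum c_j^2)."""
    value = sum(cj * mj for cj, mj in zip(c, means))
    return N * value ** 2 / (nu1 * s2 * sum(cj ** 2 for cj in c))

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A square and nonsingular)."""
    n = len(A)
    M = [[Fr(x) for x in row] + [Fr(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[-1] for row in M]

def f_contrast_set(C, means, N, s2):
    """Equation {4} for H = J-1 independent contrasts: N v (C C^T)^{-1} v^T / (nu1 s^2)."""
    v = [sum(c * m for c, m in zip(row, means)) for row in C]   # sample contrast values
    CCt = [[sum(a * b for a, b in zip(r1, r2)) for r2 in C] for r1 in C]
    w = solve(CCt, v)                                           # (C C^T)^{-1} v^T
    return N * sum(vi * wi for vi, wi in zip(v, w)) / ((len(means) - 1) * s2)

# Hypothetical means with the illustration's deviations from m. (-4, -3, 2, 5)
means = [Fr(m) for m in (46, 47, 52, 55)]
N, s2 = 11, Fr(72)
grand = sum(means) / len(means)
c_max = [m - grand for m in means]                  # maximizing coefficients, {6}
C = [[-1, 1, 0, 0], [0, 0, -1, 1], [-1, -1, 1, 1]]  # decision set used in {22}
print(f_variance_ratio(means, N, s2))          # 11/4
print(f_contrast(c_max, means, N, s2, nu1=3))  # 11/4
print(f_contrast_set(C, means, N, s2))         # 11/4
```

All three print 11/4, i.e. Fm = 2.75, matching {21} and {22}. The general solver is used for {4} so that non-orthogonal (but linearly independent) decision sets also work.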

Alternatives
When {5} is not true, then what is true is the 'alternative':

$$ K'_h = c_1\mu_1 + c_2\mu_2 + \cdots + c_J\mu_J = \delta_h = g_h \sigma \sqrt{\textstyle\sum_j c_j^2} \qquad \{7\} $$

in which δh is the linear noncentrality parameter for this hth contrast, expressed in terms of gh, which is a scale-free parameter. The measurement scale (such as inches, centimetres, pounds, kilogrammes) is absorbed by the true (but unknown) standard deviation (σ), and the scale of the contrast is absorbed by √(Σcj²); thus the value of g is exactly the same for the two (equivalent) contrasts:

$$ \frac{\mu_1 + \mu_2}{2} - \mu_3 = g \sigma \sqrt{1.5} \qquad \{8\} $$

$$ \mu_1 + \mu_2 - 2\mu_3 = g \sigma \sqrt{6} \qquad \{9\} $$

The Noncentral Variance Ratio Distribution uses a quadratic noncentrality parameter, such as:

$$ \Delta = \frac{N \delta^2}{\sigma^2 \sum_j c_j^2} = N g^2 \qquad \{10\} $$

which makes g an even more interesting quantity.
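A short numerical check of the scale-free property of g may help. The sketch below computes g from {7} for the contrasts {8} and {9}, and again after converting every quantity from inches to centimetres; the population means and σ are hypothetical values chosen only for illustration, as is the helper name.

```python
import math

def g_value(c, mu, sigma):
    """Scale-free g from {7}: g = (sum c_j mu_j) / (sigma * sqrt(sum c_j^2))."""
    delta = sum(cj * mj for cj, mj in zip(c, mu))
    return delta / (sigma * math.sqrt(sum(cj * cj for cj in c)))

mu = [10.0, 12.0, 14.5]  # hypothetical population means (inches)
sigma = 2.0              # hypothetical population standard deviation (inches)

g_half = g_value([0.5, 0.5, -1.0], mu, sigma)   # contrast {8}
g_full = g_value([1.0, 1.0, -2.0], mu, sigma)   # contrast {9}
# Re-expressing everything in centimetres leaves g unchanged
g_cm = g_value([0.5, 0.5, -1.0], [m * 2.54 for m in mu], sigma * 2.54)
print(g_half, g_full, g_cm)   # all three agree (about -1.43)

N = 11
delta = N * g_half ** 2       # quadratic noncentrality Delta = N g^2, from {10}
print(delta)
```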

The overall noncentrality parameter (Δm) in the Noncentral Variance Ratio Distribution can be written in at least three ways - analogous to Fm at {2} through {4} - as:

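$$ \Delta_m = \frac{N \sum_j (\mu_j - \mu_{\cdot})^2}{\sigma^2} \qquad \{11\} $$

$$ \Delta_m = \frac{N \left( \sum_j c_{\mu_j} \mu_j \right)^2}{\sigma^2 \sum_j c_{\mu_j}^2} \qquad \{12\} $$

$$ \Delta_m = \frac{N\ {}_1\delta_H \left( {}_HC_J\ {}_JC^T_H \right)^{-1} {}_H\delta^T_1}{\sigma^2} \qquad \{13\} $$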

Note that there is no division by ν1 in any of these formulae. The cμj in {12} are the purely theoretical maximizing coefficients cμj = μj - μ., and the row vector 1δH holds the linear noncentrality parameters δh of the H contrasts. Finally, if we use the form of {3} to compute Fh for each of H = J-1 mutually orthogonal contrasts (with that contrast's coefficients cj in place of the cmj), then:

$$ F_m = \sum_h F_h \qquad \{14\} $$

Because each rejected null contrast must have Fh ≥ Fcrit, and the Fh of mutually orthogonal contrasts sum to Fm, it follows that if we use a critical value Fcrit to decide whether to accept or reject null contrasts, the maximum number of mutually orthogonal null contrasts we could reject in a research study by that criterion is:

$$ r = \left[ F_m / F_{\text{crit}} \right] \le \nu_1 \qquad \{15\} $$

in which [ ] denotes truncation to an integer (any fractional part is dropped), and r is capped at ν1, the maximum number of linearly independent contrasts that is mathematically permissible.
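Equations {14} and {15} can be sketched together. The code below computes Fh (in the form of {3}) for the three mutually orthogonal contrasts of the illustration's decision set, confirms that they sum to Fm, and applies {15} with Rodger's F[0.05];3,40 = 1.974 and, for comparison, the traditional F0.05;3,40 ≈ 2.839 from standard F tables. The means are hypothetical values with the illustration's deviations (-4, -3, 2, 5); the function names are illustrative.

```python
import math
from fractions import Fraction as Fr

def contrast_f(c, means, N, s2, nu1):
    """F_h for one contrast, in the form of {3} (note the division by nu1)."""
    value = sum(cj * mj for cj, mj in zip(c, means))
    return N * value ** 2 / (nu1 * s2 * sum(cj ** 2 for cj in c))

def max_rejections(Fm, f_crit, nu1):
    """Equation {15}: r = [Fm / Fcrit], truncated to an integer and capped at nu1."""
    return min(math.floor(Fm / f_crit), nu1)

means = [Fr(m) for m in (46, 47, 52, 55)]  # hypothetical; deviations -4, -3, 2, 5
N, s2, nu1 = 11, Fr(72), 3
C = [[-1, 1, 0, 0], [0, 0, -1, 1], [-1, -1, 1, 1]]  # mutually orthogonal contrasts
F_h = [contrast_f(c, means, N, s2, nu1) for c in C]
Fm = sum(F_h)
print([round(float(f), 4) for f in F_h])       # per-contrast F values
print(Fm)                                      # 11/4: the same Fm as {2}, as {14} states
print(max_rejections(float(Fm), 1.974, nu1))   # 1 with Rodger's F[0.05];3,40
print(max_rejections(float(Fm), 2.839, nu1))   # 0 with the traditional F0.05;3,40
```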

Decision-based Rejection Rates
Usually the researcher is mainly interested in which μj differ from which, and H0 at {1} is of secondary interest at most. So, when evaluating contrasts post hoc, the researcher tries various contrasts in equation {3}, rejecting the null when Fh is large and accepting it otherwise. To remain logically consistent (a contradiction nullifies the whole operation), the researcher ends up with decisions for H = J-1 linearly independent contrasts (for simplicity, preferably J-1 mutually orthogonal contrasts), assigning each rejected null the planned value of g, subject to later modification to fit the data better.

But the big question is: against what criterion should Fh be compared? If the traditional Fα;ν1,ν2 is used, either the probability of detecting false null contrasts falls steadily as J increases, or N must rise steadily as J increases. Rodger (1967) argued that it is not the probability (α) of rejecting H0 in error that should be controlled, but rather the average rate of rejecting true null contrasts; i.e., we should control the expected rate (Eα) of true null contrast rejection. Likewise, it is the average rate of rejecting null contrasts when they are false that should be controlled, not the probability (power β) of rejecting H0 at {1} when it is false. That is to say, we should control the average or expected rate (Eβ).

Tables of F[Eα];ν1,ν2 and Δ[Eβ];ν1,ν2
To implement the above decision-based error procedure, Rodger (1975a) published tables of F[0.05];ν1,ν2 and F[0.01];ν1,ν2. He also published tables (Rodger, 1975b) of Δ[Eβ];ν1,ν2 for Eα = 0.05 and for Eα = 0.01, covering Eβ = 0.50, 0.70, 0.80, 0.90, 0.95 and 0.99. As an example of what the expectations (or averages) Eα and Eβ represent, consider an investigation with J = 4 samples, each with N = 11 observations. That makes the Fm degrees of freedom ν1 = J-1 = 3 and ν2 = J(N-1) = 4×10 = 40. Rodger's (1975a) table reports F[0.05];3,40 = 1.974, and his (1975b) table gives Δ[0.95];3,40 = 9.246. That is not a Δm parameter; it is a Δ per contrast, hence Δm = ν1×Δ[Eβ];ν1,ν2 = 3×9.246 = 27.738. In the analysis of the data in our illustrative experiment we will reject (see {15} above):

$$ r = \left[ F_m / F_{[0.05];\nu_1,\nu_2} \right] = \left[ F_m / 1.974 \right] \le 3 \qquad \{16\} $$

null contrasts. We can integrate the Central Variance Ratio Distribution to find the probabilities πr of r = 0, 1, 2 or 3 when all null contrasts are true, and we can integrate the Noncentral Variance Ratio Distribution (with Δm = ν1×Δ[0.95];ν1,ν2 = 3×9.246 = 27.738, when Eα = 0.05) to find the probabilities π'r of r = 0, 1, 2 or 3 when there are ν1 = 3 mutually orthogonal contrasts across the μj, each of which has a Δ of 9.246. The procedure is to find the areas under the distribution from F = 0 to F = 1.974 (for π0 and π'0), from F = 1.974 to F = 2×1.974 = 3.948 (for π1 and π'1), from F = 3.948 to F = 3×1.974 = 5.922 (for π2 and π'2) and, finally, from F = 5.922 to F = ∞ (for π3 and π'3). The results are given in Table 1 below.

Table 1: Probabilities πr for Δm=0 and π'r for Δm = 27.738

Each πr and π'r is multiplied by r, the number of null rejections made when that outcome occurs, to find the expectation of r; dividing that expectation by ν1 turns it into a rate. The formulae are:

$$ E_\alpha = \frac{\sum_r r\,\pi_r}{\nu_1}; \qquad E_\beta = \frac{\sum_r r\,\pi'_r}{\nu_1} \qquad \{17\} $$

and those are reported at the bottom of Table 1. When all possible null contrasts are true, the expected (i.e., average) proportion of ν1 nulls rejected by the procedure will be exactly Eα = 0.05. When there are ν1 mutually orthogonal nulls that are false, each with Δh = 9.246, then the expected (i.e., average) proportion of ν1 nulls rejected by the procedure will be exactly Eβ = 0.95.
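The integration described above can be sketched in plain Python. The code below is an illustration under stated assumptions, not Rodger's own computation: it evaluates the (noncentral) variance-ratio CDF as the standard Poisson mixture of regularized incomplete beta functions, with the incomplete beta computed by the usual continued-fraction (modified Lentz) method, and assumes the noncentrality parameter is Δm as defined in {11}. It then forms πr and π'r from the interval areas and applies {17}. With F[0.05];3,40 = 1.974 and Δm = 27.738 from Rodger's tables, it should give values close to Eα = 0.05 and Eβ = 0.95. All function names are hypothetical.

```python
import math

def _betacf(a, b, x, max_iter=500, eps=1e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor applied was for the odd step
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_bt) * _betacf(a, b, x) / a
    return 1.0 - math.exp(log_bt) * _betacf(b, a, 1.0 - x) / b

def f_cdf(f, d1, d2, nc=0.0, terms=200):
    """CDF of the variance ratio; for nc > 0, a Poisson(nc/2) mixture of incomplete betas."""
    if f <= 0.0:
        return 0.0
    x = d1 * f / (d1 * f + d2)
    if nc == 0.0:
        return reg_inc_beta(d1 / 2.0, d2 / 2.0, x)
    half = nc / 2.0
    total = 0.0
    for k in range(terms):
        weight = math.exp(-half + k * math.log(half) - math.lgamma(k + 1))
        total += weight * reg_inc_beta(d1 / 2.0 + k, d2 / 2.0, x)
    return total

def expected_rate(f_crit, nu1, nu2, delta_m=0.0):
    """Equation {17}: sum_r r * pi_r / nu1, with r = [F / f_crit] capped at nu1."""
    cdf = [f_cdf(k * f_crit, nu1, nu2, delta_m) for k in range(1, nu1 + 1)]
    probs = [cdf[0]] + [cdf[k] - cdf[k - 1] for k in range(1, nu1)] + [1.0 - cdf[-1]]
    return sum(r * p for r, p in enumerate(probs)) / nu1

e_alpha = expected_rate(1.974, 3, 40)             # all null contrasts true
e_beta = expected_rate(1.974, 3, 40, 3 * 9.246)   # Delta_m = 27.738
print(f"E_alpha = {e_alpha:.3f}, E_beta = {e_beta:.3f}")
```

Because Rodger's table entries are rounded to three decimals, the computed rates may differ from 0.05 and 0.95 in the third decimal place.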

An Illustration
Suppose we intend to use J = 4 samples, and we would like to detect null contrasts that are false by g² = 0.81 (g = ±0.9) or more at a rate of Eβ = 0.95. We do not know ν2 yet, but Rodger's (1975b) table shows Δ[0.95];3,∞ = 8.370; so, since Δ = Ng² by {10}, we should use a sample size of:

$$ N \ge \Delta_{[E_\beta];\nu_1,\infty} / g^2 = 8.370 / 0.81 = 10.33 \qquad \{18\} $$

If we use N = 11, that will make ν2 = J(N-1) = 4×10 = 40 and Δ[0.95];3,40 = 9.246. Our Δ = Ng² = 11×0.81 = 8.91 is a little less than 9.246, but we will continue with N = 11, knowing that Eβ is a little less than 0.95 (for the curious, the exact Eβ = 0.942).
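The sample-size step in {18} is a one-line calculation. The sketch below (the helper name is hypothetical) rounds Δ[Eβ];ν1,∞/g² up to the next whole number and reports the Δ actually achieved; it does not look up the refined table value for the resulting ν2, which, as described above, still has to be checked against Rodger's tables by hand.

```python
import math

def sample_size(delta_table, g):
    """Smallest per-group N with N * g^2 >= Delta, following {10} and {18}."""
    return math.ceil(delta_table / g ** 2)

N = sample_size(8.370, 0.9)   # Delta_[0.95];3,inf from Rodger's (1975b) table
print(N)                      # 11
print(N * 0.9 ** 2)           # achieved Delta, about 8.91 (below Delta_[0.95];3,40 = 9.246)
```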

Suppose now that the sample data turn out to be those in Table 2:

Table 2: Illustration Data for J = 4, N = 11, s² = 72

These data yield the Anova source table shown in Table 3.

Equation {15} makes:

$$ r = \left[ F_m / F_{[E_\alpha];\nu_1,\nu_2} \right] = \left[ 2.75 / 1.974 \right] = \left[ 1.39 \right] = 1 \qquad \{19\} $$

We may therefore reject one (out of ν1 = 3) null contrast. Note that the traditional criterion F0.05;3,40 = 2.839 (which is used by Scheffé's procedure) would find nothing 'significant' in these data; and, as J increases, the discovery rate using F[Eα];ν1,ν2 grows ever better than that using Fα;ν1,ν2. (Similarly, the post hoc procedures of Tukey and Newman-Keuls, which use studentized range values, are also unable to declare any differences in these illustration data 'significant'.)

For contrasts of different forms, that is, with different values of Σcj², the size of the sample contrast |Σcjmj| needed to reject the null follows from setting Fh in the form of {3} equal to F[Eα];ν1,ν2 and solving:

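$$ \text{Critical} = \sqrt{\nu_1 \times F_{[E_\alpha];\nu_1,\nu_2} \times s^2 \textstyle\sum_j c_j^2 / N} \qquad \{20\} $$

$$ = \sqrt{3 \times 1.974 \times 72 \textstyle\sum_j c_j^2 / 11} = \sqrt{38.762 \textstyle\sum_j c_j^2} $$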

Three examples are shown in Table 4 below.

Table 4: Critical Contrast Values for Null Rejection
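Since the body of Table 4 is not reproduced here, the following sketch shows how {20} can be applied. It computes the critical value for each contrast in the decision set used in {22} and compares it with that contrast's sample value (1, 3 and 14, the entries of the vector 1v3 in {22}). The dictionary labels and the helper name are illustrative only.

```python
import math

def critical_contrast(c, nu1, f_crit, s2, N):
    """Equation {20}: smallest |sum c_j m_j| needed to reject the null contrast."""
    return math.sqrt(nu1 * f_crit * s2 * sum(cj * cj for cj in c) / N)

# Decision set and sample contrast values (1, 3, 14) taken from {22}
contrasts = {
    "-m1 + m2": ([-1, 1, 0, 0], 1),
    "-m3 + m4": ([0, 0, -1, 1], 3),
    "-m1 - m2 + m3 + m4": ([-1, -1, 1, 1], 14),
}
decisions = {}
for name, (c, observed) in contrasts.items():
    crit = critical_contrast(c, nu1=3, f_crit=1.974, s2=72, N=11)
    decisions[name] = "reject" if abs(observed) >= crit else "accept"
    print(f"{name:>20}: critical {crit:6.3f}, observed {observed:>3} -> {decisions[name]}")
```

Only the third contrast (critical value about 12.45 for Σcj² = 4) is rejected, which agrees with r = 1 in {19}.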

If it makes reasonable scientific sense (in terms of the subject matter studied), the data suggest that μ1 = μ2 < μ3 = μ4; three orthogonal contrasts expressing that conclusion are those in Table 5.

For the curious, the maximizing contrast has coefficients, computed from the means in Table 2, of cmj = mj - m. = -4, -3, 2, 5, so formula {3} gives:

$$ F_m = \frac{N \left( \sum_j c_{m_j} m_j \right)^2}{\nu_1 s^2 \sum_j c_{m_j}^2} = \frac{11 (54)^2}{3 \times 72 \times 54} = \frac{32076}{11664} = 2.75 \qquad \{21\} $$

and, using our decision set in Table 5, formula {4} gives:

$$
\begin{aligned}
F_m &= N\ {}_1v_H \left( {}_HC_J\ {}_JC^T_H \right)^{-1} {}_Hv^T_1 / (\nu_1 s^2) \qquad \{22\} \\
&= N\ {}_1v_3 \left( {}_3C_4\ {}_4C^T_3 \right)^{-1} {}_3v^T_1 / (3 \times 72) \\
&= 11\ {}_1v_3 \left( \begin{bmatrix} -1 & 1 & 0 & 0 \\ 0 & 0 & -1 & 1 \\ -1 & -1 & 1 & 1 \end{bmatrix} \begin{bmatrix} -1 & 0 & -1 \\ 1 & 0 & -1 \\ 0 & -1 & 1 \\ 0 & 1 & 1 \end{bmatrix} \right)^{-1} {}_3v^T_1 / 216 \\
&= 11\ {}_1v_3 \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 4 \end{bmatrix}^{-1} {}_3v^T_1 / 216 \\
&= 11 \begin{bmatrix} 1 & 3 & 14 \end{bmatrix} \begin{bmatrix} 1/2 & 0 & 0 \\ 0 & 1/2 & 0 \\ 0 & 0 & 1/4 \end{bmatrix} \begin{bmatrix} 1 \\ 3 \\ 14 \end{bmatrix} / 216 \\
&= 11 \begin{bmatrix} 1/2 & 3/2 & 14/4 \end{bmatrix} \begin{bmatrix} 1 \\ 3 \\ 14 \end{bmatrix} / 216 \\
&= 11 \times 54 / 216 = 2.75
\end{aligned}
$$

This shows how simple it is to invert the product HCJ JCTH when the H = 3 contrasts are mutually orthogonal: the product is a diagonal matrix. Non-orthogonal but linearly independent contrasts, on the other hand, not only yield a product matrix that is harder to invert; they also complicate interpretation, since differentially weighted modifications of the δh may be desirable.

Decision Implications
It is obviously true that the following three statements contradict one another:

$$ X:\ \mu_6 - \mu_5 = 0; \qquad Y:\ \mu_7 - \mu_6 = 0; \qquad Z:\ \mu_7 - \mu_5 > 0 \qquad \{23\} $$

The old adage is that two things (e.g., μ5 and μ7) that are equal to the same thing (μ6) are equal to one another; so the three statements cannot possibly all be true - no matter what statistical tests on their sample estimates say! Nevertheless, this type of contradiction (explicitly or implicitly) occurs often enough in reports of statistical analysis to be quite an embarrassment! The three comparisons in {23} are not linearly independent of one another because, algebraically:

$$ Z = X + Y \qquad \{24\} $$
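Whether a set of comparisons is linearly dependent, as X, Y and Z are, can be checked mechanically: the rank of their coefficient matrix is smaller than the number of comparisons. The sketch below computes that rank by exact Gaussian elimination; it is an illustrative check, not the procedure used by any particular program, and the helper name is hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of coefficient rows) by exact Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(M[0]) if M else 0):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][col] != 0:
                factor = M[i][col] / M[r][col]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# Coefficients over (mu5, mu6, mu7) for the comparisons in {23}
X = [-1, 1, 0]   # mu6 - mu5
Y = [0, -1, 1]   # mu7 - mu6
Z = [-1, 0, 1]   # mu7 - mu5, which equals X + Y as in {24}
print(rank([X, Y, Z]))   # 2: the three comparisons are not linearly independent
print(rank([X, Y]))      # 2: X and Y on their own are
```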

Rodger's method precludes drawing logically contradictory conclusions from any set of data by requiring that each statistical decision be linearly independent of every other one. But there is also a very important, positive result to be extracted from the implications of the decisions. In algebraic terms, a set of H decisions for contrasts (such as those in Table 5) can be expressed in matrix form as:

$$ {}_1\mu_J\ {}_JC^T_H = {}_1\delta_H \qquad \{25\} $$

If we could only invert JCTH (i.e., divide it away somehow), we could find out what our H decisions say about all the μj: quite an achievement in terms of stating what the investigator believes the investigation has shown, free of the 'noise' of sampling error, and a valuable guide to future research on the subject matter. But JCTH is not even a square matrix, so its regular inverse does not exist! Happily, there exist 'generalized inverses' that can be used on some occasions of this sort, and this 'implication need' is just such an occasion. The result is Rodger's 'implication equation':

$$ {}_1\mu_J = {}_1\delta_H \left( {}_HC_J\ {}_JC^T_H \right)^{-1} {}_HC_J \qquad \{26\} $$

From the decision set in Table 5:

$$
\begin{aligned}
{}_1\mu_4 &= {}_1\delta_3 \left( {}_3C_4\ {}_4C^T_3 \right)^{-1} {}_3C_4 \qquad \{27\} \\
&= {}_1\delta_3 \begin{bmatrix} 1/2 & 0 & 0 \\ 0 & 1/2 & 0 \\ 0 & 0 & 1/4 \end{bmatrix} \begin{bmatrix} -1 & 1 & 0 & 0 \\ 0 & 0 & -1 & 1 \\ -1 & -1 & 1 & 1 \end{bmatrix} \\
&= \begin{bmatrix} 0 & 0 & 1.8\sigma \end{bmatrix} \begin{bmatrix} -1/2 & 1/2 & 0 & 0 \\ 0 & 0 & -1/2 & 1/2 \\ -1/4 & -1/4 & 1/4 & 1/4 \end{bmatrix} \\
&= \begin{bmatrix} -.45\sigma & -.45\sigma & .45\sigma & .45\sigma \end{bmatrix}
\end{aligned}
$$

These are the values of μj - μ. (not of the μj themselves) implied by the decisions in Table 5. When more nulls are rejected, this procedure reveals a pattern of implied means that would not otherwise be clearly seen.

The results in {27} yield an overall, quadratic noncentrality parameter:

$$ \Delta_\mu = \frac{N \sum_j (\mu_j - \mu_{\cdot})^2}{\sigma^2} = \frac{11 \times 0.81\sigma^2}{\sigma^2} = 8.91 \qquad \{28\} $$

This equals Ng² (with g = 0.9) because only one null contrast was rejected, and it was given δ3 = 0.9σ√4 = 1.8σ.
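The implication equation {26} and the noncentrality {28} can be reproduced with exact arithmetic. The sketch below expresses δh in units of σ and relies on the contrasts being mutually orthogonal, so that (HCJ JCTH)⁻¹ is just a diagonal of 1/Σcj² values; a non-orthogonal decision set would need a full matrix inverse instead. Variable names are illustrative.

```python
from fractions import Fraction as Fr

C = [[-1, 1, 0, 0], [0, 0, -1, 1], [-1, -1, 1, 1]]  # decision set in {27}
delta = [Fr(0), Fr(0), Fr(9, 5)]   # delta_h in units of sigma; only the third null rejected (1.8 sigma)
N = 11

# The diagonal shortcut below is valid only because the contrasts are mutually orthogonal.
assert all(sum(a * b for a, b in zip(C[i], C[k])) == 0
           for i in range(len(C)) for k in range(len(C)) if i != k)

# For orthogonal contrasts C C^T is diagonal, so its inverse divides by sum c_j^2.
weights = [d / sum(c * c for c in row) for d, row in zip(delta, C)]  # delta (C C^T)^{-1}
implied = [sum(w * row[j] for w, row in zip(weights, C)) for j in range(len(C[0]))]
print([float(x) for x in implied])   # [-0.45, -0.45, 0.45, 0.45], in units of sigma

delta_mu = N * sum(x ** 2 for x in implied)   # equation {28}
print(delta_mu)                               # 891/100, i.e. 8.91
```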

There are ways of modifying the gh so that they reflect the sample observations more closely, and the SPS computer program (discussed below) applies two of these procedures by default. SPS also automatically provides two separate statistics that assess the fit of the implied μj - μ. values. One is the correlation coefficient between these implied true population means and the sample means; the other is the F fit residual, i.e., the part of the omnibus Fm value that is not accounted for by the (up to r) rejected null contrasts among the J-1 statistical decisions. A high degree of fit between the sample and implied means is a necessary condition for concluding that a particular set of J-1 decisions is the scientifically optimal one, but it can never be a sufficient condition. When J > 2, the overall Anova between-groups variance can in principle be partitioned into independent components in infinitely many ways: literally infinitely many sets of J-1 mutually orthogonal contrasts, differing from one another by infinitesimal amounts, could be constructed. The statistical fit between the sample means and the implied true population means entailed by a specific set of J-1 decisions does matter, but the ultimate concern must be the scientific sense that those J-1 decisions make.
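For the illustration, the two fit statistics might be computed as in the sketch below. This is an assumption-laden reading, not SPS's documented algorithm: it takes the correlation to be an ordinary Pearson r between the implied and sample deviations, and the F fit residual to be Fm minus the sum of Fh over the rejected contrasts (using {14}). Because SPS may also adjust the gh before reporting, its figures could differ.

```python
import math

def pearson_r(x, y):
    """Ordinary Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

sample_dev = [-4, -3, 2, 5]            # m_j - m. in the illustration
implied = [-0.45, -0.45, 0.45, 0.45]   # implied mu_j - mu. from {27}, units of sigma

N, s2, nu1 = 11, 72, 3
C = [[-1, 1, 0, 0], [0, 0, -1, 1], [-1, -1, 1, 1]]
rejected = [False, False, True]        # only the third null contrast was rejected

# F_h for each decision, in the form of {3}; these sum to Fm as in {14}
F_h = [N * sum(c * m for c, m in zip(row, sample_dev)) ** 2
       / (nu1 * s2 * sum(c * c for c in row)) for row in C]
Fm = sum(F_h)
fit_residual = Fm - sum(f for f, rej in zip(F_h, rejected) if rej)

print(f"r(implied, sample) = {pearson_r(implied, sample_dev):.4f}")
print(f"F fit residual = {fit_residual:.4f}")
```

Under these assumptions the correlation is about 0.95 and the residual is about 0.25 of the total Fm = 2.75.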

Further Applications
Of course, the methodology shown here can be applied to other forms of Anova, e.g., to Randomized Blocks data, but Rodger (1974) has argued that data collected for a Factorial Design analysis (e.g., an I×J×K design) are better analyzed by his method as a one-way Anova of L = I×J×K samples. Unlike the traditional Fα;L-1,ν2, the criterion F[Eα];L-1,ν2 does not lose effect detectability as L increases, so 'common sense' interactions (simple contrasts across the mijk) have a reasonable chance of being detected. The 2013 article by Rodger and Roberts shows clearly how effect detectability by the traditional standard (Fα;L-1,ν2) becomes poorer and poorer as L increases, which does not happen if Rodger's F[Eα];L-1,ν2 is used.

Rodger (1969) showed that his method can be applied to evaluating contrasts, or linear hypotheses, across frequencies in 2×J contingency tables, and that it could be extended to correlated frequencies (both are options in the SPS computer program). The analysis of ranked data (correlated or otherwise) is a further non-parametric option available in SPS.

On the multivariate front, SPS offers analysis of the one-sample version of Hotelling's (1931) T². Tables for Hotelling's (1951) Generalized T² have been computed (by R.S. Rodger) for Multivariate Analysis of Variance, but these have not yet been published.

Finally, it is possible to set alternatives to null contrasts in all-numeric terms (no unknown σ required - see {7} above) by doing two-stage sampling. Rodger (1978) has published the relevant tables of the noncentrality parameter D[Eβ];ν1,ν2.

Using Rodger's Method with the SPS Computer Program
The Simple, Powerful Statistics (SPS) computer program mentioned in the two previous subsections is a free, Windows-based one which implements a comprehensive set of the important features of Rodger's method. As already noted, SPS makes it relatively easy to use Rodger's method with either independent or correlated means, proportions, or ranks; and analyzing two-stage sampling data can be done almost as easily as doing that with the usual single stage of sampling. SPS can be downloaded from the Simple, Powerful Statistics website (see the external links section immediately below the references). An article about both the computer program and Rodger's method was published in the Journal of Methods and Measurement in the Social Sciences, and that can be downloaded by clicking the link contained in reference #13.

The Bottom Line on Rodger's Method
As demonstrated in the 'An Illustration' section above, using the traditional Fα;ν1,ν2 criterion produces an inevitable loss of statistical power as ν1 (the numerator degrees of freedom) increases. In direct contrast, “Rodger’s approach ensures that statistical power does not decline (and even increases) with increasing numerator degrees of freedom” (Delamater, Campese, & Westbrook, 2009; p.228). As another set of researchers put it: “We chose Rodger’s method because it is the most powerful post hoc method available for detecting true differences among groups. This was an especially important consideration in the present experiments in which interesting conclusions could rest on null results” (Williams, Frame, & LoLordo, 1992, p.43). The most definitive evidence for the statistical power advantage that Rodger's method possesses (as compared with eight other multiple comparison procedures) is provided in the 2013 article by Rodger and Roberts (downloadable with the link in reference #8).

A corollary that necessarily follows, if the proposition in the previous paragraph is true, is that Rodger's method has more power than all other post hoc procedures to detect every conceivable sort of interaction effect. Importantly, though, Rodger's method also permits one to ignore factorially defined interactions entirely (these are widely acknowledged to be difficult to interpret) and encourages focusing instead on "the interactions defined by common sense" including "simple cross-cell contrasts such as μ11 - μ22 which are easy to interpret ... though they are not the interactions defined in the factorial model" (Rodger, 1974, p.195).

An absolutely unlimited amount of post hoc data snooping is permitted by Rodger's method, and this is accompanied by a guarantee that the long-run expectation of type 1 errors can never exceed Eα (i.e., .05 or .01). This statement is not in need of any testimonial support or empirical verification (nor is it susceptible to disconfirmation), because it is essentially a logical tautology. Both the increased power of Rodger's method and the impossibility of type 1 error rate inflation are obtained by using a decision-based (i.e., per contrast) error rate, analogous to the rate used in planned t-tests. As noted at the end of the 'Decision Implications' section, whenever J > 2, infinitely many orthogonal contrasts can be constructed and combined into infinitely many mutually orthogonal sets of J-1 statistical decisions. Rodger's method ensures that none of these sets can ever contain more than r rejected null contrasts (see {15} above). Consequently, the rejected nulls included in whichever set you decide to adopt and interpret will (necessarily, by virtue of the way Rodger's method was conceived and built) have an expected type 1 error rate of either: 1) Eα, if r rejected nulls were included in that set, or 2) less than Eα, if the decision set you decided upon contains fewer than r null contrast rejections. Rodger's method does its job, which in this context is not to prevent statistical errors from occurring but to control their rate of occurrence.

A unique feature of Rodger's method is its specification of the 'implied means' (or implied proportions or implied mean ranks) that are logically implied, and mathematically entailed, by the J-1 (number of means minus one) statistical decisions that the user of his method will make. These implied true population means constitute a very precise statement about the outcome of one's research, and assist other researchers in determining the size of effect that their related research ought to seek.

The single best source for finding out more about Rodger's method is his 1974 article (reference #7).