help cochranq
-------------------------------------------------------------------------------------------------------------------------


Title

    cochranq -- Cochran's Q test for proportion difference in blocked binary data


Syntax

        cochranq scorevar blockvar groupvar [if] [in] [fweight] [, ma(method) es(method) noqtest nolabel wrap list
                level(#) copyleft]


    cochranq options              Description
    -------------------------------------------------------------------------------------------------------------------
    Main
      ma(method)                  method used to adjust for multiple comparisons
      es(method)                  method used to calculate effect size
      noqtest                     suppress Cochran's Q test output
      nolabel                     display groupvar values, rather than groupvar value labels
      wrap                        do not break wide tables
      list                        include results of pairwise tests in a list format
      level(#)                    set confidence level; default is level(95)
      copyleft                    display the GPL license for cochranq
    -------------------------------------------------------------------------------------------------------------------
    fweights are allowed; see weight.

Description

    Cochran's omnibus Q test is analogous to a repeated measures ANOVA for binary outcomes, and cochranq reports the
    results of Cochran's Q test (Cochran, 1950) for proportion difference among b blocked binary outcomes across k groups.

    The null hypothesis is that there is no difference in proportion of success (i.e., outcome = 1) between the k
    groups; Cochran's Q can be considered a generalization of McNemar's test to an arbitrary number of groups.  In the
    syntax diagram above, scorevar refers to the variable recording the binary outcome, blockvar refers to the variable
    denoting the units being observed (e.g., test subjects), and groupvar refers to the different treatments,
    exposures, tasks, etc.  cochranq also calculates the non-asymptotic p-value for the Q statistic, which generally
    provides greater statistical power (Mielke and Berry, 1995).  When fweights are used, each weight gives the number
    of times a particular pattern of successes and failures across the groups was observed (e.g., see the structure of
    the diphtheria.dta data set and the example command below).

    The non-asymptotic test approximates the exact permutation distribution of the standardized statistic Z using a
    variation on the Pearson Type III distribution; the PDF of this distribution is numerically integrated from
    -2/gamma to Z in 1,000 steps in order to calculate the p-value.  Mielke and Berry (1995) write that "more
    information is available to the nonasymptotic approach.
    Consequently, when the effective n is small, one cannot expect a result based on an infinite n to be appropriate.
    Because the Pearson type III distribution encompasses the chi-squared distribution as a special case, the
    nonasymptotic approach completely replaces the asymptotic approach."

    cochranq presents a table of all m = k(k-1)/2 post hoc pairwise tests, each applying Cochran's Q to the two groups
    in the pair (for the asymptotic p-values this is equivalent to McNemar's test without continuity correction).
    Multiple comparisons adjustments for the post hoc tests may be requested using ma(), and p-values (adjusted or
    unadjusted) for both asymptotic (top) and non-asymptotic (bottom) distributions are presented (the p-values for
    the non-asymptotic tests are indicated with the label na).  See Remarks for consideration of situations where two
    or more pairwise comparisons have the same test statistic.  When no discordant pairs are present in a post hoc
    test, missing test statistics and p-values are reported.

Options

    ma(method) specifies the method of adjustment used for multiple comparisons in post hoc pairwise tests, and must
        take one of the following values: none, bonferroni, sidak, holm, hs, hochberg, bh, or by.  none is the default
        method assumed if the ma option is omitted.  These methods perform as follows:

        none specifies no multiple comparisons adjustments be made.

        bonferroni specifies the family-wise error rate (FWER) "Bonferroni adjustment", calculated by multiplying the
        p-values for each post hoc test by m (the total number of post hoc tests), as per Dunn (1961).  cochranq will
        report a maximum Bonferroni-adjusted p-value of 1.  Those comparisons rejected with this method at the alpha
        level specified by level() are underlined in the output table, and starred in the list using the list option.

        sidak specifies the "Sidák adjustment", in which the FWER is controlled by replacing the p-value of each post hoc
        test with 1 - (1 - p)^m, as per Sidák (1967).  cochranq will report a maximum Sidák-adjusted p-value of 1.  Those
        comparisons rejected with this method at the alpha level specified by level() are underlined in the output table,
        and starred in the list using the list option.

        holm specifies the "Holm adjustment" where the FWER is controlled by sequentially adjusting the p-values of each
        post hoc test, ordered from smallest to largest, with p(m+1-i), where i is the ordered position, as per Holm
        (1979).  cochranq reports a maximum Holm-adjusted p-value of 1.  In sequential tests the decision to reject or
        not reject the null hypothesis depends both on the p-values and their ordering, so those comparisons rejected
        with this method at the alpha level specified by level() are underlined in the output.

        hs specifies the "Holm-Sidák adjustment" where the FWER is controlled by sequentially adjusting the p-values of
        each post hoc test, ordered from smallest to largest, with 1 - (1 - p)^(m+1-i), where i is the ordered position
        (see Holm, 1979).  cochranq reports a maximum Holm-Sidák-adjusted p-value of 1.  In sequential tests the decision
        to reject or not reject the null hypothesis depends both on the p-values and their ordering, so those comparisons
        rejected with this method at the alpha level specified by level() are underlined in the output.

        hochberg specifies the "Hochberg adjustment" where the FWER is controlled by sequentially adjusting the p-values
        of each post hoc test, ordered from largest to smallest, with p*i, where i is the ordered position, as per
        Hochberg (1988).  cochranq reports a maximum Hochberg-adjusted p-value of 1.  In sequential tests the decision to
        reject or not reject the null hypothesis depends both on the p-values and their ordering, so those comparisons
        rejected with this method at the alpha level specified by level() are underlined in the output.

        bh specifies the "Benjamini-Hochberg adjustment" where the false discovery rate (FDR) is controlled by
        sequentially adjusting the p-values of each post hoc test, ordered from largest to smallest, with p[m/(m+1-i)],
        where i is the ordered position (see Benjamini & Hochberg, 1995).  cochranq reports a maximum
        Benjamini-Hochberg-adjusted p-value of 1.  FDR-adjusted p-values are at times referred to as q-values.  In
        sequential tests the decision to reject or not reject the null hypothesis depends both on the p-values and their
        ordering, so those comparisons rejected with this method at the alpha level specified by level() are underlined
        in the output.

        by specifies the "Benjamini-Yekutieli adjustment" where the false discovery rate (FDR) is controlled by
        sequentially adjusting the p-values of each post hoc test, ordered from largest to smallest, with
        p[m/(m+1-i)]C, where i is the ordered position, and C = 1 + 1/2 + ... + 1/m (see Benjamini & Yekutieli,
        2001).  cochranq reports a maximum Benjamini-Yekutieli-adjusted p-value of 1.  FDR-adjusted p-values are at
        times referred to as q-values.  In sequential tests the decision to reject or not reject the null hypothesis
        depends both on the p-values and their ordering, so those comparisons rejected with this method at the alpha
        level specified by level() are underlined in the output.
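
        As a minimal arithmetic sketch of the formulas above (not cochranq's internal code), assume m = 3 post hoc
        tests with hypothetical unadjusted p-values 0.010, 0.020 and 0.040.  The first three lines give the bonferroni,
        sidak and holm adjustments for the smallest p-value (i = 1 when ordered from smallest to largest), and the
        fourth gives the holm adjustment for the second smallest (i = 2):

            . display min(1, 3*0.010)
            . display min(1, 1 - (1 - 0.010)^3)
            . display min(1, 0.010*(3 + 1 - 1))
            . display min(1, 0.020*(3 + 1 - 2))

        The min(1, ...) wrapper mirrors the documented maximum adjusted p-value of 1.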

    es(method) specifies the method of calculation of effect size to be reported, and must take one of the following
        values: none, scm, or bjm.  none is the default method assumed if the es option is omitted.  These methods
        perform as follows:

        none specifies no effect size measure be reported.

        scm specifies the Serlin, Carr and Marascuillo maximum-corrected effect size, Q/[b(k-1)], be reported, as per
        Serlin, Carr and Marascuillo (2007).

        bjm specifies the Berry, Johnston and Mielke chance-corrected effect size, R = 1 - delta/mu_delta, be reported,
        as per Berry, Johnston and Mielke (2007). CAVEAT: The example calculation in Berry, Johnston and Mielke's paper
        includes the figure mu_delta = 0.4521, but Equation [7] contains a typographical error, and the first term should
        be 2/[b(b-1)] rather than 2/[k(k-1)] (personal correspondence with Berry).
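
        As a hedged sketch, the scm effect size can be recomputed by hand from the results cochranq saves in r(),
        here assuming the motorskills data used in the Example section.  This only checks the formula Q/[b(k-1)]
        against r(Q), r(b) and r(k); it is not cochranq's internal code:

            . use motorskills
            . cochranq score subject task, es(scm)
            . display r(Q)/(r(b)*(r(k) - 1))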

    noqtest suppresses the display of the omnibus Cochran's Q test table.

    nolabel causes the actual data codes to be displayed rather than the value labels in the Cochran's Q test table.

    wrap requests that cochranq not break up wide tables to make them readable.

    list requests that cochranq also provide output in list form, one pairwise test per line.

    level(#) specifies the confidence level, as a percentage, equal to (1 - alpha)*100.  The default is level(95),
        corresponding to alpha = 0.05, or as set by set level.
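
        For example, assuming the motorskills data used in the Example section, the first line below requests
        Bonferroni-adjusted post hoc tests evaluated at alpha = 0.01; the second line simply shows the arithmetic
        alpha = 1 - level/100:

            . cochranq score subject task, ma(bonferroni) level(99)
            . display 1 - 99/100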

    copyleft displays the copying permission statement for cochranq.  cochranq is free software, licensed under the
        GPL. The full license can be obtained by typing:

            . net describe cochranq, from(http://www.alexisdinno.com/stata)

        and clicking on the "click here to get" link for the ancillary file.


Remarks

    The issue of tied multiple comparisons may arise when conducting post hoc tests following Cochran's Q test. This is
    because the score variable is nominal, and more than one pairwise test may share a specific value of Q due to
    having the same number of discordant pairs of observations. This is less likely to arise when n, k, or both are
    large. Tied test statistic values are an issue because several of the available multiple comparison procedures are
    stepwise procedures, which give different adjustments based on the position in the ordering of the p-values. It is
    unclear what the appropriate course of action is when attempting to use either the Holm or Holm-Sidák FWER
    adjustments in the presence of ties. cochranq makes an arbitrary ordering of p-values when there are ties, and
    reports the adjusted p-values accordingly, but users should interpret these numbers with caution.

    This issue does not arise when using the FDR adjustments. From Korn et al. (2004):

        If the variables or p-values are discrete, there can be ties in the p-values given in (1), but this does
        not present a problem. Regardless of the ordering of the tied variables in (1), if the hypothesis
        associated with the first variable in the order is rejected, then the hypotheses associated with the
        other tied variables will also be rejected because the minimization (2) will be over smaller sets for the
        other variables. In addition, which of the tied variables is considered first for rejection will not
        matter, as the permutation distribution will include all of them when considering the first rejection.
        Also, if the first of the tied variables fails to reject, the procedure ceases and no further hypotheses
        are rejected, so that the situation in which the first tied variable fails to reject, and the later tied
        variables do reject, need not be considered.


Example

    Setup
        . use diphtheria

    Test for proportion difference of culture growth by growth media
        . cochranq growth cases media [fw=ncases]

    Setup
        . use motorskills

    Test for proportion difference of task completion by motor skill type
        . cochranq score subject task

    Setup
        . use psychgrads

    Test for proportion difference of diagnosis by script, with effect size
        . cochranq diagnosis student script, es(bjm)
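
    The options documented above can be combined.  The following lines are illustrative sketches using the same
    psychgrads data; they follow the syntax diagram but are not taken from the author's own examples.

    Same test with Holm-adjusted post hoc pairwise comparisons, also reported in list form
        . cochranq diagnosis student script, ma(holm) list

    Same test with the omnibus table suppressed and Benjamini-Hochberg-adjusted pairwise comparisons
        . cochranq diagnosis student script, noqtest ma(bh)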


Author

    Alexis Dinno
    Portland State University
    alexis.dinno@pdx.edu

    Please contact me with any questions, bug reports or suggestions for improvement. Fixing bugs will be facilitated
    by sending along (1) a copy of the data (de-labeled or anonymized is fine) in Stata .dta file format, (2) a copy of
    the command used and (3) a copy of the exact output of the command.


Suggested citation

    Dinno A. 2017. cochranq: Cochran's Q test for proportion difference in blocked binary data. Stata software package.
        URL: http://www.alexisdinno.com/stata/cochranq.html


Saved results

    cochranq saves the following in r():

    Scalars   
      r(Q)           Cochran's Q statistic
      r(b)           the number of blocks (subjects) in the test
      r(k)           the number of treatments (groups) in the test
      r(df)          degrees of freedom for the test
      r(p_asymp)     p-value for the asymptotic test
      r(p_nonasymp)  p-value for the nonasymptotic test
      r(Z)           the standardized z test statistic
      r(gamma)       the gamma parameter for the Pearson Type III distribution approximating the exact permutation
                       distribution of Z

    Matrices
      r(X2)          m-length vector of pairwise Q statistics (chi-squared statistics)
      r(P_asymp)     m-length vector of asymptotic p-values for pairwise tests
      r(P_nonasymp)  m-length vector of non-asymptotic p-values for pairwise tests
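
    As a minimal sketch of retrieving these results after a call, assuming the motorskills data used in the Example
    section.  The chi2tail() line assumes that the asymptotic p-value is the upper tail of a chi-squared distribution
    with r(df) degrees of freedom, as is standard for Cochran's Q, so it should reproduce r(p_asymp) if that
    assumption holds:

        . use motorskills
        . cochranq score subject task
        . return list
        . display r(Q)
        . display chi2tail(r(df), r(Q))
        . matrix list r(X2)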


References

    Benjamini, Y. and Hochberg, Y. 1995. Controlling the False Discovery Rate: A Practical and Powerful Approach to
        Multiple Testing.  Journal of the Royal Statistical Society. Series B (Methodological). 57: 289-300.

    Benjamini, Y. and Yekutieli, D. 2001. The control of the false discovery rate in multiple testing under dependency.
        Annals of Statistics. 29: 1165-1188.

    Berry, K. J., Johnston, J. E., and Mielke, Jr., P. W. 2007. An alternative measure of effect size for Cochran's Q
        test for related proportions.  Perceptual and Motor Skills. 104: 1236-1242.

    Cochran, W. G. 1950. The comparison of percentages.  Biometrika. 37: 256-266.

    Dunn, O. J. 1961. Multiple comparisons among means.  Journal of the American Statistical Association. 56: 52-64.

    Hochberg, Y. 1988. A sharper Bonferroni procedure for multiple tests of significance.  Biometrika. 75: 800-802.

    Holm, S. 1979. A simple sequentially rejective multiple test procedure.  Scandinavian Journal of Statistics. 6:
        65-70.

    Korn, E. L., Troendle, J. F., McShane, L. M., and Simon, R. 2004. Controlling the number of false discoveries:
        application to high-dimensional genomic data.  Journal of Statistical Planning and Inference. 124: 379-398.

    Mielke, P. W. and Berry, K. J. 1995. Nonasymptotic inferences based on Cochran's Q test.  Perceptual and Motor
        Skills. 81: 319-322.

    Serlin, R. C., Carr, J., and Marascuillo, L. A. 2007. A measure of association for selected nonparametric
        procedures.  Psychological Bulletin. 92: 786-790.

    Sidák, Z. 1967. Rectangular confidence regions for the means of multivariate normal distributions.  Journal of the
        American Statistical Association. 62: 626-633.


Also See

      Help: anova, kwallis