Simulated Type I Error Rates for Unbalanced Cluster Samples

We performed a simulation study to compare the Type I error rates of ten analytic methods
for cluster randomized designs. The methods were applied to a cluster randomized
design with two treatment groups and one level of clustering, under several scenarios of
cluster size imbalance.
The lookup table below lets researchers examine the Type I error rate of each
analytic method in the scenarios that best match their intended design.
It is most useful for researchers who know the anticipated cluster sizes and intracluster
correlation for their planned study.

The table below shows all scenarios from our simulation study of Type I error rates
in unbalanced cluster samples. To examine Type I error rates for your specific study design,
filter the table based on the parameters of imbalance listed above the table.

For example, if you anticipate low intracluster correlation in your study, select
an intracluster correlation of 0.001 from the dropdown list. If you also expect about 30
participants per cluster, then you would select values for nbar1 and nbar2 closest to 30.
From the values available in the simulation study, a cluster size of 32 would be the best choice.
You may filter by additional parameters as needed to match your planned study.
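
The matching step in this example can be sketched in a few lines of Python. This is only an illustration: the grid of simulated cluster sizes and the scenario rows are hypothetical placeholders (the text confirms only the values 0.001 and 32), and `closest_available` is a helper name introduced here, not part of the lookup tool.

```python
def closest_available(target, available):
    """Return the simulated value nearest to the anticipated one."""
    return min(available, key=lambda value: abs(value - target))

# Hypothetical grid of simulated average cluster sizes;
# only the value 32 is confirmed by the text.
simulated_nbar = [8, 16, 32, 64]
nbar_choice = closest_available(30, simulated_nbar)

# Placeholder scenario rows standing in for the lookup table (not real results).
scenarios = [
    {"rho": 0.001, "nbar1": 32, "nbar2": 32},
    {"rho": 0.001, "nbar1": 16, "nbar2": 32},
    {"rho": 0.05, "nbar1": 32, "nbar2": 32},
]

# Keep only the scenarios that match the anticipated design.
matches = [
    row for row in scenarios
    if row["rho"] == 0.001
    and row["nbar1"] == nbar_choice
    and row["nbar2"] == nbar_choice
]
```

With an anticipated cluster size of 30, `nbar_choice` is 32, matching the choice described above.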

Once you have filtered the results, examine the Type I error rate reported for each method.
Methods with Type I error rates closest to 0.05 provide the best Type I error control
for your design.
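
As a rough sketch of this comparison, the methods in a filtered row can be ranked by how far their Type I error rate falls from 0.05. The rates below are made up for illustration and are not results from the simulation study; `rank_by_error_control` is a hypothetical helper name.

```python
TARGET_ALPHA = 0.05  # target Type I error rate stated in the text

def rank_by_error_control(rates, alpha=TARGET_ALPHA):
    """Sort method numbers by the distance of their Type I error rate from alpha."""
    return sorted(rates, key=lambda method: abs(rates[method] - alpha))

# Made-up rates keyed by method number (illustration only).
example_rates = {1: 0.051, 3: 0.068, 5: 0.047, 6: 0.082}
ranking = rank_by_error_control(example_rates)
```

The first entry of `ranking` is the method whose rate is closest to 0.05 for that scenario.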

| Column | Filter |
| --- | --- |
| Intracluster correlation (rho) | Dropdown list |
| Number of clusters in treatment group 1 (m1) | Range (from, to) |
| Number of clusters in treatment group 2 (m2) | Range (from, to) |
| Average cluster size in treatment group 1 (nbar1) | Range (from, to) |
| Average cluster size in treatment group 2 (nbar2) | Range (from, to) |
| Ratio of maximum to minimum cluster size in treatment group 1 (r1) | Range (from, to) |
| Ratio of maximum to minimum cluster size in treatment group 2 (r2) | Range (from, to) |

The table below summarizes each of the ten statistical methods tested. One-stage models are
linear mixed models, which account for within-cluster correlation using a random intercept.
Two-stage models first calculate the average outcome within each cluster, and then analyze
the resulting cluster means using a general linear univariate model.

| Method | Model | Details |
| --- | --- | --- |
| 1 | One-stage | Mixed model with Kenward-Roger denominator degrees of freedom, variance constrained positive |
| 2 | One-stage | Mixed model with Kenward-Roger denominator degrees of freedom, unconstrained variance |
| 3 | One-stage | Mixed model with denominator degrees of freedom $m-g$, variance constrained positive |
| 4 | One-stage | Mixed model with denominator degrees of freedom $m-g$, unconstrained variance |
| 5 | Two-stage | General linear model with weight matrix $W={I}_{m}$ |
| 6 | Two-stage | General linear model with weight matrix $W=\mathrm{diag}\left(\stackrel{g}{\underset{h=1}{\boxdot}}\stackrel{{m}_{h}}{\underset{i=1}{\boxdot}}{n}_{hi}\right)$ |
| 7 | Two-stage | General linear model with weight matrix $W=\mathrm{diag}\left(\stackrel{g}{\underset{h=1}{\boxdot}}\stackrel{{m}_{h}}{\underset{i=1}{\boxdot}}1/{n}_{hi}\right)$ |
| 8 | Two-stage | General linear model with weight matrix $W=\mathrm{diag}\left(\stackrel{g}{\underset{h=1}{\boxdot}}\stackrel{{m}_{h}}{\underset{i=1}{\boxdot}}\left[{n}_{hi}\left({n}_{hi}-1\right)\right]/\left[{y}_{hi}'{y}_{hi}-{n}_{hi}{\bar{y}}_{1,hi}\right]\right)$ |
| 9 | Two-stage | General linear model with weight matrix $W=\mathrm{diag}\left[\stackrel{g}{\underset{h=1}{\boxdot}}\stackrel{{m}_{h}}{\underset{i=1}{\boxdot}}{\left({\hat{\sigma}}_{c}^{2}+{\hat{\sigma}}_{e}^{2}/{n}_{hi}\right)}^{-1}\right]$, variance constrained positive |
| 10 | Two-stage | General linear model with weight matrix $W=\mathrm{diag}\left[\stackrel{g}{\underset{h=1}{\boxdot}}\stackrel{{m}_{h}}{\underset{i=1}{\boxdot}}{\left({\hat{\sigma}}_{c}^{2}+{\hat{\sigma}}_{e}^{2}/{n}_{hi}\right)}^{-1}\right]$, unconstrained variance |
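
The two-stage approach can be illustrated with a minimal Python sketch for a two-group design. It assumes that the weight matrix $W$ enters as the weights of a weighted least squares fit to the cluster means, and that the test uses $m-g$ degrees of freedom; both are assumptions made for this illustration and may differ from the implementation used in the simulation study. The function name `two_stage_t` and the example data are hypothetical.

```python
import math

def two_stage_t(clusters_by_group, weight=lambda n: 1.0):
    """Two-stage analysis sketch for a two-group cluster randomized design.

    clusters_by_group: two lists of clusters; each cluster is a list of outcomes.
    weight: maps a cluster size n to its weight (1.0 for every cluster gives W = I).
    Returns (difference in weighted group means, t statistic, degrees of freedom).
    """
    estimates, weight_sums = [], []
    weighted_ss, m = 0.0, 0
    for clusters in clusters_by_group:
        # Stage 1: reduce each cluster to its mean outcome.
        means = [sum(y) / len(y) for y in clusters]
        w = [weight(len(y)) for y in clusters]
        # Stage 2: weighted least squares estimate of the group mean.
        mu = sum(wi * yi for wi, yi in zip(w, means)) / sum(w)
        weighted_ss += sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, means))
        estimates.append(mu)
        weight_sums.append(sum(w))
        m += len(clusters)
    df = m - len(clusters_by_group)  # m - g
    sigma2 = weighted_ss / df        # weighted residual variance
    diff = estimates[0] - estimates[1]
    se = math.sqrt(sigma2 * sum(1.0 / s for s in weight_sums))
    return diff, diff / se, df

# Made-up outcomes: three clusters per treatment group, unequal cluster sizes.
group1 = [[1.0, 3.0], [2.0, 4.0, 6.0], [5.0]]
group2 = [[0.0, 2.0], [1.0, 1.0, 1.0, 1.0], [3.0, 5.0]]

# Identity weights, in the spirit of method 5.
diff_eq, t_eq, df_eq = two_stage_t([group1, group2])
# Weights equal to cluster size, in the spirit of method 6.
diff_n, t_n, df_n = two_stage_t([group1, group2], weight=lambda n: float(n))
```

Swapping the `weight` function changes only the second stage, which is why the two-stage methods in the table differ solely in their choice of $W$.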