Early in the reign of Augustus, Dionysius of Halicarnassus commented that "history is
philosophy from examples". We think of evidence in much the same way, in seeking
examples from the archaeology of medicine to learn what constitutes good science and
what bad, perhaps leavened here and there with a bit of real philosophy and science.
Theory tells us that randomisation is good, and examples from reviews frequently
confirm it. Yet we are condemned to relearn the lessons because so many systematic
reviews include trials whose architecture potentially misleads.
And it's not just about architecture, because size is also important. Indeed, the
two are linked, so some helpful revisiting of the issues of trial quality [1] and
size [1,4] should revivify our knowledge of these matters.
Quality and size [1]
Two Danish researchers looked for large clinical trials with at least 1,000 patients
together with meta-analyses of small trials. They were asking the sensible question
of how possible discrepancies between large trials and meta-analyses could be
affected by methodological quality.
They found 14 meta-analyses, pulled all the original papers, subjected those to
quality review, and examined outcomes in terms of odds ratios. They then used the
ratio of the odds ratio in the large randomised trial to that from the meta-analysis
of small trials to produce a "ratio of odds ratios" as the final outcome. When the
ratio of odds ratios was significantly less than 1, that indicated that small trials
with particular quality criteria exaggerated the effect of an intervention compared
with the large trial.
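To make the arithmetic concrete, here is a minimal sketch in Python of how a ratio of odds ratios and an approximate 95% confidence interval might be calculated from two independent odds ratios. It is not the authors' method: the function name, the back-calculation of standard errors from reported confidence intervals, and all the numbers are our assumptions, and the example assumes an outcome coded so that an odds ratio above 1 favours treatment (the text does not say how outcomes were coded).

```python
import math


def ratio_of_odds_ratios(or_large, ci_large, or_small, ci_small, z=1.96):
    """Ratio of two independent odds ratios with an approximate 95% CI.

    Each CI is a (lower, upper) pair. The standard error of each log odds
    ratio is recovered from the width of its CI on the log scale, and the
    two variances are added because the estimates are assumed independent.
    """
    se_large = (math.log(ci_large[1]) - math.log(ci_large[0])) / (2 * z)
    se_small = (math.log(ci_small[1]) - math.log(ci_small[0])) / (2 * z)
    log_ror = math.log(or_large) - math.log(or_small)
    se_ror = math.sqrt(se_large ** 2 + se_small ** 2)
    return (
        math.exp(log_ror),
        math.exp(log_ror - z * se_ror),
        math.exp(log_ror + z * se_ror),
    )


# Hypothetical figures, for illustration only (not taken from the paper):
# a large trial with OR 1.5 (1.2 to 1.9) and a meta-analysis of small
# trials with OR 2.5 (1.5 to 4.2).
ror, lower, upper = ratio_of_odds_ratios(1.5, (1.2, 1.9), 2.5, (1.5, 4.2))
print(f"Ratio of odds ratios {ror:.2f} ({lower:.2f} to {upper:.2f})")
```

With these made-up inputs the small trials show the bigger apparent benefit, so the ratio falls below 1. Whether that counts as "significantly" below 1 depends on whether the upper confidence limit is also below 1.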
The quality criteria they tested for were generation of the allocation sequence,
allocation concealment, double blinding, and withdrawals or dropouts. The relevant
criteria are in Table 1.
Table 1: Quality criteria tested

| Quality feature | Adequate | Inadequate |
|---|---|---|
| Generation of the allocation sequence | computer-generated random number or similar | not described |
| Allocation concealment | central independent unit, sealed envelope, or similar | not described, or open table of random numbers |
| Double blinding | identical placebo or similar | not described, or tablets versus injection without double dummy |
| Withdrawals or dropouts | number and reasons for dropouts | not described |

Results
They used 23 large trials and 167 small trials with 136,000 patients. Compared
with large trials, small trials with inadequate generation of the allocation
sequence, inadequate allocation concealment, or inadequate double blinding
overestimated the effect of treatment (Table 2). When adequate and inadequate
methodological quality were compared across large and small trials together,
inadequate generation of the allocation sequence and inadequate double blinding
were associated with overestimation of the treatment effect (Table 3), and much
the same was found for a similar analysis of small trials alone.
Table 2: Comparison of large trials with small trials with different quality
criteria

| Common comparator | Comparison | Ratio of odds ratios (95% CI) |
|---|---|---|
| Large trials | Small trials with inadequate generation of allocation sequence | 0.46 (0.25 to 0.83) |
| Large trials | Small trials with adequate generation of allocation sequence | 0.90 (0.47 to 1.76) |
| Large trials | Small trials with inadequate allocation concealment | 0.49 (0.27 to 0.86) |
| Large trials | Small trials with adequate allocation concealment | 1.01 (0.48 to 2.11) |
| Large trials | Small trials with inadequate or no double blinding | 0.52 (0.28 to 0.96) |
| Large trials | Small trials with adequate double blinding | 0.84 (0.43 to 1.66) |
| Large trials | Small trials with inadequate follow up | 0.72 (0.30 to 1.71) |
| Large trials | Small trials with adequate follow up | 0.58 (0.32 to 1.02) |

When the ratio of the odds ratios is less than 1, it indicates that the feature
(inadequate blinding, for example) exaggerates the intervention effect.

Table 3: Comparison of adequate versus inadequate quality criteria in large
and small trials

| Common comparator | Comparison | Ratio of odds ratios (95% CI) |
|---|---|---|
| Adequate | Inadequate generation of allocation sequence | 0.49 (0.30 to 0.81) |
| Adequate | Inadequate allocation concealment | 0.60 (0.31 to 1.15) |
| Adequate | Inadequate or no double blinding | 0.56 (0.33 to 0.98) |
| Adequate | Inadequate follow up | 1.50 (0.80 to 2.78) |

When the ratio of the odds ratios is less than 1, it indicates that the feature
(inadequate blinding, for example) exaggerates the intervention effect.

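Reading Tables 2 and 3 comes down to asking whether each 95% confidence interval lies wholly on one side of 1. The short Python sketch below applies that check to the figures from the two tables. The row labels and the function name are ours, chosen for illustration, and treating "interval excludes 1" as "statistically significant" is the usual convention rather than something the paper is quoted as saying.

```python
# (table, quality feature, quality of the trials compared, ratio, lower, upper)
ROWS = [
    ("Table 2", "generation of allocation sequence", "inadequate", 0.46, 0.25, 0.83),
    ("Table 2", "generation of allocation sequence", "adequate", 0.90, 0.47, 1.76),
    ("Table 2", "allocation concealment", "inadequate", 0.49, 0.27, 0.86),
    ("Table 2", "allocation concealment", "adequate", 1.01, 0.48, 2.11),
    ("Table 2", "double blinding", "inadequate", 0.52, 0.28, 0.96),
    ("Table 2", "double blinding", "adequate", 0.84, 0.43, 1.66),
    ("Table 2", "follow up", "inadequate", 0.72, 0.30, 1.71),
    ("Table 2", "follow up", "adequate", 0.58, 0.32, 1.02),
    ("Table 3", "generation of allocation sequence", "inadequate", 0.49, 0.30, 0.81),
    ("Table 3", "allocation concealment", "inadequate", 0.60, 0.31, 1.15),
    ("Table 3", "double blinding", "inadequate", 0.56, 0.33, 0.98),
    ("Table 3", "follow up", "inadequate", 1.50, 0.80, 2.78),
]


def excludes_one(lower, upper):
    """True when the whole interval lies on one side of 1."""
    return upper < 1 or lower > 1


significant = [row for row in ROWS if excludes_one(row[4], row[5])]

for table, feature, quality, ratio, lower, upper in significant:
    print(f"{table}: {quality} {feature}: {ratio:.2f} ({lower:.2f} to {upper:.2f})")
```

Only rows describing inadequate quality pass the check, and follow up never does, which is the pattern the text describes.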
Quality scoring using the Oxford system [2], perhaps one of the most commonly
used scoring systems in systematic reviews, produced sensible results. Small
trials with lower quality scores overestimated treatment effects compared with
large trials. Small trials with higher quality scores did not. With both large
and small trials, treatment effects were exaggerated with low versus high quality
scores.
Size [3]
It is obvious that if we have a very small amount of information, from few
patients, the effects of random chance can be significant. As the amount of
information or number of patients increases, the effects of chance will
diminish. In some circumstances, like acute pain trials, we can define how much
information is needed for us to be confident not just that a treatment works, but
also of how big the effect of that treatment is [3].
Confirmation that our estimate of the effect of treatment can be heavily
dependent on size comes from a study from the USA and Greece [4]. Researchers
looked at 60 meta-analyses of randomised trials where there were at least five
trials published in more than three different calendar years. They were in either
pregnancy and perinatal medicine or myocardial infarction.
For each meta-analysis, trials were chronologically ordered by publication year
and a cumulative meta-analysis was performed to arrive at a pooled odds ratio at
the end of each calendar year. The relative change in treatment effect was
calculated for each successive additional calendar year by dividing the odds ratio
of the new assessment with more patients by the odds ratio of the previous
assessment with fewer patients. This gives a "relative odds ratio", in which a
number greater than 1 indicates more treatment effect, and one less than 1
indicates less treatment effect.
The relative odds ratio can be plotted against the number of patients included.
The expected result is a horizontal funnel, with less change with more patients,
and the relative odds ratio settling down to 1.
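The cumulative procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method used in the paper: the trial counts are invented, the pooling uses a simple inverse-variance fixed effect model on the log odds ratio, a pooled estimate is produced only for years in which a trial was published, and all function names are ours.

```python
import math
from collections import defaultdict

# Hypothetical trials: (year, events treated, n treated, events control, n control)
TRIALS = [
    (1990, 4, 20, 8, 20),
    (1991, 10, 50, 15, 50),
    (1991, 6, 40, 9, 40),
    (1993, 30, 200, 40, 200),
    (1995, 90, 1000, 110, 1000),
]


def log_or_and_variance(a, n1, c, n2):
    """Log odds ratio and its variance from a 2x2 table."""
    b, d = n1 - a, n2 - c
    if 0 in (a, b, c, d):
        # Continuity correction so that empty cells do not break the maths.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def cumulative_by_year(trials):
    """Pooled odds ratio at the end of each year that added trials.

    Returns a list of (year, patients so far, pooled odds ratio).
    """
    by_year = defaultdict(list)
    for year, *counts in trials:
        by_year[year].append(counts)

    sum_weights = sum_weighted = 0.0
    patients = 0
    results = []
    for year in sorted(by_year):
        for a, n1, c, n2 in by_year[year]:
            log_or, variance = log_or_and_variance(a, n1, c, n2)
            sum_weights += 1 / variance
            sum_weighted += log_or / variance
            patients += n1 + n2
        results.append((year, patients, math.exp(sum_weighted / sum_weights)))
    return results


def relative_odds_ratios(cumulative):
    """New pooled odds ratio divided by the previous one, with patients so far."""
    return [
        (patients_new, or_new / or_old)
        for (_, _, or_old), (_, patients_new, or_new) in zip(cumulative, cumulative[1:])
    ]


cumulative = cumulative_by_year(TRIALS)
for patients, relative in relative_odds_ratios(cumulative):
    print(f"{patients:>5} patients: relative odds ratio {relative:.2f}")
```

Plotting the second value of each pair against the first, for many meta-analyses at once, would give the kind of funnel described above.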
Results
In the paper, the two graphs for pregnancy/perinatal medicine and myocardial
infarction showed exactly this expected pattern, but are just impossible to
reproduce here. Below 100 patients the relative odds ratios varied between 0.2
and 6. By the time 1,000 patients were included they were between 0.5 and 2. By
5,000 patients they had settled down close to 1. The 95% prediction interval for
the relative change in the odds ratio at different numbers of patients, for both
examples, is shown in Table 4.
Table 4: 95% prediction interval for relative change in odds ratio for
different numbers of accumulated patients randomised

Values are fixed effect prediction intervals for the relative change in odds ratio.

| Number of patients | Pregnancy/perinatal | Myocardial infarction |
|---|---|---|
| 100 | 0.32 to 2.78 | 0.18 to 5.51 |
| 500 | 0.59 to 1.71 | 0.60 to 1.67 |
| 1,000 | 0.67 to 1.49 | 0.74 to 1.35 |
| 2,000 | 0.74 to 1.35 | 0.83 to 1.21 |
| 15,000 | 0.85 to 1.14 | 0.96 to 1.05 |

When evidence was based on only a few patients there was substantial uncertainty
about how much the pooled treatment effect would change in the future. With only
100 patients randomised, additional information from more trials could multiply
or divide the odds ratio at that point by about three in pregnancy and perinatal
medicine, and by about five in myocardial infarction (Table 4).
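The "multiply or divide" reading of a prediction interval is easy to check against Table 4. In the Python sketch below, the figures are those in the table, while the function name and the idea of summarising each interval by its largest fold change in either direction are ours, for illustration only.

```python
# Table 4: 95% prediction intervals for the relative change in odds ratio,
# keyed by number of patients randomised.
TABLE_4 = {
    100: {"pregnancy/perinatal": (0.32, 2.78), "myocardial infarction": (0.18, 5.51)},
    500: {"pregnancy/perinatal": (0.59, 1.71), "myocardial infarction": (0.60, 1.67)},
    1000: {"pregnancy/perinatal": (0.67, 1.49), "myocardial infarction": (0.74, 1.35)},
    2000: {"pregnancy/perinatal": (0.74, 1.35), "myocardial infarction": (0.83, 1.21)},
    15000: {"pregnancy/perinatal": (0.85, 1.14), "myocardial infarction": (0.96, 1.05)},
}


def max_fold_change(lower, upper):
    """Largest factor by which the odds ratio might be multiplied or divided."""
    return max(1 / lower, upper)


for patients, intervals in TABLE_4.items():
    summary = ", ".join(
        f"{area}: up to {max_fold_change(*interval):.2f}-fold"
        for area, interval in intervals.items()
    )
    print(f"{patients:>6} patients: {summary}")
```

The fold change shrinks steadily as patients accumulate, which is the horizontal funnel expressed as a single number per row.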
Comment
At first look this is all complicated pointy-head stuff, but actually it's no
more than simple common sense. If trials are not done properly, they might be
wrong. If trials are small, they might be wrong. To be sure of what we know we
need large data sets of high quality, whether from single trials or
meta-analyses. The corollary is that if we have small amounts of information, or
information of poor quality, the chance of that result being incorrect is
substantial, and then we need to be cautious and conservative.
Cynics might say that much decision-making in healthcare is done on small
amounts of inadequate information. They may be right, but knowing that that
information may be misleading is still helpful, because we know that we need to
examine what we do in practice to check that it conforms with what we thought we
started out with. Suspending belief is not an option.
References:
1. LL Kjaergard & C Gluud. Reported methodologic quality and discrepancies
between large and small randomised trials in meta-analyses. Annals of Internal
Medicine 2001 135: 982-989.
2. AR Jadad et al. Assessing the quality of reports of randomized clinical
trials: is blinding necessary? Controlled Clinical Trials 1996 17: 1-12.
3. RA Moore et al. Size is everything - large amounts of information are
needed to overcome random effects in estimating direction and magnitude of
treatment effects. Pain 1998 78: 209-16.
4. JP Ioannidis & J Lau. Evolution of treatment effects over time:
empirical insight from recursive meta-analyses. Proceedings of the National
Academy of Sciences 2001 98: 831-836.
