## (3) Epidemiology

**Comparing rates and risks while minimizing biases**

The usually long research journey to show that a factor is a cause of disease starts by comparing incidence rates or risks between different groups of people. Noting that the rate of occurrence of type 2 diabetes is higher in a group of overweight people observed for several years than in people of normal weight suggests that excessive weight may be among the determinants of diabetes. Before this suggestion can be transformed into conclusive evidence, two conditions need to be fulfilled: (1) demonstrating that there is an association between the exposure, overweight, and diabetes incidence that cannot be explained by bias or chance; and (2) judging whether that association is causal.

Overweight people may differ from people of normal weight in many respects, such as gender, age, diet, and amount of physical exercise, and any of these, rather than excess weight itself, may be responsible for the increase in the rate of onset of diabetes. As a first step to tackling this problem, we can exclude some of the possible interfering factors by restricting our study to normal and overweight people belonging to only one gender and age range, say males aged 40 to 59 in Flower City. Being overweight is defined as having a body mass index (BMI) of more than 25. BMI gauges weight in relation to a person's height (it is computed as weight divided by the square of height): a BMI in the range 19 to 25 is regarded as normal. People with a BMI higher than 25 are considered overweight, and within this category those with a BMI over 30 are classified as obese. The approach of removing possible interfering factors by confining the study to only some categories of people soon reaches its limit. Further restricting the study to, say, sedentary people eating a specific type of diet will not only drastically reduce the number of subjects available for investigation but, worse, may also make its results inapplicable to the population in general. A better, and in fact the most commonly employed, method consists in acquiring and recording for each subject information on factors such as diet and physical exercise, so that at the time of data analysis the comparison between normal and overweight people can be made not overall but first within subgroups (so-called 'strata') with the same type of diet and level of physical activity, and then summarized in an overall 'adjusted' comparison, freed of the influence of these factors.
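The BMI definition and cut-offs just described can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the units (kilograms and metres) are the conventional ones, which the text does not state explicitly.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight divided by the square of height."""
    return weight_kg / height_m ** 2


def weight_category(bmi_value: float) -> str:
    """Classify a BMI using the cut-offs quoted in the text (25 and 30)."""
    if bmi_value > 30:
        return "obese"          # a subset of the overweight category
    if bmi_value > 25:
        return "overweight"
    return "not overweight"     # the text regards 19 to 25 as normal


# Example: a man of 80 kg and 1.75 m has a BMI of about 26.1
example_bmi = bmi(80, 1.75)
print(round(example_bmi, 1), weight_category(example_bmi))
```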

According to the study design just mentioned, a cohort of 3,000 volunteer males, aged 40 to 59, free of diabetes and resident in Flower City, has been recruited and followed up for one year, recording the new cases of type 2 diabetes. At recruitment, 1,980 subjects turned out to have a normal weight, while 1,020 were overweight. During the one-year follow-up, 15 new cases of diabetes and 45 deaths (from any cause) were recorded among the former, and 23 cases of diabetes and 49 deaths among the latter. The person-years derive from assuming that subjects dying or becoming new cases of diabetes remained at risk of such events on average for one half of the year of observation. Hence the person-years for the 1,173 people of normal weight aged 40-49, among whom there were 20 deaths and 6 new cases of diabetes, are computed as: 1,173 - (20 + 6)/2 = 1,160, and the person-years for the other groups are derived in the same way. Among the normal-weight subjects, the 15 cases of diabetes occurred out of 1,950 person-years at risk of developing the disease, an incidence rate of 7.7 per 1,000 person-years. For the overweight subjects, the 23 cases occurred out of 984 person-years, an incidence rate of 23.4 per 1,000 person-years. It is, however, disturbing to realize that enrolling only men aged 40 to 59, rather than all adult males, has not sufficed to remove the possible influence of age.
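The person-year and incidence-rate arithmetic above can be retraced with a short sketch. The function names are illustrative, and the half-period assumption for subjects who die or develop diabetes is the one stated in the text.

```python
def person_years(subjects: int, cases: int, deaths: int, years: float = 1.0) -> float:
    """Person-years at risk, assuming each case or death contributes
    on average half of the observation period."""
    return subjects * years - (cases + deaths) * years / 2


def rate_per_1000(cases: int, pyears: float) -> float:
    """Incidence rate expressed per 1,000 person-years."""
    return 1000 * cases / pyears


normal_py = person_years(1980, 15, 45)   # 1950.0
over_py = person_years(1020, 23, 49)     # 984.0

print(round(rate_per_1000(15, normal_py), 1))   # 7.7
print(round(rate_per_1000(23, over_py), 1))     # 23.4
```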

Among the overweight, the percentage of older people, aged 50-59, is in fact about double (81%) the percentage among those of normal weight (41%), and this might be the real reason for the higher rate of new cases of diabetes, as it is well established that diabetes incidence increases with age. To remove the influence of age, we need to compare the rates between two groups each having the same composition by age, the simplest being 50% of people aged 40-49 and 50% of people aged 50-59. If these 'standard', i.e. fixed, percentages apply, the rates of 5.2 and 11.4 for the two age subgroups among the normal-weight people would each have an equal importance (or, technically, have the same 'weight') and their average rate would simply be (5.2 + 11.4)/2 = 8.3. Similarly for the overweight people, the average rate would be (10.4 + 26.5)/2 = 18.5. These two rates are age-adjusted by a standardization procedure: they still differ (18.5 - 8.3 = 10.2) but materially less than the two overall, or crude, rates (23.4 - 7.7 = 15.7).
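The standardization just described amounts to a weighted average of the age-specific rates, using the same weights for both groups. A minimal sketch, with an illustrative function name and the 50%/50% standard taken from the text:

```python
def standardized_rate(stratum_rates, weights):
    """Weighted average of stratum-specific rates using fixed ('standard') weights."""
    total = sum(weights)
    return sum(rate * w for rate, w in zip(stratum_rates, weights)) / total


standard = [0.5, 0.5]   # 50% aged 40-49, 50% aged 50-59

# Age-specific rates per 1,000 person-years, as given in the text
normal_adj = standardized_rate([5.2, 11.4], standard)    # about 8.3
over_adj = standardized_rate([10.4, 26.5], standard)     # about 18.45, quoted as 18.5

print(normal_adj, over_adj, over_adj - normal_adj)
```

Any other fixed age composition could be used as the standard, provided the same weights are applied to both groups being compared.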

After removing the influence of age, a difference remains that may indeed reflect an effect of weight or, annoyingly, of other potentially interfering factors like amount of exercise or diet. In fact, what is generally done in epidemiological studies is to adjust the rates not only for a single factor like age but for all interfering factors - called confounders or confounding factors - known to be capable of inducing a difference in rates. A host of statistical methods, much more complex than the simple standardization procedure just outlined, are currently available in computer software packages for implementing simultaneous adjustment for multiple confounders. The most often used appear in scientific papers under such names as Cox's regression (or proportional hazard regression) and Poisson regression for adjusting rates and logistic regression for adjusting risks. Regression has nothing to do with decadence but is a general term for a wide family of statistical methods analysing the dependency of one variable, for example an incidence rate, on several other variables such as gender, age, diet, and so on. (The name arises from one of the first uses of the method. When studying the relation between the heights of fathers and sons it was found that the sons of fathers taller than the mean tended to be on average less tall than the fathers, i.e. their stature tended to `regress' towards the mean, the same regression occurring for sons of fathers shorter than the mean.)

Adjusting for confounders aims at eliminating the error that can arise by attributing to one factor, overweight, a difference in rates that may in fact be due to one or more other factors (the confounders). This is, however, only one source of possible error, two other main sources arising from the selection of people included in a study and from the methods of observing them. As seen, the Flower City cohort was composed of 3,000 male volunteers. If more overweight (but not normal-weight) people of lower socio-economic classes had tended to volunteer for the study than overweight people of higher socio-economic classes, the higher rate of diabetes among the overweight subjects could reflect an effect of the less healthy diet of the lower classes rather than of excess weight. This selection bias could go unrecognized and lead to a wrong interpretation of the study results if information on socio-economic conditions, which may not be simple to fully capture, had not been collected on all subjects. An observation bias would, on the other hand, be introduced if, for example, overweight people had been kept under closer surveillance, because of their very condition, than normal-weight people. This may have made it more likely that new cases of diabetes would be detected among the overweight than among the normal-weight men. In sum, three types of potential distortions loom over any observational study in epidemiology: (a) bias from uncontrolled or inadequately controlled (through adjustment methods) confounders; (b) bias from selection of subjects; and (c) bias from observation of subjects and collection of information. Bias is synonymous with systematic or constant error, the most important kind of error in observational studies and the most difficult to neutralize or at least to take into account. In addition, chance errors are always present.

**Ruling out chance**

For the Flower City cohort, we are told by the investigators that no other confounders than age and no selection and observation biases have proved relevant, hence only excess weight remains as a candidate for the observed association with the rates of diabetes. Yet how can one be reasonably sure that the difference between the age-adjusted incidence rates of diabetes (18.5 and 8.3) has not arisen purely by chance? Looking at the table of results, we see that in the age group 50-59 the number of person-years among normal-weight and overweight people happens to be rather large and essentially the same (790 and 791). We can take advantage of this circumstance and argue that if weight had no effect we would expect that, the person-years being the same, the number of new cases of diabetes would also be the same among normal and overweight people in the age group 50-59. Instead, there are 9 cases among people of normal weight and 21 among overweight subjects, a rather large divergence with respect to the expectation of an equal number of (21 + 9)/2 = 15. Even large divergences can, however, occur just by chance and the relevant question is: how often? We may exactly mimic our diabetes study by taking a coin, throwing it 30 times, noting the number of heads and tails, and repeating the experiment several times. The expectation is that there will be an equal number of heads and tails (15) in each experiment, as there should have been an equal number of diabetes cases (15) among normal and overweight people.

The divergence from this expectation in the successive experiments, each consisting of 30 throws, will tell how likely or unlikely is a chance deviation as large as the one observed (21 and 9). Here are the results of a small series of 20 experiments that I did, before becoming tired. One experiment out of 20 (5%) gave a result of 7 to 23 (in bold), a chance divergence from expectation larger than the one (21 to 9) observed in the data on diabetes. In 19 out of 20 experiments (95%), the divergence was instead less than 21 to 9. 95% is not 100%, but it is reasonably close to it, and on this basis we may be prepared to conclude that, having observed in our study (which exactly mimics the head-and-tail experiments) 21 cases of diabetes among the overweight and 9 among the normal-weight people, the hypothesis that there is no real difference in rates can now be rejected. In statistical jargon, we have performed a significance test on the rate difference and we are prepared to say that 'the observed difference is statistically significant at the 5% level'. This implies that if we keep to this way of proceeding on similar occasions, whenever there is in fact no real difference we are bound to be wrong in only 5% of them. Unfortunately, nobody can tell whether this may be one of those 5% of occasions when we may 'reasonably' reach the wrong conclusion!
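The coin-throwing experiment is easy to repeat on a computer, where becoming tired is less of a problem. The sketch below is only an illustration: the function names and the seed are arbitrary choices, and it counts divergences at least as large as 21 to 9 in either direction.

```python
import random


def count_heads(n_throws: int, rng: random.Random) -> int:
    """Throw a fair coin n_throws times and return the number of heads."""
    return sum(rng.random() < 0.5 for _ in range(n_throws))


def fraction_as_extreme(n_experiments: int, n_throws: int = 30,
                        observed_high: int = 21, seed: int = 0) -> float:
    """Fraction of experiments whose split is at least as uneven as the
    observed one (21 to 9), in either direction."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    low = n_throws - observed_high     # 9 when 21 of 30 was observed
    extreme = 0
    for _ in range(n_experiments):
        heads = count_heads(n_throws, rng)
        if heads >= observed_high or heads <= low:
            extreme += 1
    return extreme / n_experiments


# A small series of 20 experiments, then a much longer one
print(fraction_as_extreme(20))
print(fraction_as_extreme(20000))
```

With only 20 experiments the fraction jumps around from one series to the next; with many thousands it settles near the value given by probability theory.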

**Significance tests and confidence limits**

Carrying out head-and-tail experiments is a useful device to illustrate empirically the basis of a significance test, but exact calculations can be and are made every day based on probability theory. In the case of two alternative and mutually exclusive events, such as heads and tails, male and female, alive or dead, the binomial probability distribution permits exact calculations of how often an event that has a probability π of occurrence will in fact happen in n trials, for example how often in families of n = 2 children there will be 0 boys and 2 girls, 1 boy and 1 girl, and 2 boys and 0 girls. We assume (although it may not be strictly so in nature) that the successive births are independent in respect to sex determination and that the probability π of a male birth is 1/2, the same as the probability of a female birth. Hence the probability of 2 girls and 0 boys will be 1/2 × 1/2 = 1/4 or 0.25, which is the probability of 2 boys and 0 girls as well. Moreover, any of the combinations of 1 boy and 1 girl will have a probability of 1/2 × 1/2 = 1/4; as there are two possible combinations, one with boy first and girl second and the other with girl first, their total probability will be 2 × 1/4 = 1/2 or 0.5. The three possible offspring cover all possibilities, hence their probabilities must add up to 1, as they in fact do: 0.25 + 0.5 + 0.25 = 1. In a similar way, probabilities can be derived for binomial distributions with other values of n and π.
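The two-child family calculation can be reproduced with the general binomial formula. In this sketch the function name is illustrative, and `pi` stands for the probability of the event (here, a male birth) in a single trial.

```python
from math import comb


def binomial_pmf(k: int, n: int, pi: float) -> float:
    """Probability of exactly k events in n independent trials,
    each with probability pi of the event."""
    return comb(n, k) * pi ** k * (1 - pi) ** (n - k)


# Families of n = 2 children, probability of a boy 1/2:
# probabilities of 0, 1, and 2 boys
boys = [binomial_pmf(k, 2, 0.5) for k in range(3)]
print(boys)        # [0.25, 0.5, 0.25]
print(sum(boys))   # 1.0
```

The same function gives the probabilities for any other number of trials, for instance the 30 coin throws of the previous section.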

There is a probability of 4.2%, i.e. of less than 5% (routinely indicated as p < 0.05 or P < 0.05), that under the hypothesis of no difference in the incidence rate of diabetes between normal and overweight people, called the null hypothesis, a divergence as large as 21 to 9, or larger, would be observed. The same conclusion would have been reached if instead of focusing only on the people in the age bracket 50-59, we had performed our test of statistical significance on the age-adjusted rates of 18.5 and 8.3 that summarize the experience of the whole cohort of 3,000 men: their difference of 10.2, the most relevant to test, turns out to be also statistically significant with probability p < 0.05. Although significance tests are very popular in science in general and in epidemiology in particular, there is a more informative and preferable way of arriving at the same result. The difference of 10.2, derived from a population sample of large but finite size (3,000 people), reflects the combination of the 'true' but unknown difference in an ideal population of infinite size with the chance fluctuation arising from the fact that out of that ideal population we have studied a finite sample of 3,000 people. If we were to repeat our study on another sample of 3,000 people entirely indistinguishable from those in the first sample we would obtain a difference slightly different from 10.2, and the same would occur again for any successive sample. Once more, the binomial probability distribution permits the exploration of a range of values of the difference such that it has a probability of 95% (or if one prefers, 90% or 99%) of including the true unknown difference. For our case, these values are 2.0 and 18.3, a rather large range. We can summarize by saying that the point estimate of the true unknown difference is 10.2, with 95% confidence limits (or with a 95% confidence interval) of 2.0 and 18.3. In simple terms, the confidence interval expresses the range of values within which the true difference has a certain probability of being included. If there were no real difference, the range would be expected to include the value zero; it might, for instance, range from -3.2 to 11.5. The confidence interval is much more informative than a statistical significance test, and is therefore a much better way of assessing the role of chance. It not only tells us, like the significance test, that a difference is unlikely to have arisen by chance (if the null hypothesis were true), but provides information on the range of plausible values of that difference. Because the range does not provide a certainty but only specifies a probability that the true difference lies within it, it may be in error in the same way as a significance test. Computing 95% confidence limits and stating that the true difference lies between them will prove wrong on 5% of the occasions, but nobody can tell whether our diabetes study is one of these.
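The exact probability behind the significance test for the 50-59 age group can be obtained by summing binomial probabilities for 30 cases split between two groups with equal person-years. This is a sketch under stated assumptions: the function name is illustrative, and reading the quoted figure as counting divergences in either direction (21 to 9 or 9 to 21) is an interpretation, not something the text spells out.

```python
from math import comb


def upper_tail(k_min: int, n: int, pi: float = 0.5) -> float:
    """Probability of k_min or more events in n trials under the null hypothesis."""
    return sum(comb(n, k) * pi ** k * (1 - pi) ** (n - k)
               for k in range(k_min, n + 1))


one_sided = upper_tail(21, 30)   # 21 or more of the 30 cases in one group
two_sided = 2 * one_sided        # either direction; valid because pi = 0.5 is symmetric

print(round(one_sided, 4))   # 0.0214
print(round(two_sided, 4))   # 0.0428
```

The two-sided value, a little over 4%, is close to the figure quoted in the text and below the conventional 5% threshold.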

Calm down! No one can tell whether the wrong result is in yours or somebody else's study. And, please pay much more attention to sources of errors other than chance.

Higher levels of confidence can be adopted, for example 99% or 99.9%, entailing only a 1% or a 0.1% risk of being wrong, but the price paid for a higher degree of confidence is that the interval within which the difference can be stated to lie becomes larger.
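How much wider the interval becomes can be gauged with a rough sketch. It assumes, purely for illustration, that the limits are computed as the estimate plus or minus a multiplier times its standard error under a normal approximation; this is not necessarily the method used for the limits quoted in the text, and the function name is hypothetical.

```python
from statistics import NormalDist


def z_multiplier(confidence: float) -> float:
    """Two-sided standard normal multiplier for a given confidence level."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


base = z_multiplier(0.95)
for level in (0.95, 0.99, 0.999):
    z = z_multiplier(level)
    # The interval half-width is proportional to z, so z / base is the
    # widening relative to the 95% interval.
    print(f"{level:.1%}: multiplier {z:.2f}, width relative to 95% = {z / base:.2f}")
```

Under this approximation, moving from 95% to 99% confidence widens the interval by roughly a third, and moving to 99.9% by about two-thirds.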

**A guide to interpreting associations**

The problem of interpreting well-established associations came to a critical pass in the early 1960s, when a number of epidemiological studies had been accumulating that seriously indicted tobacco smoking as the culprit of several diseases, notably lung cancer. Up to that time, the so-called 'Koch's postulates' had been used as a common yardstick to evaluate associations between exposure and disease. Robert Koch, a key figure in the microbiological revolution of medicine, had discovered the bacteria causing tuberculosis and cholera, and formulated his criteria in 1890 to tackle the question: how can we distinguish, out of the thousands of micro-organisms hosted by any human body, the minority capable of producing disease from the great majority of innocuous parasites? In Koch's criteria, the decisive element permitting the interpretation of the association of a micro-organism with a human disease as causal was the laboratory reproduction of the disease in some experimental animal. When applied to the smoking/lung cancer issue, this criterion represented an insurmountable obstacle as no one had yet succeeded in inducing lung cancer by forcing animals to inhale tobacco smoke.

In 1962, a report of the Royal College of Physicians of the United Kingdom strongly endorsed the view that tobacco smoking causes lung cancer, but it was only with the 1964 report 'Smoking and Health', commissioned by the United States Surgeon General (head of the Public Health Service) from a panel of ten scientists, that the issue of criteria for establishing causality was explicitly discussed. The report, produced after an in-depth examination of all the available evidence and the consultation of about 200 experts, stands as a masterpiece in the evaluation of scientific, and in particular epidemiological, evidence. The scientists enunciated and applied a number of principles to assess the meaning, causal or non-causal, of associations. At about the same time, Austin Bradford Hill from the London School of Hygiene and Tropical Medicine independently outlined similar principles in a profound and terse paper, stressing that they should be employed not as criteria to be invariably fulfilled, but rather as a guide in forming judgments of causality. These principles, either in their original forms or in one of the several subsequent variants, remain a suitable frame of reference to interpret exposure-disease associations.

In my own variant for this book, the guidelines consist of eight questions:

(1) Did the exposure precede the disease? For example, was the past diet as reported by patients with colon cancer antecedent to the cancer, or were they in fact and inadvertently providing information on diets already modified because of minor symptoms of a silently developing cancer? Only diet prior to cancer onset can act as a cause or as a protective factor and, unless unequivocal information on this point is acquired, no conclusion about the nature of the diet-colon cancer association can be made.

(2) How strong is the association? For the Flower City cohort we estimated the incidence rate of type 2 diabetes in overweight people as 18.5 per 1,000 person-years and in normal-weight people as 8.3, a difference of 10.2, with a 95% probability that the real difference would be between 2.0 and 18.3.

As an alternative to this rate difference, we can compute a rate ratio of 18.5/8.3 = 2.2, for which the 95% confidence limits turn out to be 1.2 and 5.4. The rate ratio (relative rate), or equivalently the ratio of risks (risk ratio, relative risk), is a far better tool for assessing the strength of an association than the rate or risk difference, as errors from a variety of sources tend to be proportional to the rates and their possible role in producing an observed association is much better gauged by the ratio than by the difference. The same rate difference of 18.5 - 8.3 = 10.2 per 1,000 person-years found in Flower City could hypothetically derive from two other rates, say 120.5 and 110.3. However, the first association implies a rate increase of (10.2/8.3) × 100 = 123%, while for the second, the increase is only (10.2/110.3) × 100 = 9%. The latter association may be accounted for easily by a 10% error, an amount not uncommon in epidemiological studies due to uncontrollable factors, while the former is much stronger as it greatly exceeds (by some 12 times) a 10% error. The best way of making the different strengths of the two associations immediately visible is to express them not through the rate difference (the same for both) but through the rate ratio (respectively 2.2 and 1.09). In general, the larger a rate ratio or a risk ratio, the more confident one can be that it is unlikely to be due to errors.

There is, however, no line, fixed for all studies, separating 'weak' and 'strong' rate ratios, as the amount of error that can creep into different studies depends on their type, method of measurement employed, and population recruited.
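The contrast between the rate difference and the rate ratio can be made concrete with a few lines of arithmetic on the two pairs of rates discussed under this question. The function names are illustrative only.

```python
def rate_difference(exposed: float, unexposed: float) -> float:
    """Absolute difference between two rates."""
    return exposed - unexposed


def rate_ratio(exposed: float, unexposed: float) -> float:
    """Ratio of the rate in the exposed to the rate in the unexposed."""
    return exposed / unexposed


def percent_increase(exposed: float, unexposed: float) -> float:
    """Relative increase of the exposed rate over the unexposed rate, in percent."""
    return 100 * (exposed - unexposed) / unexposed


# Rates per 1,000 person-years: (overweight, normal weight)
pairs = {
    "Flower City": (18.5, 8.3),
    "hypothetical": (120.5, 110.3),
}

for name, (exposed, unexposed) in pairs.items():
    print(name,
          round(rate_difference(exposed, unexposed), 1),   # 10.2 in both cases
          round(rate_ratio(exposed, unexposed), 2),        # about 2.23 and 1.09
          round(percent_increase(exposed, unexposed)))     # about 123 and 9
```

Both pairs share the same difference, yet only the first has a ratio large enough to stand well clear of an error of around 10%.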

(3) Does the association become stronger with increasing exposure? It is reasonable to expect that if an exposure causes a disease, the incidence rate will rise with increasing levels of exposure. For example, the rate of lung cancer increases with the number of cigarettes smoked daily and with the number of years of smoking, two different aspects of the magnitude of the exposure.

(4) Is the association consistent? Again, it is reasonable to expect that if an exposure causes a disease it will manifest this effect consistently, if not in exactly the same way, in different subgroups of people, e.g. males and females, urban and rural dwellers, and so on.

(5) Is the association specific? A strong association specific for a particular disease speaks in favour of a causal effect via a specific biological mechanism, whereas multiple weak associations with disparate diseases raise the suspicion that they may be an artefact due to some bias affecting the ensemble of a study.

(6) Is the association consistent with other biological evidence? In the case of lung cancer, experiments to reproduce the disease by having animals inhale tobacco smoke failed for a long time. However, extracts of the smoke were repeatedly shown to cause cancer when painted on the skin of laboratory animals. This type of indirect evidence was rightly regarded as supporting the idea that tobacco smoking is capable of producing cancer. For the association between overweight and incidence of diabetes, there are biological mechanisms, particularly the fact that an excess of body fat interferes with the action of insulin, supporting the contention that overweight is a cause of diabetes.

(7) Has the association any analogue? This may be the case, for example, when the exposure under study is the molecule of a chemical pollutant with a structure analogous to the molecule of an already known carcinogen.

(8) Is the association coherent across different studies? An association that is repeatedly found in epidemiological studies of different types and in different populations is much more likely to be causal than an association showing up occasionally. This interpretation is further supported if cessation of exposure, as when smokers give up the habit, is followed by a decrease of the associated disease.

Schematically it can be stated that a positive answer to question number 1 is a must for an exposure-disease association to be judged causal; that a positive answer to question number 8 offers the strongest support to this judgement; and that a positive answer to each of questions 2 to 7 increases the likelihood that an association is causal.

If at this point you feel that the process of establishing an exposure-disease association and of judging its nature, causal or non-causal, is laborious and hardly simple, you are right; it is rigorous as well. You will also have realized how futile is the comment, frequently put forward to disqualify epidemiological investigations of environmental or other hazards, that epidemiological studies produce only 'soft' or 'statistical' evidence. What they produce is just scientific evidence, no more and no less than any other kind of correctly conducted scientific study.