


Today's diseases arise from yesterday's causes

Sampling a limited, but not too tiny, number of subjects from a much larger cohort or population can be looked at from a different viewpoint. Setting the cohort aside for a moment, attention can be focused on the disease cases as the starting point of an investigation. This is what happens every day to doctors confronted with their patients. Again and again, keen doctors have been struck by the unusual occurrence of some event in the life experience of their patients, providing the first hints to the causes of a disease. For example, in the early 1960s an ear, nose, and throat specialist noted that as many as one-quarter of the patients with cancer of the nasal cavities, a very rare disease, were furniture workers, an infrequent occupation in the general population. This observation paved the way for subsequent epidemiological studies, which showed that the dusts produced in furniture and cabinet making can cause cancers of the nasal cavities, probably because the dust is loaded with several carcinogenic chemicals. Cancer of the nasal cavities and furniture making are both so rare that their repeated joint occurrence is unlikely to arise by chance. In ordinary circumstances, however, judging whether such a joint occurrence is just coincidental requires some estimate of the frequency of the possible causal factor among the patients and in the general population from which they originate.

When seven cases of cancer of the vagina were observed within the brief period of three years at a hospital in Boston, in women as young as 15 to 22, the finding immediately prompted an enquiry into the life experience of the patients, extending back into the prenatal intrauterine period. Rather than focusing only on the patients, the investigators carefully selected four controls for each case: women born within five calendar days of the case and in the same type of hospital service (ward or private).
Examination of the medical history of the patients' mothers during pregnancy found that all the mothers of the cases and none of the mothers of the controls had taken diethylstilbestrol, a synthetic oestrogen prescribed to prevent pregnancy loss in high-risk women. This case-control study provided strong evidence of a causal association between the drug and the cancer in the daughters, explained by the alteration the drug induced in the vaginal cells of the foetus, which years later developed into a cancer. The use of the drug in pregnancy has since been proscribed.

Case-control studies were in fact widespread in epidemiology well before their use in the special situation where cases and controls are drawn from an actual study cohort. They can be regarded as a natural extension of the enquiry a doctor makes on first seeing a patient, asking not only about symptoms but also about the patient's health history, family history, eating habits, occupation, and other elements that may have a bearing on the present condition. In a case-control study, this procedure is carried out in a formalized way, using questionnaires focused on the exposures of interest to the investigator, and extending the enquiry to control subjects as well. The great advantage of this type of study is that it capitalizes on cases of disease occurring now as a result of past causes, which remain to be identified, rather than requiring, like a cohort study, a follow-up of subjects lasting years while waiting for the cases to occur.

Case-control studies have contributed knowledge to all areas of epidemiology and medicine. One example whose full relevance is now tangible is the series of seminal case-control studies that tracked down the causes of cervical cancer, today the second commonest cancer in women in developing countries.

It had already been noted in the 19th century that this cancer was uncommon among nuns, suggesting that it was in some way connected with sexual activity. Several case-control studies in the second half of the 20th century then showed that the cancer was related to being married, particularly at an early age, and to a high frequency of sexual intercourse. In one study, a frequency of intercourse of 15 times or more a month was 50% more common among the cancer cases than among the controls. In these studies, information on exposure (i.e. on frequency of intercourse) was collected by interview, as very often in case-control studies, and could have been inaccurate; moreover, marriage and sexual intercourse were most likely not directly relevant but reflected the action of some other, unknown factor, probably infectious. The search for sexually transmitted micro-organisms focused in particular on several viruses, among which the human papillomaviruses (HPVs) were especially suspect because they were known to produce benign tumour lesions in humans (warts) and malignant tumours in rabbits. Case-control studies in which the exposure was no longer marital status or frequency of intercourse but the presence of the virus showed a strong association of some types of HPV with the cancer.

The more specific and accurate the laboratory method used to ascertain the presence of the virus in cells of the uterine cervix, the stronger the association turned out to be, indicating that the virus, and not something else, was the real factor at play. Would this also mean that it was the cause of the cancer? It could in fact be that the cancer developed first and the virus was found only as a guest settling into the cells once the cancer had begun. A case-control study is not a good instrument for solving this kind of 'which came first' question, because the presence of the virus (and in general of an exposure) is ascertained when the disease is already established. Cohort studies showed that infection with the virus preceded the cancer. Moreover, studies with newly developed vaccines demonstrated definitively that HPVs are the cause of cervical cancer and that blocking them prevents the disease. Vaccination campaigns in young women are now in progress in several countries. Epidemiology has made a crucial contribution to this major advance in public health, and case-control studies have been a pioneering component of it.

The four key features of case-control studies

1. The selection of cases is the starting point of a case-control study. Cases are often identified in hospital, where the diagnosis can be accurate and, if necessary, refined, for example by separating the different cellular types of lung cancer if one suspects that they may be influenced by different factors (exposures) to be investigated. Usually cases should have arisen very recently, i.e. they should be new, or incident, cases, for example of diabetes. If all cases of diabetes, whether diagnosed yesterday or ten years ago, are instead included in the study, a factor emerging as different between cases and controls may in fact influence how long a diabetes patient survives rather than why a healthy subject becomes diabetic. The two effects become inextricable, and the results of the study become hard to interpret.

2. The selection of controls obeys the fundamental and rather obvious principle that they should come from the same study population as the cases. There are usually no problems when the population is an existing cohort already under investigation, as we have seen when discussing the case-control study within a cohort. A similar situation holds when the cases are, for example, all stomach cancers recorded in a year by a cancer registry covering a defined population, and controls are picked at random from that population.

The hurdle is that a proportion of the selected controls will always refuse to participate; they can be replaced by other people who consent, but the controls are then no longer rigorously representative of the population from which the cases come. When that population is only vaguely defined, as when the cases are patients in a hospital, deciding which population to sample for controls may become very difficult. A widely adopted solution is to take as controls patients in the same hospital with diseases other than the one under study and unrelated to the factors under study. This assumes that all kinds of patients reach the hospital for the same combination of reasons: medical, personal, administrative, or legal. The assumption may often be wrong, for example when the hospital has a highly specialized and reputed service for leukaemia, the disease under study, which receives patients from several regions, while the other services of the hospital operate essentially on a local basis. In this situation, it is reasonable to select controls from the same area of residence as each case, and it may be sensible also to match cases and controls for gender, age, interviewer, and calendar period of the interview. Going further and trying to find controls similar to the cases in other respects should be avoided. Not only does it become difficult to find controls that match a case as the number of characteristics increases, but making cases and controls more and more similar makes the controls unrepresentative of their population of origin and destroys the possibility of discovering differences in exposure between cases and controls, i.e. the very purpose of the study. The choice of controls is a major challenge for epidemiologists, requiring both experience, including mistakes, and specific knowledge of the local context of the study.
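The matching rule just described (same area of residence, same gender, similar age) can be sketched as a small selection routine. This is a minimal illustration under stated assumptions, not a standard method: the `Person` record, the five-year age bands, the number of controls per case `k`, and the toy data are all hypothetical, and interviewer and calendar period are left out for brevity.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    # Hypothetical record: only the matching variables are kept.
    pid: int
    area: str
    sex: str
    age: int


def match_key(p: Person, band_width: int = 5) -> tuple:
    """Matching variables: area of residence, sex, and age band (assumed 5-year bands)."""
    return (p.area, p.sex, p.age // band_width)


def select_matched_controls(cases, pool, k, seed=0):
    """For each case, draw up to k controls at random among pool members with the
    same matching key; each control is used at most once."""
    rng = random.Random(seed)
    available = {}
    for person in pool:
        available.setdefault(match_key(person), []).append(person)
    matched = {}
    for case in cases:
        key = match_key(case)
        candidates = available.get(key, [])
        rng.shuffle(candidates)
        matched[case.pid] = candidates[:k]
        available[key] = candidates[k:]  # used controls leave the pool
    return matched


# Toy data, invented for illustration.
cases = [Person(1, "North", "F", 52), Person(2, "South", "M", 61)]
pool = [
    Person(101, "North", "F", 50), Person(102, "North", "F", 54),
    Person(103, "North", "F", 53), Person(104, "North", "M", 52),
    Person(105, "South", "M", 60), Person(106, "South", "M", 63),
    Person(107, "South", "F", 61),
]
matched = select_matched_controls(cases, pool, k=2)
for case_id, controls in matched.items():
    print(case_id, [c.pid for c in controls])
```

Fixed age bands are a simplification (a case aged 50 would not be matched with a control aged 49); a tolerance window is a common alternative. Adding further fields to `match_key` quickly empties the candidate lists, which is the practical face of the warning above against matching on too many characteristics.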

3. Ascertaining exposures very often involves interviewing cases and controls about a variety of factors to which they may have been exposed, ranging from smoking habits through diet to medical history, depending on the purpose of the study. The same interviewer interviews a case and his or her controls, in random order and within a short period of time, to avoid subtle changes in the way questions are asked that may creep in with the passing of time (a case-control study usually lasts months or a few years, as long as necessary to obtain the required number of cases and controls). Structured questionnaires are the rule, and interviewers undergo training sessions on how to use them and, more generally, on how to approach the subjects. Ideally the interviewers should not know whether the person they are talking to is a case or a control, as this would avoid bias in the way questions are formulated and answers recorded. This 'blind' condition is, however, seldom feasible in practice. In addition, the subjects themselves may misremember past events and exposures, or misreport them, consciously or unconsciously. To the extent that this misreporting differs between cases and controls, it produces a recall bias that distorts the comparison. Similar problems affect telephone interviews and replies to self-administered questionnaires. Fewer difficulties arise when past exposures can be evaluated by consulting written documents, for instance medical or employment records, although these may sometimes be incompletely or inaccurately filled in. Finally, an investigator may wish to explore the influence of a physiological factor like insulin on a disease such as colon cancer by measuring blood insulin levels in cases with colon cancer and in controls without the disease; but who can guarantee that it is insulin influencing the disease rather than the other way round? Clearly, ascertaining exposure is a delicate exercise in a case-control setting.

4. By now you should have noted the basic difference between a case-control study and a prospective study. The prospective study observes events in their natural course, from causes to possible effects. Computing and comparing incidence rates or risks of chronic bronchitis in smokers and non-smokers seeks to answer the question: how often do smokers develop the disease compared with non-smokers? A case-control study observes events in the reverse sequence, from effects to possible causes. It starts from the disease and seeks to answer the question: what proportion of people with chronic bronchitis have been smokers, compared with people without the disease? No incidence rates or risks can be calculated from a case-control study, as the number of smokers and non-smokers at risk of developing the disease is as a rule unknown; we only have two samples, of people who did and did not develop the disease, and we know the frequency of smoking in both. Fortunately, a proper data analysis permits us to compute the ratio of the two risks, even though each of them remains unknown. If this sounds surprising, consider for a moment the figures from a prospective study (not a case-control study!) of a population of 10,000 people, of whom 2,515 turn out to be smokers and 7,485 non-smokers:

                                                                Smokers   Non-smokers
Developed chronic bronchitis after three years of observation        25            15
Did not develop chronic bronchitis                                2,490         7,470
Total population (10,000)                                         2,515         7,485

In three years, 25 of the 2,515 smokers developed chronic bronchitis, so their risk is 25/2,515. Similarly, the risk for non-smokers is 15/7,485. The ratio of the two risks is (25/2,515) / (15/7,485) = (25/2,515) x (7,485/15) = 4.96, i.e. smokers have almost five times the probability of developing chronic bronchitis. We could get nearly the same result by replacing 2,515 (the number of smokers initially at risk of the disease) with 2,490, the number who did not develop the disease by the end of the three years of observation. This replacement is justified because 2,490 is a reasonably close approximation to 2,515 and, similarly, 7,470 to 7,485. In general, the smaller the number of diseased people relative to the population size, i.e. the smaller the disease risk during the period of observation, the better the approximation. And as any long period can be broken down into very short intervals, it is in principle possible to make the risk within each interval as small as we please, rendering the approximation virtually perfect (a device you may come across under the intimidating name of 'incidence density sampling').
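The claim that the approximation improves as the risk shrinks can be checked numerically. The sketch below works under illustrative assumptions: it fixes a true risk ratio of exactly 5 (a hypothetical value close to the example) and recomputes, for progressively smaller risks among non-smokers, the ratio obtained when each group's cases are divided by its non-cases rather than by everyone at risk. The function name `ratio_using_non_cases` is purely illustrative.

```python
def ratio_using_non_cases(risk_exposed: float, risk_unexposed: float) -> float:
    """Divide each group's cases by its non-cases instead of by everyone at risk.
    With risks r1 and r0 this equals (r1 / (1 - r1)) / (r0 / (1 - r0))."""
    return (risk_exposed / (1 - risk_exposed)) / (risk_unexposed / (1 - risk_unexposed))


TRUE_RISK_RATIO = 5.0  # assumed for illustration
risks_unexposed = [0.1, 0.02, 0.002, 0.0002]  # hypothetical, ever smaller risks

approximations = []
for r0 in risks_unexposed:
    r1 = TRUE_RISK_RATIO * r0  # risk in smokers implied by the assumed risk ratio
    approx = ratio_using_non_cases(r1, r0)
    approximations.append(approx)
    print(f"risk in non-smokers {r0:<7} ratio {approx:.3f}  (true risk ratio {TRUE_RISK_RATIO})")
```

The printed ratio falls from 9.000 at a 10% risk to 5.004 at a 0.02% risk. The risk among non-smokers in the example, 15/7,485, is about 0.002, where the discrepancy is already below 1%; shrinking the risk further, as happens when the observation period is cut into tiny intervals, drives it towards zero.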

The new ratio, called the odds ratio, can now be computed as (25/2,490) / (15/7,470) = (25/2,490) x (7,470/15) = 5.0, very close to the risk ratio computed above.
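The closeness of the two ratios can be made explicit. Writing a and b for the smokers who did and did not develop the disease, and c and d for the corresponding non-smokers (notation introduced here only for convenience), the ratio of the odds ratio to the risk ratio depends solely on how close each group's non-cases are to its total:

```latex
\mathrm{RR} = \frac{a/(a+b)}{c/(c+d)}, \qquad
\mathrm{OR} = \frac{a/b}{c/d} = \frac{ad}{bc}, \qquad
\frac{\mathrm{OR}}{\mathrm{RR}} = \frac{(a+b)/b}{(c+d)/d}

% Example figures: a = 25, b = 2490, c = 15, d = 7470
\frac{\mathrm{OR}}{\mathrm{RR}} = \frac{2515/2490}{7485/7470}
  \approx \frac{1.0100}{1.0020} \approx 1.008
```

Both factors approach 1 when cases are few relative to non-cases (a much smaller than b, c much smaller than d), which is the rare-disease condition described above; in the example the odds ratio exceeds the risk ratio by less than 1%.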

Why go to the trouble of computing an odds ratio when the risk ratio is already available? Because the odds ratio, unlike the risk ratio, can be calculated not only in a prospective study, as in the example, but also in a case-control study. For instance, a case-control study covering the same time span as our prospective study might have identified, through hospital records, all 40 cases of chronic bronchitis in our population and picked at random 160 controls without the disease, i.e. only 1.6% of the 2,490 + 7,470 subjects with no disease. The new figures look like this:


                                Smokers   Non-smokers
Cases with chronic bronchitis        25            15
Controls                             40           120

The odds ratio is (25/40) / (15/120) = (25/40) x (120/15) = 5.0, exactly the same as before.
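The case-control arithmetic can be checked with exact fractions. The sketch below is an illustration rather than a prescribed analysis: it computes the odds ratio against all non-diseased people, derives the expected split of 160 randomly sampled controls from the 2,490 + 7,470 non-diseased subjects, and then shows how differential misreporting, discussed under point 3, could distort the estimate. The 20% under-reporting among smoking controls is a purely hypothetical figure, and an actual random sample would fluctuate around the expected counts.

```python
from fractions import Fraction


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    return Fraction(exposed_cases, unexposed_cases) / Fraction(exposed_controls, unexposed_controls)


# Figures from the example: 25 smoking and 15 non-smoking cases,
# 2,490 smokers and 7,470 non-smokers without the disease.
CASES_SMOKERS, CASES_NON_SMOKERS = 25, 15
NON_DISEASED_SMOKERS, NON_DISEASED_NON_SMOKERS = 2490, 7470
N_CONTROLS = 160

# Comparison with all non-diseased people (the prospective table).
or_full = odds_ratio(CASES_SMOKERS, CASES_NON_SMOKERS,
                     NON_DISEASED_SMOKERS, NON_DISEASED_NON_SMOKERS)

# Random sampling keeps, on average, the smoker share of the non-diseased population.
smoker_share = Fraction(NON_DISEASED_SMOKERS, NON_DISEASED_SMOKERS + NON_DISEASED_NON_SMOKERS)
control_smokers = N_CONTROLS * smoker_share         # expected smoking controls
control_non_smokers = N_CONTROLS - control_smokers  # expected non-smoking controls
or_sampled = odds_ratio(CASES_SMOKERS, CASES_NON_SMOKERS, control_smokers, control_non_smokers)

# Hypothetical recall bias: 20% of smoking controls report themselves as
# non-smokers, while cases report accurately.
misreported = control_smokers * Fraction(1, 5)
or_biased = odds_ratio(CASES_SMOKERS, CASES_NON_SMOKERS,
                       control_smokers - misreported, control_non_smokers + misreported)

print(f"OR with all non-diseased people: {float(or_full):.2f}")
print(f"Expected controls: {control_smokers} smokers, {control_non_smokers} non-smokers")
print(f"OR with 160 sampled controls:    {float(or_sampled):.2f}")
print(f"OR with biased control reports:  {float(or_biased):.2f}")
```

Both unbiased odds ratios equal 5 exactly, because random sampling preserves, in expectation, the 1:3 ratio of smokers to non-smokers among the non-diseased. The hypothetical misreporting alone pushes the estimate to about 6.67, a distortion that a larger sample would not remove.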

Herein lies the remarkable advantage of a case-control study: through the odds ratio, computed from a comparatively small number of subjects, it can estimate the same ratio of risks that in a prospective setting would require following a large population for years. This advantage offsets the limitations of case-control studies already discussed (notably in the choice of controls and in the ascertainment of exposure) and explains their continuing popularity with epidemiologists. Problems notwithstanding, the case-control study is an epidemiological tool adaptable to all manner of circumstances and relatively rapid to implement. It is still widely used as a first-line study when tackling a new health problem. When a group of people comes down with a serious gastrointestinal ailment after a festive dinner, the first thing an epidemiologist will do is interview the sick people and then some healthy controls, to ascertain how often each food item served at the dinner was consumed by cases and by controls. Hazardous items may in this way be identified and, hopefully, removed from the menu.
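For the dinner outbreak, one common way to rank suspect foods is to compute an odds ratio for each item from the interviews. The counts below are invented for illustration, and adding 0.5 to every cell when one of them is zero (the Haldane-Anscombe correction) is one conventional choice among several, not something the text prescribes.

```python
# Hypothetical interview results, one 2x2 table per food item:
# (ill who ate it, ill who did not, well who ate it, well who did not).
# 20 ill guests and 40 well controls were interviewed (invented numbers).
interviews = {
    "chicken salad": (18, 2, 10, 30),
    "roast beef": (10, 10, 20, 20),
    "ice cream": (12, 8, 22, 18),
}


def odds_ratio(ill_ate, ill_not, well_ate, well_not):
    """Odds of having eaten the item among the ill divided by the same odds among the well."""
    cells = (ill_ate, ill_not, well_ate, well_not)
    if 0 in cells:
        # Haldane-Anscombe correction: add 0.5 to every cell to avoid dividing by zero.
        ill_ate, ill_not, well_ate, well_not = (x + 0.5 for x in cells)
    return (ill_ate / ill_not) / (well_ate / well_not)


# Rank foods from most to least suspect.
ranking = sorted(
    ((odds_ratio(*counts), food) for food, counts in interviews.items()),
    reverse=True,
)
for value, food in ranking:
    print(f"{food:<14} odds ratio {value:.1f}")
```

With these made-up counts chicken salad stands out with an odds ratio of 27, while roast beef, at 1.0, shows no association. Small outbreaks give imprecise estimates, so in practice a confidence interval would be reported alongside each odds ratio.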



