Drugs used for symptom control

Although not directly treating the cancer, a range of supportive care drugs have contributed to big improvements in cancer treatments over the last 10 to 15 years. The improved anti-sickness drugs have already been mentioned. Also related to chemotherapy safety and delivery are the growth factors, in particular granulocyte colony stimulating factor (G-CSF), which boosts white blood cell counts and so reduces infection risks. A second related product called GM-CSF (granulocyte-macrophage colony stimulating factor), initially developed for the same purpose, has turned out to have a valuable role in releasing blood cell precursors called stem cells into the circulation. This somewhat esoteric observation has allowed the harvesting of stem cells prior to high-dose chemotherapy intended to destroy the normal bone marrow. Previously, patients needed a bone marrow transplant to ‘rescue’ them from such treatment, but it turns out that harvested stem cells do the same job more quickly and with a much easier pre-treatment harvesting procedure, extending the range of patients suitable for these high-dose therapies.

Another area of recent research has been bone-protecting agents. Many cancers spread into bone with devastating consequences, including pain, fracture, and paralysis due to spinal column damage. Research demonstrated that the body ‘over-reacting’ to the cancer led, paradoxically, to increased damage. Drugs initially developed for osteoporosis (bone thinning) turned out to reduce this collateral, self-inflicted damage. The initial drugs available, such as clodronate, were relatively low in potency but later drugs, such as zoledronate and ibandronate, are many times more effective and can substantially reduce bone damage in patients with advanced cancer. Even more intriguingly, in adjuvant trials in high-risk breast cancer, zoledronate also appeared to reduce soft tissue disease, suggesting these agents may in addition have direct anticancer properties.

The improvements in cancer treatment seen in the last 100 years have been dramatic and have transformed the outcomes for millions of people across the world. Cancer treatment in the early 21st century is safer, more effective, and less toxic than it was 50 or 100 years ago. Surgery and radiotherapy continue to be refined and improved, with better targeting and minimal access technologies increasingly available. The ancillary imaging and pathology services will also continue to improve and allow better selection of treatment options in the future. The range of drugs and the effectiveness of those drugs are increasing rapidly, and this will generate further improvements in the coming years. The main problem with all this is the escalating cost, but grappling with this issue is better than not having the options available.

Cancer research

As we have already seen, the mainstays of cancer therapy remain surgery and radiotherapy, both of which date from the 19th century but which have undergone a process of continual technical improvement, which is still ongoing. Drug treatments for cancer are comparatively much more recent.

The first successful cancer drug therapy was the use of synthetic female hormones to treat prostate cancer in the 1940s. Successful curative chemotherapy really dates from the 1970s with the development of treatments for leukaemias and lymphomas (cancers of the bone marrow and lymphatic system), although interestingly, the chemicals on which these treatments are based were previously developed for more nefarious purposes (as we have seen, mustine, one of the first successful drugs in this area, is based on the active ingredient of mustard gas). The development of new treatments and the improvement of existing ones clearly require a process of research. This text will describe some of the ways in which research happens, in particular the differences between the rules for drugs and those for devices (such as radiotherapy machines) or techniques (surgery). These contrasts will be explored in some detail, as the differences are important and give rise to significant anomalies. The text will focus mostly on where new treatments come from, but similar trial structures apply to testing existing treatments against each other or for research into techniques of symptom control.

The development process for new surgical and radiotherapy techniques differs significantly from that applying to drugs. Typically, a surgical improvement will be a small technical change (for example, a better way to control bleeding) that does not fundamentally alter the underlying technique. Such improvements are often licensed essentially on a ‘fitness for purpose’ basis (that is, does it really help control bleeding?). Similar arguments apply to technical radiotherapy improvements (for example, better ways of targeting radiation to spare normal tissues). In general, it has been taken as self-evident that improvements of these sorts must be better and that their implementation will follow. In fact, the improvements may be illusory, and commercial pressure rather than any sound evidence base may drive their implementation. I will illustrate how and why this may arise using robotic surgical techniques and intensity-modulated radiotherapy as examples.

Drug treatments, on the other hand, have to meet fundamentally different criteria. Generally, an improvement in survival rates compared to the previous standard of care is required by regulatory authorities such as the Food and Drug Administration (FDA) in the USA. This means a new drug treatment requires testing in a series of clinical trials involving large numbers of patients. Broadly, these can be divided into three categories termed phases 1 to 3.

Phase 1 trials establish the safety and side-effect profile of a drug. Typically, these involve small numbers of patients – for cancer drugs, usually those who have run out of standard options and who have had multiple previous treatments. Drugs with less dramatic effects, for example blood pressure drugs, will often be tested first on healthy volunteers.

Phase 2 trials are larger and will often involve patients earlier in the ‘cancer journey’ than phase 1 studies, and they aim to confirm that a drug has useful activity against the target cancer. For a drug that looks promising, the final phase 3 trial will compare it with whatever is considered the standard of care.

A phase 3 trial will involve many hundreds or even thousands of patients. There are a range of problems inherent in this design, ranging from consent and cost to legislative burdens. Phase 3 licensing trials are now almost always international affairs and have to comply with legislative frameworks from multiple countries, in particular the USA. The costs of such trials are enormous and explain the very high costs of new drugs – around $1 billion from synthesis to registration of a new cancer drug. The licensing process – which gives a company the lucrative right to market a drug or product – is tightly regulated by national or transnational bodies such as the FDA. This theme of regulation will be developed further in the next text – arguably, the high level of trial regulation protects the individual participant in a trial from possible harm at the expense of society at large, by slowing the pace of improvement and driving up the costs of new drugs to the point where access is increasingly restricted, even in the wealthiest of economies.

Developing new cancer drugs

Basic science

Clearly, a massive body of biological research underpins cancer research. There have been huge advances made in the last 50 years, particularly the unraveling of the structure of DNA and the so-called ‘central dogma’ of biology – the relationship between DNA, RNA, and protein. Previous generations of cancer drugs were developed largely by observing the effects of chemicals on cells, looking for drugs that were particularly effective at killing cancer cells. This research produced the chemotherapy drugs that appeared in large numbers in the 1970s and 1980s. Although new chemotherapy agents are still being produced, there is a sense of diminishing returns from more recent drugs compared to the huge advances of previous decades.

More recent research has focused on the evolving knowledge of the molecular signatures of cancer discussed in the previous chapter in relation to targeted small molecules and monoclonal antibodies. The human genome was sequenced around the turn of the 21st century. The initial sequencing technology was cumbersome and slow, and the first complete sequence took more than a decade to produce. Having completed this task, and with the overall structure of the human genome now known, it has become possible to sequence the genomes of specific cancers and to compare the cancer DNA to the patient’s normal DNA extracted from their blood cells. This now takes teams in specialized laboratories a few weeks, and costs are falling rapidly. The technology, time required, and costs are likely to improve dramatically over the next few years, such that it will soon be possible to determine the DNA sequence of each patient’s cancer individually as part of the diagnostic work-up. For the time being this work is experimental, but remarkable results are already emerging from this new field of study.

The human cell contains around 21,000 genes arranged on 23 pairs of chromosomes. Research comparing the DNA sequence of the entire 21,000 genes with the normal DNA of the patients has now been done for a number of cancers, and the results illustrate how fine the line is between normal and cancerous cells. On average, experiments of this sort reveal abnormalities in around 40 to 60 genes. Put another way, if we picture the human genome as a library of 23 books (the chromosomes), each of around 1,000 pages (genes), there will be a total of 40 to 60 typographic errors in the entire cancer cell version of the ‘library’. Furthermore, many of these genetic ‘typos’ will not actually alter the ‘sense’ of the gene – the protein produced will retain normal function. The number of key drivers of the cancer process boils down to around 12 pathways. The genes mutated or misfiring in the cancers studied in this way all belong to one of these pathways, and disruptions of these pathways appear to be present in all cancers studied. This work points the way to the next stage of cancer drug development. The recent round of small molecules and monoclonals has largely (but not entirely) focused on single molecules, such as HER2 being targeted by Herceptin. This whole genome work highlights the need to target pathways of multiple genes rather than single members. Drug screens in the future are likely to focus on this aspect of cancer biology, in tandem with whole genome screens to pinpoint the key mutated genes in particular cancers. It also opens the possibility that the drugs of tomorrow will be known to work in the presence of particular genetic signatures. Therefore, linking whole genome sequencing to diagnosis points the way to one of the ‘holy grails’ of cancer medicine – the personalized selection of drug therapies.

Pre-clinical phase

The first step in the development of new drugs is the identification of suitable compounds for study in human beings. Increasingly, this results from the sort of research work on cancer pathways described above. This search can at present take many forms, from the screening of random compounds to the targeted synthesis of drugs to hit pre-specified abnormalities in the cancer cell. The drugs currently used in the clinic come from a range of sources, and some of these have been described in the previous chapter. The initial testing of a candidate drug will involve experiments with cancer cells in the laboratory. These cancer cells come from a variety of sources, ranging from human cancers to artificial tumours generated in laboratory animals. Some of the human cell lines were grown by taking fragments of a surgically removed cancer and placing them in cell culture medium in the laboratory. The process is conceptually attractive – you can test your drug on the ‘real’ cancer. There are many such cell lines; possibly the most famous is the HeLa cell line. This was grown from a fragment of cervical cancer taken from a woman called Henrietta Lacks (also sometimes referred to as Helen Lane or Helen Larson in an early attempt to preserve her privacy), and the cells are very widely used in laboratories around the world. Parenthetically, neither she nor her family gave their consent or permission for this process, resulting in a famous court case in California in 1990 in which it was decided that, in the USA, such a process was lawful. In the UK and other countries, the position is different, and informed patient consent for tissue collection is now enforced by legislation. It has been calculated that so many HeLa cells have been cultured that they outnumber many times over the ‘normal’ cells produced by Ms Lacks in her lifetime, giving her a curious form of immortality. The problem with cell lines, however, is that most attempts to grow tumour cells from patients are unsuccessful. Hence the cell lines we have may be as unrepresentative of the typical cancer as HeLa cells are of the person who was Henrietta Lacks. Nonetheless, despite this limitation, human cancer cell lines remain a key component of cancer research and drug testing.

The second type of cell line used is derived from animal tumours, mostly arising in mice. Many of these tumours are artificially engineered. A good example of this is an engineered cell line used in prostate cancer research. Mice do not get prostate cancer in the way that humans do. However, it is possible to identify genes that are expressed in the mouse prostate and to use the promoter regions of those genes to drive the production of proteins that cause cancer. In the case of mice, a gene with the curious name of ‘large-T’, from a cancer-causing virus called SV40, is used.

Parenthetically, while many genes have names that are strings of unmemorable letters and numbers (there are 21,000 human genes alone, after all – a lot to name), a subset have names varying from the odd (large-T) to the odder (hedgehog, notchless) to the downright amusing – a pair of genes involved in cell signalling are called ‘mad’ and ‘Max’!

In order to have mice develop prostate tumours, the hybrid gene containing the prostate specific gene promoter and the SV40-T gene must be inserted into a fertilized mouse egg. If the insertion is successful, a transgenic mouse results and the growing mice will now express the foreign gene in their prostate glands. As would be predicted, these mice go on to develop multiple prostate tumours. A number of these cancer-prone mice were bred, and the strain is called the Transgenic Adenocarcinoma of the Mouse Prostate (TRAMP) model. These mice have proved useful in a number of ways. As the mice reliably develop tumours, they can be used to test cancer-preventing strategies such as dietary interventions. Secondly, the tumours can be used to test drug treatments for effectiveness. Thirdly, tumour cells arising in TRAMP mice have been successfully cultured in the lab and these cell lines can be used for experiments, either alone or re-implanted into adult animals from the same mouse strain – quicker and more reproducible than waiting for the tumours to develop in the TRAMP mice themselves. It is again obvious from the above discussions that such models are only representations of aspects of the human disease, not perfect replications of it. Hence, while useful, drugs must ultimately be tested in humans.

Before a drug can be administered to human subjects, a further phase of pre-clinical testing is required – toxicity testing. While animal models and cell culture provide valuable indications of whether a drug may be active in man, they do not tell us whether it is safe. We also need to know whether it is likely that we can achieve drug levels in patients high enough to have a realistic impact on the cancer. The standard way of exploring this is to give escalating drug doses to groups of animals until we start to see animals dying from drug side effects. There are a number of rather grisly standard measures, such as the dose of drug that will kill a proportion of the test subjects – termed the lethal dose (LD) test. Measures such as LD50 (the dose that kills 50% of the animals) and LD10 (10% death rate) are widely used and attract much controversy from anti-vivisection groups. I don’t propose to examine the ethics of animal testing per se – it seems to me to be something you either believe is right or believe is wrong. If you fall into the latter category, then no amount of argument will generally alter your opinion. I do believe it is worth critically examining the scientific basis of animal testing to try to minimize unnecessary suffering. There are many obvious problems with LD50 testing – for example, the LD50 for a given compound will vary widely between species and hence may still expose human subjects to risks. Nonetheless, compounds that turn out to be very toxic in LD50 tests at levels well below the necessary therapeutic levels are unlikely to be safe or worthwhile to test in humans. Whatever the rights and wrongs and limitations of pre-clinical toxicity testing, at present regulatory authorities require such testing in at least two species, one of which must be a non-rodent species such as the dog, before any human testing of a drug can begin.
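
To make the LD measures concrete, here is a minimal sketch of how an LD50 or LD10 might be estimated from dose-escalation data. The doses, group sizes, and death counts are entirely hypothetical, and the simple log-dose interpolation used here stands in for the more formal probit or logistic models used in real toxicology.

    import math

    # Hypothetical dose-mortality data: (dose in mg/kg, animals dosed, deaths observed).
    groups = [(1.0, 10, 0), (3.0, 10, 1), (10.0, 10, 4), (30.0, 10, 8), (100.0, 10, 10)]

    def lethal_dose(groups, fraction):
        """Estimate the dose killing `fraction` of animals by linear
        interpolation of observed mortality against log10(dose)."""
        points = sorted((math.log10(dose), deaths / n) for dose, n, deaths in groups)
        for (x0, p0), (x1, p1) in zip(points, points[1:]):
            if p0 <= fraction <= p1:  # the two doses bracketing the target mortality
                x = x0 + (fraction - p0) * (x1 - x0) / (p1 - p0)
                return 10 ** x
        return None  # target mortality never reached in the tested range

    print(f"Estimated LD50: {lethal_dose(groups, 0.5):.1f} mg/kg")
    print(f"Estimated LD10: {lethal_dose(groups, 0.1):.1f} mg/kg")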

Phase 1 trials

Having produced a candidate drug and completed the necessary pre-clinical testing package, the next step is testing in human subjects. Logically enough, this is termed a phase 1 trial. For many drugs, for example blood pressure pills, this testing will take place in ‘normal’, usually paid, volunteers. In general, these will be fit young men (not women, due to the risk of inadvertent damage to a foetus). For cancer drugs, which are often very toxic and frequently carcinogenic, this is clearly not an appropriate route, and phase 1 trials usually take place in patients who have exhausted standard treatment options. The classical phase 1 trial format is that the initial three patients are treated at a conservatively low dose and the effects observed. If no unacceptable toxicity occurs, then a further three patients will be treated at a higher dose, and so on. Clearly, for most drugs, eventually a dose level will be reached at which unacceptable side effects occur (termed ‘dose-limiting toxicity’, or DLT). If a patient experiences a DLT, additional patients are treated at the same dose level. If two or more out of six experience a DLT, then the ‘maximum tolerated dose’ (MTD) for the drug is reached and the trial ends. The dose level below the MTD will be used for further study. The classical phase 1 trial has the merit of simplicity, but there are clearly limitations as well. Firstly, different patients will have varying susceptibility to potentially dose-limiting side effects. If the trial includes too many side-effect-prone patients, the estimated maximum tolerated dose will be too low, and vice versa. Secondly, not all drugs need to be used at the maximum tolerated dose. For example, a drug blocking a hormone receptor only needs to be given in sufficient quantity to block the target. Any additional drug given above this level is only adding toxicity with no benefit. For trials with drugs of this sort, it is therefore important to specify the endpoint required to avoid unnecessary drug exposure to participants.
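
The classical escalation rule just described is easy to express as a short simulation. This is only a sketch: the dose levels and toxicity rates are hypothetical, and real protocols add refinements (intermediate dose steps, stopping rules, pharmacokinetic checks) that are ignored here.

    import random

    def three_plus_three(dlt_probability_by_level, seed=42):
        """Simulate the classical '3+3' phase 1 design: treat 3 patients,
        expand to 6 if exactly one dose-limiting toxicity (DLT) occurs,
        and stop once 2 or more of 6 experience a DLT.  Returns the dose
        level recommended for further study (the level below the one that
        proved too toxic), or None if even the lowest level is too toxic."""
        rng = random.Random(seed)
        for level, p in enumerate(dlt_probability_by_level):
            dlts = sum(rng.random() < p for _ in range(3))       # first cohort of 3
            if dlts == 1:
                dlts += sum(rng.random() < p for _ in range(3))  # expand to 6
            if dlts >= 2:                                        # maximum tolerated dose reached
                return level - 1 if level > 0 else None
            # 0/3 or 1/6 DLTs: escalate to the next dose level
        return len(dlt_probability_by_level) - 1                 # top level never proved too toxic

    # Hypothetical true DLT risks at five ascending dose levels.
    print("Recommended dose level:", three_plus_three([0.05, 0.1, 0.2, 0.35, 0.6]))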

The main problem with phase 1 trials relates to the needs of the patients. Mostly, these studies are happening in patients who have exhausted all standard therapy options and who are clearly desperate for further viable therapies. By its very nature, the phase 1 trial is mostly delivering drug below the likely therapeutic range with a consequent low chance of benefit. Furthermore, at least two of the last six patients entered in a study will receive too high a dose and will experience a high level of side effects. Finally, most drugs entering phase 1 will actually turn out to be of little therapeutic value due to either unforeseen problems preventing delivery of sufficient drug or simply a lack of efficacy against the target cancers. For most patients, therefore, entering a phase 1 trial needs to be seen largely as an act of altruism, and it is indeed true that many patients entering trials will say things like ‘well if it helps people after me, it will be worth it’. Nonetheless, ethics committees and doctors must be careful to protect vulnerable and desperate patients from harm in these trials.

Phase 2

If an agent performs well in phase 1 – in other words, side effects are manageable and acceptable, usually with some evidence of a positive effect on the cancer – then a phase 2 trial will follow. The aim of phase 2 studies is to assess the efficacy of the drug in more detail. The drug will be tested at the optimal dose defined in phase 1 in a group of patients assessed as likely to benefit from it. This is clearly different to phase 1, as the risk of under- or overdosing is much reduced, though it still remains due to the limitations of the dose-finding mechanisms in phase 1 discussed above.

Furthermore, as the patients are selected on the basis of likely benefit, the risk/benefit ratio for participants is much better. Typically, up to 40 or 50 patients will enter a phase 2 trial, and the endpoints will be efficacy, and of course safety, in the more defined, usually somewhat fitter, patient population.

Defining efficacy is a major problem. Generally, agents that produce tumour shrinkage are defined as active, and this has led to standardized ways of defining how much shrinkage constitutes a worthwhile response. The most widely used method is the RECIST (Response Evaluation Criteria In Solid Tumors) system, first published in 2000 and updated in January 2009.

Disease responses are broadly classified as follows:

  • Complete response: all assessable disease disappeared;
  • Partial response: reduction in size by pre-specified amount of all assessable disease;
  • Stable disease: insufficient change to be put in another category;
  • Progressive disease: worsening of disease by pre-specified amount or appearance of new cancer deposits.

The principle underlying this system of assessment is simple; the application in practice is complex. As with many things, the devil is in the detail – the following is a list of tricky issues (not comprehensive) to illustrate the difficulties:

  • How much should a tumour grow before it counts as progression?
  • How much should it shrink to count as a response to treatment?
  • What if some lumps shrink but not others?
  • When should you carry out the response measurements (too early and you may under-report; too late and patients may have started relapsing)?
  • How do you assess tumour deposits in tissues such as bone or pleura (the lining around the lung) where there is no discrete lump that can be measured?

This last point is a particular problem with certain diseases such as prostate cancer that mainly affect bone. Therefore, while response to treatment remains an important test of drug activity, a second set of measures based on how long a patient takes to start getting worse – termed the ‘time to progression’ – is increasingly used. This has proved particularly important with the new targeted molecular therapies for diseases like renal cancer. With this disease, large masses often shrink but by less than the standard RECIST criteria. On review of the scans in these patients, it became obvious that the tumours changed in appearance, with the centre appearing to be less ‘active’ than before – borne out when lumps were removed and found to have dead tissue in the middle. In parallel, tumour related symptoms often improved. For these patients, therefore, prolonged ‘stable’ disease becomes a very worthwhile outcome. Improved time to progression is therefore frequently used as a means of assessing activity of an agent. Finally, of course, agents can be assessed for their effect on overall survival times. This is not frequently used in phase 2 as the principal outcome for a variety of reasons, mainly time – the aim is to establish as quickly as possible which agents to take forward for phase 3 licensing trials.
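
Returning to the response categories listed above, a minimal sketch shows how they might be applied to the summed diameters of a patient’s measurable (‘target’) lesions. The thresholds follow the commonly quoted RECIST 1.1 figures (shrinkage of 30% or more for partial response, growth of 20% or more plus at least 5 mm for progression), but the example is deliberately simplified: real assessments compare growth against the smallest sum recorded on study rather than the baseline, and also track non-target lesions.

    def recist_category(baseline_sum_mm, current_sum_mm, new_lesions=False):
        """Classify response from the sum of target-lesion diameters (mm),
        using simplified RECIST 1.1-style thresholds."""
        if new_lesions:
            return "Progressive disease"
        if current_sum_mm == 0:
            return "Complete response"
        change = (current_sum_mm - baseline_sum_mm) / baseline_sum_mm
        if change <= -0.30:
            return "Partial response"
        if change >= 0.20 and (current_sum_mm - baseline_sum_mm) >= 5:
            return "Progressive disease"
        return "Stable disease"

    print(recist_category(100, 60))   # 40% shrinkage -> Partial response
    print(recist_category(100, 104))  # 4% growth -> Stable disease
    print(recist_category(30, 40))    # 33% growth and +10 mm -> Progressive disease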

Phase 3 trials

If an agent shows encouraging activity in phase 2 with acceptable toxicity, it will then proceed into phase 3 trials in which the agent is compared to the current standard of care. Where the agent is a new drug, this will generally involve the drug company discussing the trial with regulatory organizations such as the UK Medicines and Healthcare products Regulatory Agency (MHRA), the European Medicines Agency (EMA), and the US Food and Drug Administration (FDA). These bodies will have an opinion as to the appropriate comparator treatment and also the outcome required to obtain a licence. The comparator may be an existing drug or combination of drugs, or it may be what is termed ‘best supportive care’. This latter option is chosen when there is no clear-cut standard therapy – patients receive whichever palliative measures the clinician thinks appropriate.

The hallmark feature of phase 3 trials is that the patients are randomly assigned between the treatment options. This ensures that patients will be evenly distributed between the various arms of the trial and minimizes the risk of differences in outcomes arising because patients with a better or worse prognosis are concentrated in one arm of the trial. Whilst the design makes good scientific sense and is regarded as the ‘gold standard’ method of assessment, as always there are limitations.

Firstly and most obviously, where the control arm is best supportive care or, worse still, a placebo medication, there is understandable reluctance on the part of patients. Careful explanation and support are clearly required, particularly to make the point that if there is no other proven alternative, then treatment outside the trial will be no different to the control arm. Often, however, a phase 3 trial is not comparing the new drug with placebo but with the current standard therapy. This is generally a much easier discussion in the clinic, as everyone receives active treatment, although the new medicine may turn out to be less good than the old one – we don’t know until we do the trial. Even if the control is placebo, it is by no means a given that the new drug will turn out better – there are plenty of examples of trials in which the drug was no better than placebo, and even examples where the drug was worse – both toxic and ineffective.

Secondly, most new medicines will be only a little better than the existing ones, hence the likely differences between the trial arms will be small. In order to detect small differences, large sample sizes are necessary to ensure statistical confidence in the outcomes. Statistics is a much mocked, maligned, and misunderstood science, so it is helpful to illustrate why sample sizes need to be big with a simple example. Suppose we want to assess whether a coin used for a coin toss is evenly balanced or biased to either heads or tails. If we toss once, then we get either heads or tails (ignoring the possibility that the coin balances on its edge!). If we toss again and get the same, we have (say) 100% heads, 0% tails. No one would say the coin was biased on this size of sample, though. Suppose we carry on and get to 10 tosses – 6 heads, 4 tails – would we be confident that the coin was biased?

Probably not. However, if we get to 100 tosses with 60 heads and 40 tails, or 1,000 tosses with 600 heads and 400 tails, we would have increasing confidence that the coin was indeed biased. The reverse problem is more difficult: if we got 501 versus 499, would we say the coin was biased? Again, probably not, but how about 510 versus 490? 520 versus 480? At what point do we decide that a difference is more likely due to a biased coin than to chance? Even a big difference like 600 versus 400 can occur by chance with an unbiased coin, but it would be very unlikely. The statistics plan for a trial is therefore key and will specify, in advance of the trial starting, how many patients will be needed to reliably detect the minimum difference deemed to be clinically important. For a trial testing a new drug in advanced cancer, this will be along the lines of an average improvement in survival of at least three months. As with our coin flip, this could arise by chance, so the trial statistician will calculate how many patients are needed to show (or exclude) this difference reliably – usually defined (largely arbitrarily) as the chance result occurring fewer than 1 in 20 times.

For most modern trials, there will be a committee (usually called the Independent Data Monitoring Committee, IDMC, or Data and Safety Monitoring Committee, DSMC) set up to independently monitor the results as they accrue. This is in place primarily to protect patients – if there are unforeseen toxicity problems, for example, the trial may be stopped early. Later on in the trial, the IDMC can end the study if the predefined endpoints are met early. This allows early dissemination of the data and gives other patients access to the drug sooner. Conversely, the IDMC can also determine that the trial is never likely to show significant differences and stop it early on grounds of futility.
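
To put numbers on the coin-tossing example, the short calculation below works out how often a fair coin would produce a split at least as uneven as the ones discussed. The figures it prints show why 60 heads out of 100 is only weak evidence of bias, why 600 out of 1,000 is overwhelming, and why 510 out of 1,000 is no evidence at all.

    from math import comb

    def chance_of_split(n, heads):
        """Probability that a fair coin gives a split at least this uneven
        (counting both directions, i.e. a two-sided test)."""
        k = max(heads, n - heads)
        one_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
        return min(1.0, 2 * one_tail)

    for n, heads in [(10, 6), (100, 60), (1000, 600), (1000, 510)]:
        print(f"{heads} heads from {n} tosses: probability by chance = {chance_of_split(n, heads):.2g}")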

Endpoints in trials are controversial. Trials are expensive, often $100 million plus, and hence drug companies want them to be as small and quick as possible. Conversely, regulators want the most reliable outcome measures and hence longer follow-up periods or larger sample sizes.

Society at large has needs somewhere in between. We all want better medicines, and if we’ve got cancer, we want them now. Equally, we want them to be safe. Also, the larger and longer the trial, the more the drug company has to charge for the drug in order to pay back the higher development costs. As health budgets grow, so pressure to reduce drug costs rises, making availability of new drugs increasingly restricted for cancer patients in poorer economies. As a way out of these conflicting tensions, researchers are increasingly looking for what are called ‘surrogate’ endpoints. The aim is to pick an early endpoint that will accurately predict the final outcome of the trial. The response rate in a phase 2 trial is an example of a surrogate endpoint used to select a drug for phase 3 study. The problem is that the correlation between response rate and the sort of endpoints regulators require, such as improved survival, is not sufficiently good to allow a high response rate in phase 2 to lead directly to a licence. The same will generally apply to comparisons of response rates in randomized trials.

In order to get away from using survival-based comparisons, which clearly take a long time, investigators must show that some earlier measure reliably predicts the final outcome. An example of such a measure is the ‘time to progression’ mentioned above. This is the time taken for the tumour to grow or spread by pre-specified amounts and is commonly used as a registration trial endpoint in advanced breast cancer. In some settings – PSA in prostate cancer, for example – the candidate marker is unreliable, and drugs in prostate cancer are still currently stuck with needing to show improved survival to get a licence. In prostate cancer, studies are currently evaluating a novel measure of response based on counting the number of circulating tumour cells. Typically, these are present in tiny numbers – around 5 per 7.5 millilitres of blood is the key cut-off level – a handful of needles in a massive haystack of tens of millions of blood cells. If validated, such a test could greatly accelerate the pace of cancer drug development in diseases like prostate cancer currently stuck with overall survival endpoints. As shorter trials are cheaper, it could also reduce the price of the drug when licensed.
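
The ‘needle in a haystack’ point can be checked with a little arithmetic. The blood cell counts below are assumed, order-of-magnitude figures for a typical adult (roughly 7,000 white cells and 5 million red cells per microlitre), not values from the studies mentioned.

    # Rough arithmetic for circulating tumour cells (CTCs) in a 7.5 ml blood sample.
    SAMPLE_MICROLITRES = 7_500
    white_cells = 7_000 * SAMPLE_MICROLITRES       # ~5 x 10^7 nucleated cells
    red_cells = 5_000_000 * SAMPLE_MICROLITRES     # ~4 x 10^10 red cells
    ctcs = 5                                       # the cut-off level quoted above

    print(f"White cells in the sample: about {white_cells:,}")
    print(f"CTCs per nucleated cell: roughly 1 in {white_cells // ctcs:,}")
    print(f"CTCs per blood cell of any kind: roughly 1 in {(white_cells + red_cells) // ctcs:,}")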

Comparisons of existing treatments

The phase 1-3 schema described above can broadly be fitted to any new technique or drug combination, although requirements differ between countries. Trials comparing existing drugs in novel combinations are often undertaken by academic organizations such as Cancer Research UK or the US National Cancer Institute. Using the template above will give reliable results that can influence practice; such trials are the gold standard for advancing medical practice in general. The system becomes much less clear with surgical techniques, radiotherapy equipment, other devices, and biomarkers, though. For example, new technologies such as robotic surgery are introduced as incremental improvements. These ‘improvements’ are treated as self-evident, when in fact they may be nothing of the sort. Consider the comparison of open with robot-assisted surgery: access routes to the body are different; the tactile connection between the surgeon’s hands and the tissues is lost in robot-assisted surgery; control of bleeding or complications such as bowel perforation may present different risks, possibly requiring conversion from a robot-assisted to an open conventional operation; theatre times may be longer while surgeons are training, and so on. It is entirely plausible that each of these factors may substantially affect outcomes. In addition, there is the massive issue of cost. A surgical robot costs over £1 million, with another £100,000–£150,000 in annual running costs. Even if the outcomes are better, how much is it worth paying for, say, an earlier discharge from hospital?
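
A back-of-the-envelope calculation illustrates the cost question. The capital and running costs are those quoted above; the seven-year working life and the caseload of 300 robot-assisted operations a year are assumptions for illustration only, and consumable instrument costs are ignored.

    # Approximate extra cost per operation for a surgical robot.
    capital_gbp = 1_000_000            # purchase price quoted above
    annual_running_gbp = 125_000       # midpoint of the quoted running-cost range
    working_life_years = 7             # assumed equipment lifespan
    cases_per_year = 300               # assumed robot-assisted caseload

    total_cost = capital_gbp + annual_running_gbp * working_life_years
    cost_per_case = total_cost / (working_life_years * cases_per_year)
    print(f"Extra cost per operation: roughly £{cost_per_case:,.0f}")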

One might expect that the introduction of such a technique, for example for prostate removal, would require the same sort of trials that new drugs for prostate cancer require, with equivalence or better in outcomes. No such trial has ever been carried out, yet surgical robots are working in major surgical centres across the world, particularly in the USA. Why the massive discrepancy? In essence, new devices simply have to demonstrate safety and fitness for the purpose for which they are designed. Where changes are genuinely small and incremental, a massive trial to show that a new scalpel is slightly better would clearly be impractical and probably meaningless. At some point, though, the change ceases to be incremental – surgical robots seem to me to be a good example of this – yet they are still treated as if they were simply a slightly better scalpel. In the USA in particular, buying a surgical robot has become an essential part of the marketing of a hospital – it is an iconic piece of kit – what go-ahead institution would want to be without one? Grappling with this issue is likely to become more and more important as healthcare systems struggle with rising costs. Conceivably, of course, new technologies may actually save costs. Sticking with the robots, it is not implausible that the claimed shorter learning curve, shorter hospital stay, and reduced complication rates could pay back the capital and running costs. At present, however, we simply don’t know.

Similar arguments apply to imaging and other diagnostic tests. Again, there is an element of not needing to do research to validate the obvious – a scan with a sharper image is likely to be better than a fuzzy one! However, when we look more closely, things get more tricky. For example, one of the key drivers of decision-making is whether the cancer has spread to a particular organ. In general, if a scan looks abnormal in a particular area known to be at risk, it is likely that this represents disease. The converse is not the case, however – a negative scan could mean there is genuinely no disease, or it could mean the disease is below the threshold for detection. A good example of this sort of problem is the detection of cancer in lymph nodes. As lymph nodes are normal structures and cancer in lymph nodes is of similar density (and therefore imaging appearance) to the normal tissue, imaging can only tell us if nodes are of normal or abnormal dimensions – typically, the cut-off size is around 5 millimetres. Clearly, if we have a 4-millimetre cancer deposit replacing the bulk of a node, it will look ‘normal’. Suppose a potentially better imaging test for node disease is developed: how should it be evaluated?

Such a test would fall into the same sort of regulatory route as surgical devices – we need to show safety and fitness for purpose. Safety is straightforward – the phase 1/2 route clearly works fine – but how do we demonstrate ‘fitness for purpose’? The answer is some form of clinical trial, but the question of endpoints is very tricky – how many ‘normal’ lymph nodes harbouring small cancers do we need to detect for the test to be worthwhile? How many are we allowed to miss? How do we evaluate the ‘true’ positive and negative rates? Should we move to broader clinical outcomes rather than counting lymph nodes – for example, does applying the test lead to better clinical results, such as longer survival times, than the standard way of managing the patient?
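
One way to make the question about ‘true’ positive and negative rates concrete is the standard two-by-two summary sketched below. The numbers are entirely hypothetical: a new node-imaging test checked against surgical pathology in 1,000 lymph nodes, of which 100 actually contain cancer. It also shows why a test with respectable sensitivity and specificity can still produce many false alarms when the condition it looks for is uncommon.

    def test_performance(true_pos, false_pos, false_neg, true_neg):
        """Summarise a diagnostic test from a 2x2 table of counts."""
        sensitivity = true_pos / (true_pos + false_neg)   # involved nodes correctly flagged
        specificity = true_neg / (true_neg + false_pos)   # clear nodes correctly passed
        ppv = true_pos / (true_pos + false_pos)           # positive predictive value
        npv = true_neg / (true_neg + false_neg)           # negative predictive value
        return sensitivity, specificity, ppv, npv

    # Hypothetical results for 1,000 nodes, 100 of which truly contain cancer.
    sens, spec, ppv, npv = test_performance(true_pos=70, false_pos=45, false_neg=30, true_neg=855)
    print(f"Sensitivity {sens:.0%}, specificity {spec:.0%}, PPV {ppv:.0%}, NPV {npv:.0%}")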

These are all very difficult issues when applied to imaging technology, where the acquisition costs of new scanners are very high. Even for technologies that augment existing scanners, for example new contrast medium drugs, the issues are substantial, and there is no single consistent regulatory route across the globe.

Similar arguments apply to diagnostic tests. Again, at first sight, the problem would seem simple – if we have a blood test that correlates with the cancer, then we should use it as part of the basis for clinical decisions. However, if we examine the literature, we find many examples of tests that correlate with presence or absence of disease but very few are actually used clinically – why should this be? The principal answer to this is that the test has to give additional information over what we know already. For example, there are a whole range of urine tests that correlate with the presence of bladder cancer but none are used in the UK. Patients with suspected bladder cancer need a cystoscopy to confirm the diagnosis. The available urine tests are not reliable enough to exclude patients from a cystoscopy. Once the bladder has been examined, if a tumour is seen, a biopsy is needed. Again, the tests are not sufficiently reliable to obviate the need for biopsy. In addition, the excision biopsy is also part of the treatment, so however good the test, the patient still needs the operation. How about predicting prognosis? Again, the urine test is good but not as good as the pathological study of the removed tumour, so again it adds nothing. Given the above, the correct test for a diagnostic procedure is its effect on outcomes – can the test spare invasive procedures or predict which of a range of treatment options is best? This requires large-scale trials similar to those needed to license a drug and is the reason there are so few established tests or markers used in the clinic as decision aids.

There are examples of markers that correlate well with the disease and can be used to predict clinical events in advance of clinical symptoms or obvious scan changes. Examples of such markers include PSA in prostate cancer, CA125 in ovarian cancer, and AFP and HCG in testicular cancer. Even when good markers exist, they cannot necessarily replace other clinical methods of assessment. For example, while changes in PSA largely reflect changes in disease status, some treatments that affect the clinical outcomes (the bone-hardening drugs called bisphosphonates are a good example) have very little effect on PSA levels despite helping prevent bone damage by the cancer. Even more surprisingly, a recent huge study of marker use in ovarian cancer produced very counterintuitive results. A rising level of CA125 in the blood accurately predicts clinical relapse. One might expect that treating relapse early would be better than waiting until symptoms developed. The study compared a policy of marker-driven treatment (that is, treatment for relapse started when marker levels rose) with symptom-driven treatment. A total of around 1,500 women took part in the study, and the earlier introduction of treatment in the more intensively monitored women did not affect survival times. More surprisingly still, quality of life and anxiety levels were better in the women whose treatment was driven by clinical symptoms – tighter monitoring and earlier treatment were therefore actually inferior overall.

A huge focus of current research is individualized therapy – identifying markers that allow treatment to be tailored to the individual. There are many ways tumours can be characterized – by their DNA mutations, by their patterns of protein expression, by looking at the activities of different enzymes.

However, while it is relatively easy to identify patterns that correlate with different outcomes, it will be obvious from the above discussion that this is not sufficient to allow treatment to be altered. To demonstrate clinical value will require clinical trials comparing the candidate marker-driven policy with standard care. As the ovarian cancer example above shows, even having a good marker does not guarantee the expected result. A further problem that may emerge is that the number of candidate markers being developed may exceed the capacity of research teams to carry out trials, possibly many times over. Furthermore, markers effectively change a disease from being a homogeneous entity into a number of distinct sub-entities. As good trials need large numbers, this makes doing trials more difficult – the disease effectively becomes rarer. This is illustrated by recent changes in renal cancer. A number of pathological variants had been described some time ago, but until the advent of targeted small molecules this made no difference to treatment options. As already discussed, the abnormalities in clear cell renal cancer (around 70% of the total) led to new treatments. What then of the remaining 30%? Several further subtypes make up this 30%, so trials become difficult as each is really rather uncommon. As a result, we don’t really know how to manage these subgroups. These so-called ‘orphan’ diseases will become increasingly common and problematical, as there will be little in the way of trial data to inform treatments, and trials will be difficult due to lack of numbers.

The coming years will see many exciting developments in new cancer drugs, new biomarkers, and exciting and futuristic technology such as surgical robots. How we incorporate these developments in practice will depend in large measure on clinical research to underpin their use. However, new technologies in particular will have a tendency to be introduced via the marketing rather than trials route. How we license, regulate, and fund these devices will become increasingly problematical as healthcare budgets come under pressure with an ageing population and the massive debt overhang of the credit crunch.