In the mid-2000s, the National Cancer Institute (NCI) in the US brought together a group of experts to try to overcome an issue that had arisen in the reporting of results from breast cancer clinical trials.
Across the world, researchers were using slightly different endpoints for their trials. Endpoints are outcome measures that are used to compare treatments in a trial, or to determine the level of treatment activity.
“Trials may seem to be straightforward when you read about them in the news,” Professor Judith Bliss, Director of the ICR’s Clinical Trials and Statistics Unit (ICR-CTSU), told me. “But there are nuances in how they are reported.”
The endpoints used in trials of early breast cancer treatments are known as composite endpoints. One example is relapse-free-survival (RFS), which indicates whether a patient is alive and free from cancer recurrence. RFS is measured over time and counts as “events” both cancer recurrence and death from any cause.
There are other similar endpoints (for example, disease-free-survival and breast-cancer-free-survival). Each includes some or all of the following: recurrence (cancer coming back), whether local or distant to the original cancer (in the breast, versus elsewhere in the body such as the brain); a second primary cancer (a new tumor unrelated to the original tumor, either in the contralateral breast or elsewhere in the body); and death before any recurrence.
“These ‘events’ can be combined into what we call a ‘composite endpoint,’ where multiple events are grouped together and recorded in a single endpoint,” Judith continued. “Historically, problems arose because people were choosing different sets of outcomes to put into their composite endpoints and giving the endpoints different names. For example, if a trial reported disease-free-survival as its primary endpoint, we did not have consistency in what components had been included as events in that endpoint.”
A comparability problem
This inconsistency makes it tricky to compare trials that use different composite endpoints, even when those endpoints share a name.
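To make the comparability problem concrete, here is a minimal Python sketch of how a composite endpoint is scored as time to first event. The event labels, the `Patient` class, the two component sets and the four patients are all hypothetical, invented for illustration; they are not taken from any trial or from the STEEP definitions.

```python
from dataclasses import dataclass

# Hypothetical event-type labels, chosen for this illustration only.
LOCAL = "local_recurrence"
DISTANT = "distant_recurrence"
SECOND_PRIMARY = "second_primary"
DEATH = "death"


@dataclass
class Patient:
    follow_up_years: float  # time the patient was last known to be under observation
    events: dict            # event type -> years from randomization


def time_to_first_event(patient, components):
    """Return (time, had_event) for a composite endpoint.

    The composite event time is the earliest of the included event types.
    If none of them occurred, the patient is censored at end of follow-up.
    """
    times = [t for kind, t in patient.events.items() if kind in components]
    if times:
        return min(times), True
    return patient.follow_up_years, False


def count_events(patients, components):
    """Number of patients who had at least one included event."""
    return sum(time_to_first_event(p, components)[1] for p in patients)


# Two hypothetical trials that both report "disease-free survival"
# but include different components in it.
DFS_TRIAL_A = {LOCAL, DISTANT, DEATH}
DFS_TRIAL_B = {LOCAL, DISTANT, SECOND_PRIMARY, DEATH}

# Made-up patients, not real data.
patients = [
    Patient(6.0, {}),
    Patient(6.0, {SECOND_PRIMARY: 2.5}),
    Patient(4.0, {DISTANT: 3.0, DEATH: 4.0}),
    Patient(5.0, {SECOND_PRIMARY: 1.0, LOCAL: 4.5}),
]

print(count_events(patients, DFS_TRIAL_A))  # 2
print(count_events(patients, DFS_TRIAL_B))  # 3
```

The same four patients yield two events under one definition and three under the other, and the last patient's event time moves from 4.5 years to 1.0 year, even though both endpoints carry the same name.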
In order to overcome this issue, the NCI’s panel of experts drew up a set of guidelines called STEEP—Standardized Definitions for Efficacy End Points—which did exactly what it says on the tin, providing standard definitions for endpoints for breast cancer clinical trials.
However, over time some limitations of the guidelines were recognized.
“The group devised an endpoint called ‘Invasive Disease-Free-Survival,’” Judith explained.
This included all types of events: all recurrences, all second primary cancers and all deaths. However, this means that events that can be directly affected by the treatment being studied, i.e. disease recurrence, can be overshadowed by events unrelated to the treatment, for example older patients dying of natural causes.
“If you’re trying to look at differences between two groups, and you have an event like a death from another cause happening at the same rate in both groups, it’s going to be harder to see a difference between the two groups,” Judith said. “You might falsely conclude that there is no difference.”
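The dilution Judith describes can be sketched with a small calculation. This is a simplified model and not the method used by any group mentioned here: it assumes constant, independent hazards, a treatment that changes only the recurrence hazard, and made-up rates. The function name and all numbers are hypothetical.

```python
def composite_hazard_ratio(recurrence_hazard, other_death_hazard, treatment_hr):
    """Hazard ratio for a composite of recurrence plus unrelated death.

    Assumes constant, independent hazards, so the hazard of the first
    event is the sum of the component hazards. The treatment multiplies
    only the recurrence hazard; unrelated deaths occur at the same rate
    in both arms.
    """
    control = recurrence_hazard + other_death_hazard
    treated = treatment_hr * recurrence_hazard + other_death_hazard
    return treated / control


# Illustrative numbers only: the treatment cuts the recurrence hazard by 30%.
for other in (0.0, 0.02, 0.07):
    hr = composite_hazard_ratio(0.03, other, 0.70)
    print(f"unrelated-death hazard {other:.2f}/yr -> composite HR {hr:.2f}")
# unrelated-death hazard 0.00/yr -> composite HR 0.70
# unrelated-death hazard 0.02/yr -> composite HR 0.82
# unrelated-death hazard 0.07/yr -> composite HR 0.91
```

As unrelated deaths become a larger share of the events, the composite hazard ratio drifts toward 1 even though the effect on recurrence is unchanged, which is why a real difference becomes harder to detect.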
Oranges, apples and pears
“Imagine that your composite endpoint is a fruit bowl and you have a separate bowl for each treatment being studied,” Judith continued. “Recurrences are apples. Red apples represent local recurrences, and green apples are distant recurrences. New primary cancers are pears. Now let’s imagine a patient dies during treatment, from an unrelated cause—this would still count as an endpoint in many cases. Let’s call those oranges.
“Where it gets tricky is when you get a treatment-related death. This is still an orange, but it is a cause for concern because it was caused by the treatment you gave in the first place, and you can’t distinguish it from deaths unrelated to treatment if those are relatively common. Recurrences matter and treatment-related deaths matter, but by the time the fruit bowl has been filled up with fruit, some of the important endpoints are masked, or hidden, by others: you can’t see whether the number of apples in the two bowls is different.”
STEEP was adopted by regulatory authorities across the global scientific community, but some researchers never felt quite comfortable using the guidelines, “because we didn’t want to lose sight of the apples.”
Changing the balance
One reason this issue came to light when it did is that, historically, trials didn’t tend to include older people, and recurrence rates were much higher. Over time, that balance has changed: treatments improved and trials became more inclusive, allowing older patients to take part, so there are now more deaths from causes unrelated to the treatment and fewer recurrences being reported.
“A good example of this is the POETIC trial,” Judith said. “The average age of patients in the trial is around 60, and we have had plenty of patients aged over 80. We followed them for six or seven years, so it’s not a surprise that there were some deaths without a breast cancer recurrence. Whereas in the past, very few patients aged over 65 went into trials.”
Because of these changes, using the endpoint ‘Invasive Disease-Free-Survival’ as set out in the original STEEP guidelines became an increasing problem, and some were reluctant to follow the guidelines at all.
“Working in academia allows us to be more pragmatic,” Judith told me. “Some of the trials we’ve published over the years have had more sensitive endpoints.”
False negatives and false positives
Over time, people began to realize that they might be getting wrong answers from trials, either false negatives or false positives.
A well-known trial called TAILORx was reported a few years ago. It concluded that a certain patient group could avoid chemotherapy.
“This generated concern amongst statisticians, because if you looked at the different events that led to the primary endpoint, only 30 percent were recurrences affected by the treatment being looked at—lots were deaths from unrelated causes,” Judith recounted.
“Saying that there were no differences between the groups could well have been a false negative. That’s when people really started to listen.”
An updated set of guidelines
The NCI set up a second project group, mainly led by researchers in the US, with Judith as the only non-US statistician to be invited—testament to her commitment to making her voice heard about this issue.
The new group assessed trials that had used the original STEEP guidelines, and found that the most common deviation from the guidelines was the exclusion of second primary cancers.
They also ran a set of simulations to model how results can be obscured if the wrong type of event is included in an endpoint.
The group produced a paper, now published in the Journal of Clinical Oncology, introducing an updated set of guidelines: STEEP 2.0.
The paper introduced some additional endpoints, including one called ‘invasive breast-cancer-free survival’. The crucial difference between this and the original STEEP endpoint ‘invasive disease-free-survival’ is that invasive breast-cancer-free survival includes all events except non-breast second primary cancers, to prevent these from masking important treatment-related deaths and events relevant to the trial.
Not a ‘one size fits all’ approach
“One thing we’ve done in the paper is to talk about how different types of trials might have different considerations—there shouldn’t be a one size fits all approach to choosing composite endpoints,” Judith said. “In some cases you’re going to get a much more sensitive endpoint if you don’t include non-breast second primary cancers and that’s where invasive breast-cancer-free survival is a good endpoint to use.”
“We hope that this set of endpoints will have a much greater reach, beyond regulatory trials to academic trials like the ones we run in the ICR-CTSU. I have the confidence to use the guidelines in my own trials for the first time,” she said.
“By following the new guidelines, we are making sure that when trials are published, they are the most informative for patients.”