The prostate-specific antigen (PSA) test has been examined in several observational settings for initial diagnosis of disease, as a tool in monitoring for recurrence after initial therapy, and for prognosis of outcomes after therapy. Numerous studies have also assessed its value as a screening intervention for the early detection of prostate cancer. The potential value of the test appears to be its simplicity, objectivity, reproducibility, relative lack of invasiveness, and relatively low cost. PSA testing has increased the detection rate of early-stage cancers, some of which may be curable by local-modality therapies, and others that do not require treatment. The possibility of identifying an excessive number of false-positive results in the form of benign prostatic lesions requires that the test be evaluated carefully. Furthermore, there is a risk of overdiagnosis and overtreatment (i.e., the detection of a histological malignancy that, if left untreated, would have had a benign or indolent natural history and would have been of no clinical significance). Randomized trials have therefore been conducted.
Randomized Trials of PSA Screening
The Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial
The PLCO Cancer Screening Trial is a multicenter, randomized, two-armed trial designed to evaluate the effect of screening for prostate, lung, colorectal, and ovarian cancers on disease-specific mortality. From 1993 through 2001, 76,693 men at ten U.S. study centers were randomly assigned to receive annual screening (38,343 subjects) or usual care (38,350 control subjects). Men in the screening group were offered annual PSA testing for 6 years and digital rectal exam (DRE) for 4 years. The subjects and health care providers received the results and decided on the type of follow-up evaluation. Usual care sometimes included screening, as some organizations have recommended.
In the screening group, rates of compliance were 85% for PSA testing and 86% for DRE. Self-reported rates of screening in the control group increased from 40% in the first year to 52% in the sixth year for PSA testing and ranged from 41% to 46% for DRE.
After 7 years of follow-up, with vital status known for 98% of men, the incidence of prostate cancer per 10,000 person-years was 116 (2,820 cancers) in the screening group and 95 (2,322 cancers) in the control group (rate ratio, 1.22; 95% confidence interval [CI], 1.16–1.29). The incidence of prostate cancer death per 10,000 person-years was 2.0 (50 deaths) in the screening group and 1.7 (44 deaths) in the control group (rate ratio, 1.13; 95% CI, 0.75–1.70). The data at 10 years were 67% complete and consistent with these overall findings (incidence rate ratio, 1.17; 95% CI, 1.11–1.22 and mortality rate ratio, 1.11; 95% CI, 0.83–1.50). Thus, after 7 to 10 years of follow-up, the rate of death from prostate cancer was very low and did not differ significantly between the two study groups.
Prostate cancer mortality data after 13 years of follow-up continued to show no reduction in mortality resulting from prostate cancer screening with PSA and DRE. Organized screening in the intervention group of the trial did not produce a mortality reduction compared with opportunistic screening in the usual care group. There were 4,250 men diagnosed with prostate cancer in the intervention group and 3,815 men in the usual care group. Cumulative incidence rates were 108.4 per 10,000 person-years in the intervention group and 97.1 per 10,000 person-years in the usual care group (relative risk [RR], 1.12; 95% CI, 1.07–1.17). The cumulative prostate cancer mortality rates were 3.7 (158 deaths) per 10,000 person-years in the intervention group and 3.4 (145 deaths) per 10,000 person-years in the usual care group (RR, 1.09; 95% CI, 0.87–1.36).
There were no apparent associations with age, baseline comorbidity, or PSA testing before the trial, as had been hypothesized in an intervening subgroup analysis. These results are consistent with the previous report at 7 to 10 years of follow-up described above. All incident prostate cancers and prostate cancer deaths through 13 years of follow-up or through December 31, 2009, were ascertained.
The 13-year follow-up analysis reported 45% of men in the PLCO trial had at least one PSA test in the 3 years before randomization. Annual PSA screening in the usual care arm was estimated to be as high as 52% by the end of the screening period. The intensity of PSA screening in the usual care group was estimated to be one-half of that in the intervention group. Stage-specific treatment between the two arms was similar.
An extended follow-up analysis for mortality, with median follow-up of almost 17 years (intervention group, 16.9 years; usual-care group, 16.7 years), showed prostate cancer mortality rates of 5.5 (333 deaths) per 10,000 person-years in the intervention group and 5.9 (352 deaths) per 10,000 person-years in the usual-care group, producing a rate ratio of 0.93 (95% CI, 0.81–1.08). An analysis of nonprotocol screening during the postscreening phase of the trial showed that 78.7% of men in the usual-care group and 80.3% of men in the intervention group had received a PSA test within the past 3 years, and that 85.9% of men in the usual-care group and 98.9% of men in the intervention group had ever had a PSA test.
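The extended follow-up rate ratio can be roughly checked from the figures quoted above. The following is a minimal sketch, not the trial's own method: it assumes the death counts are Poisson distributed, so the standard error of the log rate ratio is approximated from the counts alone, and it uses the rounded rates given in the text. The function name `rate_ratio_ci` is illustrative.

```python
import math

def rate_ratio_ci(rate_a, events_a, rate_b, events_b, z=1.96):
    """Rate ratio (a vs. b) with a Wald interval on the log scale.

    Assumption: event counts are Poisson, so
    SE(log rate ratio) is approximately sqrt(1/events_a + 1/events_b).
    """
    rr = rate_a / rate_b
    half_width = z * math.sqrt(1 / events_a + 1 / events_b)
    return rr, rr * math.exp(-half_width), rr * math.exp(half_width)

# Figures quoted in the text: rates per 10,000 person-years and death counts.
rr, lower, upper = rate_ratio_ci(5.5, 333, 5.9, 352)
print(f"rate ratio {rr:.2f} (approximate 95% CI, {lower:.2f} to {upper:.2f})")
```

Under these assumptions the crude estimate lands close to the reported 0.93 (95% CI, 0.81–1.08); small differences are expected because the rates in the text are rounded and the published analysis may have used a different model.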
Possible explanations for the lack of a significant reduction in mortality in this trial include the following:
- Annual screening with the PSA test using the standard U.S. threshold of 4 ng/mL and DRE to trigger diagnostic evaluation may not be effective.
- The substantial level of screening in the control group could have diluted any modest effect of annual screening in the intervention group.
- Approximately 44% of the men in each study group had undergone one or more PSA tests at baseline, which would have eliminated some cancers detectable on screening from the randomly assigned population. Thus, the cumulative death rate from prostate cancer at 10 years in the two groups combined was 25% lower in those who had undergone two or more PSA tests at baseline than in those who had not been tested.
- Improvement in therapy for prostate cancer during the trial may have resulted in fewer prostate-cancer deaths in the two study groups, which blunted any potential benefits of screening.
- After a PSA finding greater than 4 ng/mL, within 1 year only 41% of men underwent prostate biopsy; within 3 years of this finding, only 64% of men underwent prostate biopsy. Such lower biopsy rates, associated with lower prostate cancer detection rates, may have blunted the impact of screening on mortality.
The European Randomized Study of Screening for Prostate Cancer (ERSPC)
The ERSPC was initiated in the early 1990s to evaluate the effect of screening with PSA testing on death rates from prostate cancer. Through registries in seven European countries, investigators identified 182,000 men between the ages of 50 and 74 years for inclusion in the study. Although the protocols differed considerably among countries, generally the men were randomly assigned either to a group that was offered PSA screening at an average of once every 4 years or to a control group that did not receive screening. The predefined core age group for this study included 162,243 men between the ages of 55 years and 69 years. The primary outcome was the rate of death from prostate cancer. Mortality follow-up was identical for the two study groups and has been reported through 2010.
The protocol, including recruitment, randomization procedures, and treatment definition and schedule, differed among countries and was developed in accordance with national regulations and standards. In Finland, Sweden, and Italy, the men in the trial were identified from population registries and were randomly assigned to the centers before written informed consent was provided. In the Netherlands, Belgium, Switzerland, and Spain, the target population was also identified from population lists, but when the men were invited to participate in the trial, only those who provided consent were randomly assigned. Randomization was 1:1 in all countries except Finland, in which it was 1:1.5. The definition of a positive test and the testing schedule also varied by country.
In the screening group, 82% of men accepted at least one offer of screening. At a median follow-up of 9 years, there were 5,990 prostate cancers diagnosed in the screening group (a cumulative incidence of 8.2%) and 4,307 prostate cancers in the control group (a cumulative incidence of 4.8%). There were 214 prostate-cancer deaths in the screening group and 326 prostate-cancer deaths in the control group in the core age group (RR, 0.80; 95% CI, 0.67–0.95). The rates of death in the two study groups began to diverge after 7 to 8 years and continued to diverge further over time. With follow-up through 13 years, there were 7,408 prostate cancers in the intervention group during 775,527 person-years of follow-up and 6,107 cancers in the control group with 980,474 person-years of follow-up (RR, 1.57; 95% CI, 1.51–1.62). There were also 355 prostate cancer deaths over 825,018 person-years of follow-up in the intervention group and 545 deaths over 1,011,192 person-years of follow-up in the control group (RR, 0.79; 95% CI, 0.69–0.91). Consequently, 781 men needed to be invited for screening to avert one prostate cancer death, and 48 men needed to be biopsied. At 16 years of follow-up, the prostate cancer mortality rate ratio was 0.80 (95% CI, 0.72–0.89), and the prostate cancer incidence rate ratio was 1.41 (95% CI, 1.36–1.45). Therefore, 570 men needed to be invited to prevent one prostate cancer death, and 18 men needed to be diagnosed to prevent one prostate cancer death.
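The 13-year event counts and person-years quoted above are enough to recompute crude rates. The sketch below is illustrative only; the helper names are hypothetical, and the calculation makes no adjustment for center, age, or the unequal randomization ratio, so the crude ratios need not match the published estimates exactly.

```python
def rate_per_1000(events, person_years):
    """Crude event rate per 1,000 person-years."""
    return 1000 * events / person_years

def crude_comparison(arms):
    """Compare screening vs. control rates (ratio and difference per 1,000 person-years)."""
    s = rate_per_1000(*arms["screening"])
    c = rate_per_1000(*arms["control"])
    return {"screening": s, "control": c, "ratio": s / c, "difference": s - c}

# 13-year ERSPC figures quoted in the text: (events, person-years).
incidence = {"screening": (7408, 775527), "control": (6107, 980474)}
mortality = {"screening": (355, 825018), "control": (545, 1011192)}

inc = crude_comparison(incidence)
mort = crude_comparison(mortality)
for label, res in (("incidence", inc), ("mortality", mort)):
    print(f"{label}: {res['screening']:.2f} vs. {res['control']:.2f} per 1,000 "
          f"person-years, ratio {res['ratio']:.2f}, difference {res['difference']:+.2f}")
```

The crude mortality ratio is about 0.80, in line with the reported RR of 0.79. The crude incidence ratio is about 1.53, slightly below the reported 1.57, which presumably reflects adjustments in the published analysis that this sketch does not reproduce.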
Overall, PSA-based screening was reported to reduce the rate of death from prostate cancer by about 20% but was associated with a high risk of overdiagnosis.
Of the seven centers included in the study, two individually reported a significant mortality benefit associated with prostate cancer screening (the Netherlands and Sweden). It is not readily apparent which factors at these two centers (PSA thresholds or intervals between testing used, mean age of patients, sample size) might explain the observed difference. It is important to note that the trial was not designed for individual countries to have adequate statistical power to find a significant mortality reduction.
Important information that was not reported included the contamination rate in the entire control group. Further, there was some evidence that the treatment administered to the prostate cancer patients differed by stage and by randomly assigned group, with the screening group receiving radical prostatectomy (40.3%) more often than the control group (30.3%). Such a difference in treatment could have contributed to any mortality difference between the trial arms. To address this issue, an analysis was conducted for each treatment, separately in each trial arm, in which logistic regression models were fitted for treatment allocation and risk of prostate cancer death, then combined to estimate prostate cancer deaths. The differences in prostate cancer deaths when the screened arm model was applied to the control arm, and vice versa, were very small, leading the authors to conclude that differential treatment explains only a trivial proportion of the main trial findings.
However, concerns with this analysis include the following:
Most of the cases included in this analysis were early stage, including overdiagnosed cases, for which treatment differences would likely make little difference, and from which only a limited fraction of the prostate cancer deaths arise. Thus, any effect of treatment differences on the advanced cases, and on deaths, would likely be diluted by using this approach.
Possible harms included overdiagnosis, which was estimated at 30% in the Finnish center on the basis of the excess of cases in the screening arm over the number expected had the cumulative risk of prostate cancer been the same as in the control arm. The Spanish center also reported an excess of prostate cancers in the intervention arm (7.8%) versus the control arm (5.2%) after a median 21 years of follow-up.
The Goteborg (Sweden) trial
In December 1994, 20,000 men born between 1930 and 1944 (aged 50–64 years) and living in Goteborg, Sweden, were randomly assigned in a 1:1 allocation to either a screened group, which was offered PSA testing every 2 years, or a control group. The PSA threshold for biopsy was 2.5 ng/mL. Seventy-seven percent of men in the screened group attended at least one screen. At 18 years of follow-up, 1,396 men in the screened group and 962 in the control group had been diagnosed with prostate cancer (hazard ratio, 1.51; 95% CI, 1.39–1.64). There was an absolute reduction in prostate cancer mortality of 0.52% (95% CI, 0.17%–0.87%), with an RR of 0.65 (95% CI, 0.49–0.87).
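An absolute risk reduction is often restated as a number needed to invite, taken as its reciprocal. The sketch below applies that generic conversion to the rounded figures in the text; it is an illustration, not a result reported by the trial, and the investigators' own figure may differ if it was derived from unrounded data or a different method. The function name is hypothetical.

```python
def number_needed_to_invite(absolute_risk_reduction):
    """Reciprocal of the absolute risk reduction (expressed as a proportion)."""
    if absolute_risk_reduction <= 0:
        raise ValueError("absolute risk reduction must be positive")
    return 1 / absolute_risk_reduction

# Absolute reduction of 0.52% (95% CI, 0.17% to 0.87%) as quoted in the text.
arr, arr_low, arr_high = 0.0052, 0.0017, 0.0087

nni = number_needed_to_invite(arr)
# A larger risk reduction implies a smaller number needed to invite,
# so the interval bounds swap when inverted.
nni_range = (number_needed_to_invite(arr_high), number_needed_to_invite(arr_low))
print(f"about {nni:.0f} men invited per death averted "
      f"(range {nni_range[0]:.0f} to {nni_range[1]:.0f})")
```

The wide range shows how sensitive the number needed to invite is to uncertainty in a small absolute effect.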
A concern with this trial is double reporting of information, because most participants were included in the ERSPC trial, but results have been reported separately for each trial. An initial publication indicated that in 1996 this study became associated with the ERSPC trial, and results from men born between 1930 and 1939 were published in a previous ERSPC report. A later publication states that since 1996 the Goteborg trial has constituted the Swedish arm of ERSPC; however, an ERSPC publication included about 12,000 participants from Sweden, or about 60% of the Goteborg trial population.
Unlike the other ERSPC centers, not all the participants from the Goteborg center were included in the ERSPC study. Some have argued that the ERSPC trial should be treated as a meta-analysis.
The Cluster Randomized Trial of PSA Testing for Prostate Cancer (CAP)
The CAP trial of PSA screening was conducted in the United Kingdom. This was a primary care-based cluster randomized trial of an invitation to a single PSA test, followed by standardized prostate biopsy in men with PSA levels of 3 ng/mL or higher. The trial was designed to determine the effect of the intervention on prostate cancer mortality. The primary end point was definite, probable, or intervention-related prostate cancer mortality at a median follow-up of 10 years. Participants were aged 50 to 69 years at entry and were enrolled between 2001 and 2009, with passive follow-up through national database linkage completed on March 31, 2016. Randomization was stratified within geographical groups and block sizes of 10 to 12 neighboring practices using a computerized random number generator. Men with a positive PSA test diagnosed with clinically localized prostate cancer were recruited to the Prostate Testing for Cancer and Treatment (ProtecT) study for treatment. All other cancers received standard National Health Service management. The design called for 209,000 men in each group to provide sufficient events to allow a prostate cancer mortality RR of 0.87 to be detected with 80% power at a significance level of 0.05, assuming an uptake of PSA testing between 35% and 50%.
Nine hundred eleven primary care practices were randomly assigned within 99 geographical areas in the United Kingdom; 466 were assigned to the intervention group, and 445 were assigned to the control group. After various exclusions among both practices and potential participants, the analyses were conducted using data from 189,386 men in 271 practices in the intervention group and 219,439 men in 302 practices in the control group. In the intervention group, 75,707 (40%) men attended a PSA testing clinic, and 67,313 (36%) men had a PSA blood sample taken. Among these men, 11% had a PSA level between 3 ng/mL and 19.9 ng/mL (making them eligible for the ProtecT trial), of whom 85% had a prostate biopsy. Cumulative contamination in the control group was estimated to be 10% to 15% over 10 years.
After a median 10-year follow-up, there was no significant difference between the two groups in prostate cancer mortality. The prostate cancer death rates were 0.30 per 1,000 person-years (549 deaths) in the intervention group and 0.31 per 1,000 person-years (647 deaths) in the control group (rate difference, −0.013 per 1,000 person-years; RR, 0.96). Secondary analyses indicated no effect on all-cause mortality (RR, 0.99; 95% CI, 0.94–1.03), but there was a higher prostate cancer incidence rate in the intervention group (4.45 per 1,000 person-years) compared with the control group (3.80 per 1,000 person-years). There was no reduction in advanced prostate cancers (Gleason 8–10 or T4, N1, or M1). The increased detection was confined to lower Gleason grade or lower-stage cancers, emerged at the beginning of screening, and persisted throughout the duration of follow-up, suggesting overdiagnosis.
Limitations of the CAP trial include the following:
The Norrkoping (Sweden) study
The Norrkoping study is a population-based nonrandomized trial of prostate cancer screening. All men aged 50 to 69 years living in Norrkoping, Sweden, in 1987 were allocated to either an invited group (every sixth man allocated to invited group) or a not-invited group. The 1,494 men in the invited group were offered screening every 3 years from 1987 to 1996. The first two rounds were by DRE; the last two rounds were by both DRE and PSA. About 85% of men in the invited group attended at least one screening; contamination by screening in the not-invited group (n = 7,532) was thought to be low. After 20 years of follow-up, the invited group had a 46% relative increase in prostate cancer diagnosis. Over the period of the study, 30 men (2%) in the invited group died of prostate cancer, compared with 130 (1.7%) men in the not-invited group. The RR of prostate cancer mortality was 1.16 (95% CI, 0.78–1.73).
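The relative risk for this study can be reproduced approximately from the death counts and group sizes above. This is a minimal sketch that assumes a simple ratio of cumulative risks with a log-scale Wald (Katz) interval; the study's own analysis may have differed, and the function name is illustrative.

```python
import math

def risk_ratio_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Risk ratio (a vs. b) with a Katz log-scale Wald interval."""
    rr = (events_a / n_a) / (events_b / n_b)
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    half_width = z * se_log
    return rr, rr * math.exp(-half_width), rr * math.exp(half_width)

# Prostate cancer deaths quoted in the text: invited vs. not-invited group.
rr, lower, upper = risk_ratio_ci(30, 1494, 130, 7532)
print(f"risk ratio {rr:.2f} (approximate 95% CI, {lower:.2f} to {upper:.2f})")
```

The crude estimate is close to the reported RR of 1.16 (95% CI, 0.78–1.73). With only 30 deaths in the invited group, the interval is wide enough to be compatible with either a modest benefit or a modest harm.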
The Quebec (Canada) trial
In the randomized prospective Quebec study, 46,486 men identified from the electoral rolls of Quebec City, Canada, and its metropolitan area were randomly assigned to be either approached or not approached for PSA and DRE screening. A total of 31,133 men were randomly assigned to screening, while a total of 15,353 were randomly assigned to observation. Using an intention-to-treat analysis based on the study arm to which an individual was originally assigned, no difference in mortality was seen; there were 75 (0.49%) deaths among the 15,353 men who were randomly assigned to the observation group compared with 153 (0.49%) deaths among the 31,133 men randomly assigned to the screening group (RR, 1.085).
The Stockholm (Sweden) trial
In 1988, from a population of 27,464 men in the southern part of Stockholm, 2,400 men aged 55 to 70 years were randomly selected to undergo screening with DRE, transrectal ultrasound, and PSA (cutoff >10 ng/mL). Seventy-four percent of the men accepted the screening invitation. After 20 years of follow-up, there was no indication of a reduction in prostate cancer mortality (RR, 1.05; 95% CI, 0.83–1.27) or in overall mortality (RR, 1.01; 95% CI, 0.95–1.06), but screening was limited to a single episode. There was an indication of excess prostate cancer incidence in the invited population (RR, 1.12; 95% CI, 0.99–1.25), suggesting overdiagnosis.
The authors of a large, randomized, Swedish-based noninferiority trial that was designed to study the performance of magnetic resonance imaging (MRI) in prostate cancer screening of the general population reported that MRI-targeted biopsy was noninferior to standard biopsy in detecting clinically significant cancers in men with elevated PSA levels. The authors also reported that MRI-targeted biopsy decreased unnecessary biopsies and diagnosis of clinically insignificant cancers. In this prospective, population-based, noninferiority trial, 1,532 men with a PSA level of more than 3 ng/mL were randomly assigned in a 2:3 ratio; 603 underwent standard biopsy, and 929 underwent MRI, followed by targeted and standard biopsy if MRI findings were concerning for prostate cancer. The primary outcome was the probability of detecting clinically significant cancer (Gleason score of ≥3+4). Key secondary outcomes were the detection of clinically insignificant cancers (Gleason score of 6) and the number of biopsies.
Key findings of the intention-to-treat analysis included the following:
- Clinically significant cancer was diagnosed in 192 (21%) of 929 men in the MRI-targeted biopsy group versus 106 (18%) of 603 men in the standard-biopsy group (difference, 3%; 95% CI, −1% to 7%; P < .001 for noninferiority).
- Clinically insignificant prostate cancer was diagnosed in 41 (4%) men in the MRI-targeted group versus 73 (12%) men in the standard-biopsy group (difference, −8%; 95% CI, −11% to −5%).
- Biopsies were benign in 105 (11%) men in the MRI-targeted group versus 259 (43%) men in the standard-biopsy group (difference, −32%; 95% CI, −36% to −27%).
- Antibiotic-treated postbiopsy infections occurred in 2% of the MRI-targeted group versus 4% of the standard-biopsy group (difference, −2%, 95% CI, −4% to 0.1%).
- When normalized to 10,000 men, MRI-targeted biopsies resulted in 409 fewer men undergoing biopsy (48% lower incidence), 366 fewer men with benign biopsies (78% lower incidence), and 88 fewer men with clinically insignificant cancers (62% lower incidence).
- The authors calculated that a detection of 1.7 clinically significant cancers would be delayed for each clinically insignificant cancer avoided and recommended use of standard biopsy, in addition to targeted biopsy, for men with positive MRI results.
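The differences between arms listed above can be approximated from the counts and group sizes in the text. The following is a minimal sketch that assumes independent binomial proportions and an unadjusted Wald interval, which may not be the method used by the trial authors; all names are illustrative.

```python
import math

def proportion_difference_ci(x_a, n_a, x_b, n_b, z=1.96):
    """Difference in proportions (a minus b) with an unadjusted Wald interval."""
    p_a, p_b = x_a / n_a, x_b / n_b
    diff = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z * se, diff + z * se

N_MRI, N_STANDARD = 929, 603  # men assigned to each arm, as quoted in the text

# Counts per outcome: (MRI-targeted group, standard-biopsy group).
outcomes = {
    "clinically significant cancer": (192, 106),
    "clinically insignificant cancer": (41, 73),
    "benign biopsy": (105, 259),
}

results = {
    name: proportion_difference_ci(mri, N_MRI, std, N_STANDARD)
    for name, (mri, std) in outcomes.items()
}
for name, (diff, lo, hi) in results.items():
    print(f"{name}: {diff:+.1%} ({lo:+.1%} to {hi:+.1%})")
```

Under these assumptions, the interval for clinically significant cancer spans zero, while the intervals for clinically insignificant cancer and benign biopsy lie entirely below zero, which matches the direction of the reported findings.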
In summary, initial results of this large randomized trial suggest that, in men older than 50 years with elevated PSA levels, an MRI-targeted biopsy strategy may reduce overdiagnosis and overtreatment of low-risk cancer while maintaining the ability to detect clinically significant cancer. Study limitations included low uptake (26% of invited men participated in the trial). Additionally, some participants did not undergo the assigned intervention, and the true disease status of participants was unknown. Another challenge was implementing high-quality MRI screening because of variability of skill and experience among participating radiologists.
Post hoc analysis of randomized screening trials
The problems associated with drawing valid inferences from observational studies also apply to post hoc analyses of randomized trials. For example, analyzing randomized trial results in various ways is subject to the problems caused by multiplicities. Statistical conclusions maintain their standard interpretations only when analyzing the trial’s primary end point according to the trial’s protocol or statistical analysis plan. In some settings, statistical adjustments are possible to account for multiplicities. But quite beyond problems of multiplicities, some analyses are so prone to bias that they are of limited value.
Randomization eliminates or at least minimizes many systematic biases. However, randomization shields an analysis from bias only if it considers a group randomized to one intervention compared with a second group randomized to another intervention. If an analysis mixes the two groups, then the virtue of randomization is lost.
Patients can deviate from the intervention to which they were assigned. This is sometimes called contamination. But to preserve the protection of randomization, they are counted within the group to which they were assigned: termed an intention-to-treat or intention-to-screen analysis. An alternative that is sometimes used is an as-treated or as-screened analysis, which is prone to important biases. In such analyses, participants who are screened are compared with those who were not screened, regardless of their assigned group. This is attractive to some investigators because it seems to address the right question. In addition, it seems to correct for contamination in both directions, and thereby, increases statistical power; but such an approach is flawed.
There are powerful biases associated with as-screened analyses; some are easily recognized, and some are not. A participant who chooses to be screened despite randomization to the control group differs from one who accepts an assignment to be screened. For example, such a person may be generally in better health or may have been screened previously, and so, is less likely to be diagnosed with cancer. There are similar differences for participants who eschew invitations to be screened versus those who accept assignment to the control group.
In addition to preserving randomization, an intention-to-screen analysis is most relevant for informing a decision about instituting a screening program or recommendation in some populations. The following section considers two analyses that are subject to the as-screened flaw.
The Quebec study
As indicated above, the intention-to-screen analysis of this trial showed no detectable difference in prostate cancer mortality between the two groups. However, the investigators focused on as-screened analyses. They observed that there were 4 prostate cancer deaths (0.056%) among the 7,155 men who were screened and 44 prostate cancer deaths (0.31%) among the 14,255 men who were not screened, an RR of 5.5. Based on exposure times, the investigators attributed a 67.1% reduction in the prostate cancer death rate to screening. This conclusion is flawed for the reasons described above, as other investigators have pointed out.
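The contrast between the two analyses can be seen by recomputing crude proportions from the counts quoted in this summary. This is an illustrative sketch only: it uses simple ratios of proportions, whereas the investigators' figures (including the RR of 1.085 cited earlier) may rest on exposure time or another method that is not reproduced here.

```python
def risk(deaths, n):
    """Crude proportion of men who died of prostate cancer."""
    return deaths / n

# Intention-to-screen: compare the groups as randomized.
invited = risk(153, 31133)
not_invited = risk(75, 15353)
itt_ratio = invited / not_invited

# As-screened: compare men by the screening they actually received,
# regardless of randomized group (the approach criticized in the text).
screened = risk(4, 7155)
unscreened = risk(44, 14255)
as_screened_ratio = unscreened / screened

print(f"intention-to-screen: {invited:.2%} vs. {not_invited:.2%}, ratio {itt_ratio:.2f}")
print(f"as-screened: {unscreened:.3%} vs. {screened:.3%}, ratio {as_screened_ratio:.1f}")
```

The groups as randomized have nearly identical crude risks, while the as-screened comparison yields a ratio of about 5.5. Because men who choose to be screened differ systematically from those who do not, that large ratio cannot be attributed to screening itself.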
Modeling the ERSPC combined with the PLCO Cancer Screening Trial
The PLCO Cancer Screening Trial showed greater contamination than did the ERSPC trial, especially in the control group. Three modeling groups attempted to account for the effect of differential contamination using a novel derived measure called mean lead time (MLT), which reflected the average intensity of screening in each arm in the two trials. The investigators found substantial reductions in prostate cancer mortality caused by screening. Moreover, they found very similar reductions per MLT in PLCO and ERSPC. Both the methods and the conclusions are prone to bias and have been criticized by several groups of scientists. This analysis also ignored the other potential shortcomings identified above.