Journal of Clinical Oncology  
Journal of Clinical Oncology, Vol 19, Issue 15 (August), 2001: 3490-3499
© 2001 American Society for Clinical Oncology

Screening Sensitivity and Sojourn Time From Breast Cancer Early Detection Clinical Trials: Mammograms and Physical Examinations

By Yu Shen, Marvin Zelen

From the Department of Biostatistics, M.D. Anderson Cancer Center, University of Texas, Houston, TX; and Harvard School of Public Health and Dana-Farber Cancer Institute, Boston, MA.

Address reprint requests to Yu Shen, MD, Department of Biostatistics, University of Texas M.D. Anderson Cancer Center, 1515 Holcombe Blvd, Box 213, Houston, TX 77030; email: yushen{at}odin.mdacc.tmc.edu


    ABSTRACT
PURPOSE: To estimate sensitivities of breast cancer screening modalities and preclinical duration of the disease from eight breast cancer screening clinical trials.

PATIENTS AND METHODS: Screening programs invariably lead to diagnosis of disease before signs or symptoms are present. Two key quantities of screening programs are the sensitivity of the disease detection modality and the mean sojourn time (MST). The observed screening histories in a periodically screened cohort make it possible to estimate these quantities of interest. We applied recently developed statistical methods to data from eight randomized breast cancer screening trials to estimate the sensitivities of early detection modalities and MST. Moreover, when a screening trial involved two screening modalities, our methods enabled the estimation of the individual sensitivity of each screening modality.

RESULTS: We analyzed breast cancer data from several screening trials and have relatively complete data from the Health Insurance Plan (HIP), Edinburgh, and two Canadian studies. The screening sensitivity for mammography, physical examination, and MST were, respectively, HIP: 0.39, 0.47, and 2.5 years; Edinburgh: 0.63, 0.40, and 4.3 years; Canadian (age 40 to 49 at entry): 0.61, 0.59, and 1.9 years; Canadian (age 50 to 59 at entry): 0.66, 0.39, and 3.1 years.

CONCLUSION: The public debate on early breast cancer detection is mainly centered on mammograms. However, the current study indicates that a physical examination is of comparable importance. Cautious interpretation of trial differences is required as a result of various experimental designs and the age dependency of screening sensitivity and MST.


    INTRODUCTION
BREAST CANCER IS the most frequently diagnosed noncutaneous cancer among women and, in recent years, has accounted for 30% of new cancer cases in women in the United States. It is the second leading cause of cancer deaths among women. Currently, a realistic strategy for reducing breast cancer mortality is to diagnose the disease while it is at an early stage. Eight randomized breast cancer screening trials have been conducted in the past three decades to assess the efficacy of mammography, with or without physical examination. The goal of these screening programs is to detect breast cancer in the preclinical state, ie, when a tumor is present but there are no signs or symptoms. Two key parameters of the screening programs are the sensitivity of the disease detection modality and the time interval between the onset of the detectable preclinical state and progression to the clinical state, in which the disease causes signs and/or symptoms detectable by routine methods of diagnosis. This time interval is called the sojourn time in the preclinical state, or simply the sojourn time. The sojourn time is a measure of how much earlier the disease may be detected by the screening procedure. The lead time is defined as the time gained by diagnosing the disease with special detection modalities before the patient experiences symptoms. The mean lead time may range from half of the mean sojourn time to the full mean sojourn time. The longer the lead time, the greater the possibility of detecting disease at an early stage.

The nature of the disease process precludes exact observation of the onset of the preclinical disease state. In addition, a screened-negative individual who has been followed up and later found to be positive may represent either a false-negative on previous screening examinations or a case newly developed since the last examination. Neither event is observable. Moreover, the sensitivity is not directly estimable unless all screening participants are subjected to a definitive diagnostic test, which is not feasible. Because of the aforementioned characteristics associated with earlier detection of disease, many existing statistical methods are not directly applicable to estimating the screening sensitivity and mean sojourn time. Nevertheless, the observed screening histories in a periodically screened cohort may make it possible to estimate these quantities.1-7 Despite these efforts, statistical methods for analyzing cancer screening data in a more general model structure are still limited.

Recently we8 developed new statistical methods to estimate both the sensitivity of individual detection modalities and the mean sojourn time. These methods can be applied to studies with more than one screening modality. These estimation methods result in maximum likelihood estimates. In this article, we apply the methods to the published data on breast cancer screening.


    PATIENTS AND METHODS
There have been eight published randomized controlled clinical trials aimed at evaluating the early detection of breast cancer using mammography with or without an independent physical examination. These studies have been carried out in the United States, Canada, the United Kingdom, and Sweden. The trials are often referred to as the Health Insurance Plan (HIP) of New York study, Edinburgh study (United Kingdom), the Swedish Two-County (Kopparberg, Östergötland) studies, Malmö (Sweden), Stockholm (Sweden), Gothenburg (Sweden), and two Canadian studies. Table 1 lists some of the characteristics of these screening trials. It is clear that there are considerable variations in the screening modalities, eligible ages of participants, and designs of these studies.


Table 1.  Characteristics of Eight Randomized, Controlled Breast Cancer Screening Trials
 
The HIP study was carried out in the 1960s and was the first randomized breast cancer screening trial. Approximately 62,000 women, 40 to 64 years of age at entry, were randomly allocated to a control or screening group.9,10 Women in the study group were invited for an initial screening. Participants in the first round were then offered three additional annual screening examinations. Two-view mammography and physical examinations were independently carried out on the screened group at each examination. Women in the control group followed their usual practices in obtaining medical care.

In the substantially larger Swedish two-county trials, allocation to each of the two groups was done in geographic clusters in Kopparberg and Östergötland counties in 1977.11-13 Clusters were randomized within blocks designed to be approximately homogeneous in demographic terms. By this means, a total of 77,080 women were assigned to a screening group and 55,985 women to a control group. The examinations in the screened group were performed with single-view mammography at 24-month intervals for women aged 40 to 49 and at 33-month intervals for women aged 50 to 74. Subsequently, a breast screening project was carried out in the city of Malmö, Sweden.14,15 Women in the Malmö trial were assigned by birth cohort to a control group and a screened group. Invitations to participate were sent by mail. Approximately 21,000 women aged 45 to 69 were assigned to the screened group for mammography, and a similar cohort constituted the control group. Three screening rounds were completed, with an average interval of 22 months. Two-view mammography (craniocaudal and oblique) was used at each screening examination. In 1981, the Stockholm trial, the third randomized controlled trial in Sweden, was initiated. It involved approximately 60,000 women aged 40 to 64 years, of whom approximately 40,000 were randomized to mammography screening and 20,000 to a control group in which the women followed their usual medical care.16-18 The randomization was carried out using geographical clusters. During the trial, a total of 32,533 women in the study group attended the first round of screening. All participating women in the screened group were then offered two rounds of mammography screening. Women in the control group were offered a single mammogram examination that coincided in time with the second examination in the study group.

Between 1983 and 1984 at Gothenburg, 11,724 women aged 39 to 49 years were randomized to the study group, and 14,217 women in the same age range were randomized to a control group.19 Women in the study group were invited to receive a mammographic screen every 18 months for a total of five rounds. Two-view mammography was used at each screen unless the breast density at the previous screen indicated that a single view was adequate.

Edinburgh was one of the centers in which screening was offered in the United Kingdom Trial of the Early Detection of Breast Cancer.20 During 1979 to 1981, 45,130 women in Edinburgh aged 45 to 64 were entered onto a cluster-randomized trial of breast cancer screening using mammography and physical examination. In this study, women were randomized to a screening group or a control group. All registered women were eligible for the study unless they had previously been diagnosed with breast cancer.21 The experimental plan for the Edinburgh trial differed from that of most other screening trials: screening was offered annually, with mammography and physical examination in the first, third, and fifth years and physical examination alone in the second, fourth, and sixth years.

More recently, the Canadian National Breast Cancer Study Group (CNBS) conducted two randomized, controlled studies to evaluate the efficacy of a screening program that combined annual mammography with physical examination for breast cancer.22,23 One trial was for women aged 40 to 49; the other was targeted at women of ages 50 to 59. Women volunteers who had no history of breast cancer and no mammograms in the previous 12 months were individually randomized to either a screened or a control group. In the 40-to-49 age group, a total of 50,430 women were entered onto the study. Each received an initial physical examination before being randomly allocated to a control or screened group. Women in the screened group had four or five annual screenings with both physical examination and mammography. The randomization was carried out without knowledge of the outcomes of the physical examinations. In the trial targeted at women 50 to 59 years of age, the same cycle of screening examinations was given to the screened group as in the trial involving younger women. However, women in the control group received an annual physical examination. The numbers of women in the screened group and control group among those of ages 50 to 59 were 19,711 and 19,694, respectively.


    RESULTS
The findings of the early detection trials for breast cancer are controversial. There is general agreement that mammography is beneficial for women aged 50 or older but less agreement on its benefit for younger women. Furthermore, the magnitude of benefit may be impossible to estimate because of the large degree of noncompliance. Some of the trials described in the previous section were designed to evaluate screening benefit by comparing a screened group with a control group (women who received their usual care). Others were planned so that women in the control group had an opportunity to receive a screening examination. For example, the Canadian study for younger women answers the question of the benefit of mammogram examinations after an initial physical examination, whereas the trial for older women addresses the question of periodic mammogram plus physical examination versus periodic physical examination only. Furthermore, although these are randomized trials, some were randomized by individual and others by region (cluster randomization). The properties of statistical procedures differ between these two mechanisms of randomization.

Compliance was a major issue, especially in the studies randomized by geographical region, in which eligible women were invited to participate. Ideally, all eligible women in a region allocated to examination would take advantage of the opportunity; those who do not are noncompliant. An intent-to-treat comparison requires that all mortality comparisons be made according to the initial randomization, regardless of examination compliance. Noncompliance therefore makes these comparisons relatively inefficient. In addition, it is important to obtain complete follow-up information.

These problems do not arise to the same degree in estimating the sensitivity and the mean sojourn time in the preclinical state. The estimates of these parameters are based only on subjects who had examinations; their numerical values do not depend on compliance and do not require follow-up for mortality. These parameters are related to potential benefit: high sensitivity and a long mean sojourn time indicate greater potential benefit. For example, a lower bound on the mean lead time is half of the mean sojourn time in the preclinical state. If these two parameters are combined with staging information at diagnosis, it may be possible to predict eventual mortality using empirical survival distributions conditional on disease stage. However, the most reliable and valid outcome in the assessment of screening programs remains the breast cancer mortality rate.

Our statistical methods for estimating sensitivity and mean sojourn time were applied to these published breast cancer screening trials. The data that we used to estimate these quantities included all observed periodic screening examination histories. In particular, the data consisted of whether a subject was diagnosed at a scheduled screening examination, the number of prior negative screening examinations, the modality of diagnosis if there was more than one screening modality, and whether a subject was diagnosed between scheduled examinations. Assume there are a total of $k$ screening examinations, and let $t_0 < t_1 < \cdots < t_{k-1} < T$, where $t_0, \ldots, t_{k-1}$ are the $k$ ordered screening examination times and $T$ denotes the end of follow-up after the last examination. Define the $i$th screening interval as $(t_{i-1}, t_i)$ for $i = 1, 2, \ldots, k$, where $t_k = T$. Adopt the following notation: $n_i$ is the total number of individuals examined at $t_{i-1}$; $s_i$ is the number of cases detected at the examination given at $t_{i-1}$; and $r_i$ is the number of cases diagnosed within the interval $(t_{i-1}, t_i)$. For the $i$th examination, define

$$s_{1i}, \quad s_{2i}, \quad s_{12,i}$$

to be the numbers of cases detected by modality 1 only, modality 2 only, and both modalities, respectively. The total number of cases detected at the $i$th examination is then

$$s_i = s_{1i} + s_{2i} + s_{12,i}.$$

Note that the subscript $i$ in $(n_i, s_i, r_i)$ refers to the $i$th screening interval.
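For readers who want to tabulate these counts, a minimal Python sketch of one interval's record follows. The class name `ScreeningInterval` and its field names are hypothetical; modality 1 and modality 2 stand for any two examinations (eg, mammography and physical examination), and the only rule encoded is that the total screen-detected count is the sum of the three disjoint modality categories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningInterval:
    """Counts for one screening interval (t_{i-1}, t_i); names are illustrative."""

    n: int       # women examined at t_{i-1}
    s_mod1: int  # cases detected by modality 1 only (eg, mammography)
    s_mod2: int  # cases detected by modality 2 only (eg, physical examination)
    s_both: int  # cases detected by both modalities
    r: int       # interval cases diagnosed in (t_{i-1}, t_i)

    @property
    def s(self) -> int:
        """Total screen-detected cases at the examination at t_{i-1}."""
        return self.s_mod1 + self.s_mod2 + self.s_both

    def triple(self) -> tuple:
        """The (n, s, r) triple used by a full likelihood."""
        return (self.n, self.s, self.r)


# Hypothetical counts for a single interval.
iv = ScreeningInterval(n=1000, s_mod1=2, s_mod2=3, s_both=1, r=4)
print(iv.s, iv.triple())
```

Applied to the HIP modality totals reported in the Results (44, 59, and 29), the same sum gives 132 screen-detected cancers.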

Our statistical methods do not require any assumption about the association between the physical and mammogram examinations. On the other hand, we assume screening sensitivity to be constant over time (or age). Under this assumption, the estimated sensitivity is an average sensitivity over the age cohort and time horizon of the study. We assume that the sojourn times follow an exponential distribution under the stable disease model. The stable disease model assumes that the proportions of preclinical and clinical cases in a given population remain essentially constant over chronologic time. The natural history of the disease is assumed to be progressive and to involve three states: a disease-free state (or a state in which the disease cannot be detected), a preclinical state, and a clinical state. There are two ways of obtaining the maximum likelihood estimates of sensitivity and mean sojourn time. One is the full likelihood, which requires an estimate of the incidence of the disease; with the full likelihood we used all the observed periodic screening data, $\{(n_i, s_i, r_i),\ i = 1, \ldots, k\}$. The other is the conditional likelihood, which eliminates the dependence on the incidence. The conditioning is based on the total number of screen-detected cases and cases diagnosed between two screening examinations (interval cases), $\{(s_i, r_i),\ i = 1, \ldots, k\}$. The conditional likelihood method does not require the total number of screening participants at each examination. Details are given in Shen and Zelen.8 The bootstrap method was used to estimate the SDs of the estimated parameters. The Edinburgh trial differed from the other randomized, controlled breast cancer screening trials: its design alternated annually between combined physical examination and mammography and physical examination alone, so that each type of examination recurred at 2-year intervals. Therefore, we modified the estimation methods to incorporate this plan.
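One way to see how the per-interval counts n, s, and r connect to sensitivity, mean sojourn time, and incidence is the sketch below. It is a simplified stand-in written for illustration, not the likelihood of Shen and Zelen: in addition to stable disease, exponential sojourn times, and constant sensitivity, it assumes that preclinical onsets occur at a constant rate `w` per woman-year (a rare disease, so the cohort is not depleted), that each screen misses a preclinical cancer independently with probability 1 - `beta`, and that every woman attends every screen. Under these assumptions, the undetected preclinical prevalence just before a screen, in units of the stable prevalence `w * mst`, starts at p = 1 and is updated as p_next = (1 - beta) * p * exp(-gap / mst) + (1 - exp(-gap / mst)), where gap is the time since the previous screen. The expected counts are then treated as Poisson means. The function names are hypothetical.

```python
import math


def expected_rates(times, follow_up_end, beta, mst, w):
    """Expected per-woman counts under a simplified stable-disease model (a sketch).

    times: screening times t_0 < ... < t_{k-1} in years; follow_up_end: T.
    beta: sensitivity, assumed constant and independent across screens.
    mst: mean of the exponential sojourn time; w: preclinical onset rate per year.
    Returns (d, q): d[i] = expected screen detections per woman examined at times[i];
    q[i] = expected interval cases per woman in the interval that follows times[i].
    """
    lam = 1.0 / mst
    grid = list(times) + [follow_up_end]
    d, q = [], []
    p = 1.0  # undetected preclinical prevalence at t_0, in units of w * mst
    for i in range(len(times)):
        if i > 0:
            decay = math.exp(-lam * (grid[i] - grid[i - 1]))
            # Earlier misses still preclinical, plus new onsets still preclinical.
            p = (1.0 - beta) * p * decay + (1.0 - decay)
        d.append(beta * w * p / lam)
        gap = grid[i + 1] - grid[i]
        surf = 1.0 - math.exp(-lam * gap)  # P(remaining sojourn < gap), by memorylessness
        # Missed cases that surface in the gap, plus new onsets that surface in the gap.
        q.append(w * ((1.0 - beta) * p * surf / lam + gap - surf / lam))
    return d, q


def poisson_loglik(data, times, follow_up_end, beta, mst, w):
    """Poisson log-likelihood with constants dropped; data = [(n_i, s_i, r_i), ...]."""
    d, q = expected_rates(times, follow_up_end, beta, mst, w)
    ll = 0.0
    for (n, s, r), d_i, q_i in zip(data, d, q):
        for observed, mean in ((s, n * d_i), (r, n * q_i)):
            ll += observed * math.log(mean) - mean
    return ll


times = [0.0, 1.0, 2.0, 3.0]  # hypothetical: four annual screens, follow-up ends at 4 years
d, q = expected_rates(times, 4.0, beta=0.7, mst=2.5, w=0.002)
print([round(1000 * x, 2) for x in d])  # expected detections per 1,000 women screened
```

Maximizing `poisson_loglik` over (`beta`, `mst`, `w`) with a numerical optimizer gives estimates in the spirit of a full likelihood; the conditional likelihood, which removes the dependence on incidence, is not reproduced here.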

HIP Study
To apply the estimation methods to the HIP study, we used the observed data from the first 4 years of follow-up after the start of screening. The relevant data were published in Shapiro9 (Table 2). In addition, the total numbers of women with breast cancer detected by mammography only, physical examination only, and both mammography and physical examination were 44, 59, and 29, respectively. The estimated overall screening sensitivity and mean sojourn time were 0.70 (SD, 0.20) and 2.5 years (SD, 1.2), compared with the estimates of Day and Walter6 of 0.82 for sensitivity and 1.7 years for the mean sojourn time. Furthermore, the individual sensitivities of mammography and physical examination were estimated to be 0.39 (SD, 0.11) and 0.47 (SD, 0.14), respectively. It is important to note that the sensitivity of the physical examination is comparable to that of mammography.


Table 2.  HIP Data: Prevalence and Incidence of Breast Cancer in the First Four Years of Follow-up
 

Table 4.  Malmö Data: Breast Cancer Detected at Screening Examination and in Intervals
 
Swedish Two-County Study
Women aged 40 to 49 years were invited for screening every 24 months on average, and women aged 50 to 74 were invited every 33 months. The published data were not given in enough detail for application of our methods, except for the first screening examination. Table 3 summarizes the relevant age-specific data and the age-specific incidence estimates obtained from the control group. The data in Table 3 show that the sensitivity of single-view mammography is low for the 40-to-49 age group. Note that, for this age group, the number of cancers detected through screening is comparable to the number of interval cases in the subsequent years. Furthermore, sensitivity tends to increase with patient age. The estimated screening sensitivity and mean sojourn time for women aged 70 to 74 are 0.92 (SD, 0.09) and 4.4 years (SD, 0.76), respectively. However, we have been unable to obtain reliable estimates for the other age groups.


Table 3.  Swedish Two-County Data: Breast Cancer Cases in the First Screening Examination by Age Group
 

Table 5.  Stockholm Data: Breast Cancer Cases in the First Screening Examination by Age Group Within Two Years
 
In contrast, investigators associated with this trial have published estimates of the sensitivity and mean sojourn time on five occasions. In 1992, Tabar et al24 reported age-specific estimates of the sensitivity of mammography of 0.60, 0.86, 0.86, and 0.95 and corresponding mean sojourn times of 1.25, 3.03, 3.89, and 3.41 years for the age groups of 40 to 49, 50 to 59, 60 to 69, and 70 to 74, respectively. The estimation method used was based on that of Paci and Duffy.25 Using the Markov chain model of disease progression of Duffy et al,26 Tabar et al27 also estimated the age-specific screening sensitivity and mean sojourn time. Specifically, for the age groups 40 to 49 years, 50 to 59 years, 60 to 69 years, and 70 to 74 years, the estimated mean sojourn times were 1.7, 3.3, 3.8, and 2.6 years, and the corresponding screening sensitivities were 0.86, 0.92, 0.94, and 1, respectively. In 1996, Chen et al28 used a Markov process model with a quasi-likelihood method to obtain, for the age groups of 40 to 49, 50 to 59, and 60 to 69, mean sojourn times of 2.5, 3.7, and 4.2 years and screening sensitivities of 0.83, 1, and 1, respectively. The same results were also presented in Tabar et al.29 Later, using a mover-stayer mixture of a Markov chain model, Chen et al30 reported mean sojourn times of 1.5, 2.8, and 3.3 years, respectively, for the same age groups (excluding women aged 70 and older). It is worth mentioning that the complete data used for estimation were not provided in the relevant articles. It is unclear whether the differences among the estimates were caused by different statistical methods or by discrepancies in the data.

Malmö Study
Among those invited for screening examinations, 15,748 women participated in the first screening examination, and 14,300 participated in the second. The data for the first two rounds of examinations were obtained from Andersson14 (Table 4). For the Malmö trial, we estimated the sensitivity of mammography and the mean sojourn time to be 0.61 (SD, 0.15) and 5.53 years (SD, 2.1), respectively. The annual incidence of breast cancer was estimated to be 2.2 per 1,000 women. We have been unable to find any published estimates of these quantities for the Malmö study.

Stockholm Trial
The data for the Stockholm study (Table 5) were obtained from Frisell et al.16,17 The screening interval between rounds was planned to be 2 years but was actually 2.3 years. Frisell et al16 estimated the screening sensitivity to be 0.86 using only first-round data. It is unclear what method was used to obtain the estimate.

From the data published in Frisell et al,17 we obtained the numbers of screen-detected cases and interval cases in the study group in three age groups for the first round of screening examinations. Making use of the additional age-specific breast cancer incidence from the corresponding control group, we estimated the screening sensitivity for age groups 40 to 49 and 50 to 59 to be 0.64 (SD, 0.26) and 0.89 (SD, 0.12), respectively, and the corresponding mean sojourn times to be 2.1 years (SD, 1.3) and 2.6 years (SD, 0.61), respectively. The age-specific incidence was estimated by assuming the average follow-up time in the control group to be 1 year for interval cancers and 2 years for others. Because of the small sample size in the oldest age group (60 to 64), the estimated parameters for that group may not be reliable, and the results are not presented.
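A first-round calculation of this kind (one screen, the following interval, and an incidence taken from the control group) can be sketched with two moment equations under a simplified model: stable disease, exponential sojourn time with mean `mst`, constant sensitivity `beta`, and a known annual incidence `w`. The sketch solves the two equations exactly rather than maximizing a likelihood, and the parametric Poisson bootstrap for the SDs is one simple variant, not necessarily the bootstrap used for the estimates above; `w` is held fixed, so its own sampling error is ignored. Function names and all inputs are hypothetical.

```python
import math

import numpy as np


def first_round_estimates(n, s, r, w, gap):
    """Solve for (beta, mst) from one screening round (hypothetical simplified model).

        s / n = beta * w * mst
        r / n = w * gap - beta * w * mst * (1 - exp(-gap / mst))

    Returns None when the counts admit no solution with mst > 0.
    """
    if s <= 0:
        return None
    a = s / (n * w)                     # equals beta * mst
    frac = (w * gap - r / n) / (w * a)  # equals 1 - exp(-gap / mst)
    if not 0.0 < frac < 1.0:
        return None
    mst = -gap / math.log(1.0 - frac)
    return a / mst, mst


def parametric_bootstrap_sd(n, s, r, w, gap, reps=2000, seed=1):
    """SDs of (beta, mst) over Poisson resamples of s and r; also returns failed count."""
    rng = np.random.default_rng(seed)
    draws = []
    for s_b, r_b in zip(rng.poisson(s, reps), rng.poisson(r, reps)):
        est = first_round_estimates(n, s_b, r_b, w, gap)
        if est is not None:
            draws.append(est)
    if len(draws) < 2:
        raise ValueError("too few bootstrap replicates produced estimates")
    arr = np.asarray(draws)
    return arr[:, 0].std(ddof=1), arr[:, 1].std(ddof=1), reps - len(draws)


# Made-up inputs: 10,000 women screened, 48 screen-detected and 17 interval cancers
# within 2 years, control-group incidence of 2 per 1,000 women per year.
print(first_round_estimates(10_000, 48, 17, 0.002, 2.0))
print(parametric_bootstrap_sd(10_000, 48, 17, 0.002, 2.0))
```

Counts that fall outside the feasible region (for example, more interval cases than the incidence alone would produce) return no estimate, which is one way small age groups can fail to yield usable results.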

Edinburgh Study
Table 6 shows the number of breast cancers detected by screening examinations, the number of interval cancers observed after each visit and before the next visit, and the total number of women who attended each screening examination in the Edinburgh study. Based on the observed data from the first six screening intervals, we estimated the sensitivity of the combined physical and mammogram examinations and of physical examination alone to be 0.78 (SD, 0.04) and 0.40 (SD, 0.08), respectively. Assuming that mammography and physical examination detect cancers independently, the sensitivity of mammography was estimated to be 0.63 (SD, 0.13). The corresponding mean sojourn time was estimated to be 4.3 years (SD, 0.37). The annual incidence of breast cancer was estimated to be 2 per 1,000 women. Here, we assume the sojourn time distributions are the same for the two screening modalities. By comparison, Roberts et al21 reported sensitivities of 90% for mammography combined with physical examination and 69% for physical examination alone. It is unclear what method was used to obtain their estimates.
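The independence step in this paragraph can be checked directly. If the two examinations miss a cancer independently, then 1 - (combined sensitivity) = (1 - mammography sensitivity) x (1 - physical examination sensitivity), which can be solved for the mammography sensitivity. The function name is hypothetical; the inputs are the Edinburgh point estimates quoted above.

```python
def mammography_sensitivity(combined, physical):
    """Solve 1 - combined = (1 - mammo) * (1 - physical) for mammo (independence assumed)."""
    return 1.0 - (1.0 - combined) / (1.0 - physical)


# Edinburgh point estimates: combined examination 0.78, physical examination alone 0.40.
print(round(mammography_sensitivity(0.78, 0.40), 2))
```

The result, 1 - 0.22/0.60, rounds to the reported 0.63.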


Table 6.  Edinburgh Data: Breast Cancers Detected and Interval Cases
 
Canadian Studies
The CNBS study group conducted two randomized, controlled studies to evaluate the efficacy of the screening examination, combining annual mammography with physical examination for breast cancer (Table 7). The first study was restricted to women aged 40 to 49 at study entry, and the second study registered women aged 50 to 59. The trial for women aged 40 to 49 at entry addressed the problem of mortality reduction of a group receiving an initial physical examination compared with a group receiving an initial physical examination plus an annual program combining both mammography and physical examinations. On the other hand, in the study with women aged 50 to 59 years at entry, the goal was to evaluate the comparative benefit of an annual combined screening program with mammogram and physical examination compared with an annual physical examination. The cases detected by either or both mammography and physical examination for the four screening intervals were obtained from Miller et al.22,23


Table 7.  CNBS Studies: Breast Cancer Detected and Interval Cases
 
The estimators and their SDs were based on data from the study group and used the full likelihood method. The results are summarized in Table 8. In both age groups, the sensitivities of the mammogram examinations are comparable. However, the sensitivity of physical examinations is about 34% lower for the older group (0.39) relative to the younger group (0.59). Note that the SDs are larger for the younger group. The estimated mean sojourn time is substantially larger for the 50-to-59 age group relative to the younger group, ie, 3.09 years (SD, 0.94) versus 1.87 years (SD, 1.17). Note that we used data from the three screening intervals rather than from the four screening intervals for the 40-to-49 age group because the irregular trend at the fourth screening examination yielded unreliable estimates.


Table 8.  CNBS Studies: Estimated Parameters
 

    DISCUSSION
It is important to evaluate early detection trials as soon as possible without waiting for long-term mortality results. For this purpose, screening sensitivity can be used as an early indicator to assess the screening efficacy. The mean sojourn time serves as an upper boundary for the lead time. The individual screening sensitivity and mean sojourn time estimators and their corresponding SDs are summarized in Table 9. We believe that the present work provides a reasonable summary and comparison for the screening sensitivity and mean preclinical duration among these breast cancer screening trials. All of these calculations used common methods for estimation. Although the relevant quantities were estimated in the literature on some of these trials, the estimation methods have differed considerably. One should be cautious in interpreting differences among the trials. The age distributions varied among these trials, and it is likely that the sensitivity and mean sojourn time are age-dependent. Thus, in the absence of age-specific calculations, the estimator of the mean sojourn time and sensitivity represents an average over the given age group for each trial. Note also the different experimental design plans of these trials. Moreover, screening modalities may generally have improved over the years. For the HIP study, the estimated mammography screening sensitivity is much lower (39%), compared with those of the later trials.


Table 9.  Parameter Estimates of Seven Randomized, Controlled Breast Cancer Screening Trials
 
The estimation procedures we developed rest on general model assumptions and can be used in both the clinical trial setting and population-based observational studies. The methods can accommodate a variety of situations that may occur in observational studies. For example, the likelihood functions can be modified to allow for different follow-up schedules as well as various experimental designs. Essentially, each individual can be treated separately in the likelihood function rather than grouped into equal intervals. However, the more complicated likelihood function requires more intensive calculations.

In general, the more information we observe and collect, the greater the reliability of the estimates. In particular, successful use of the methods requires relatively large sample sizes; this conclusion is based on simulations that are not presented in this article. Moreover, the more screening intervals observed (including both the number of screen-detected cases and interval cases), the more precise the estimates. If the interval between examinations is too long (eg, 5 years for breast cancer screening), the number of interval cases will increase, which results in fewer cases found at a scheduled screening examination. A smaller number of cases detected at a scheduled examination will result in poor precision for the estimates of interest.
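The effect of a long interval can be illustrated with a simplified model that is assumed here for illustration only: stable disease, exponential sojourn time with mean `mst`, a constant sensitivity `beta` that acts independently at each screen, and equally spaced screens every `gap` years. In the long run, each interval then yields `w * gap` cases in total, and the sketch computes the share of them found at screens rather than as interval cases. The function name and parameter values are hypothetical, and this is not the simulation mentioned in the text.

```python
import math


def steady_state_screen_fraction(beta, mst, gap):
    """Long-run share of cases detected at screens (hypothetical simplified model)."""
    lam = 1.0 / mst
    decay = math.exp(-lam * gap)
    # Fixed point of p -> (1 - beta) * p * decay + (1 - decay): the undetected
    # preclinical prevalence just before a screen, in units of the stable prevalence.
    p_star = (1.0 - decay) / (1.0 - (1.0 - beta) * decay)
    # Screen detections per interval are beta * w * p_star / lam, and all cases
    # per interval are w * gap, so w cancels from the ratio.
    return beta * p_star / (lam * gap)


for gap in (1.0, 2.0, 5.0):
    print(gap, round(steady_state_screen_fraction(0.8, 3.0, gap), 2))
```

With `beta` = 0.8 and `mst` = 3 years, the share found at screens falls from roughly 0.79 with annual screening to roughly 0.65 at 2 years and 0.40 at 5 years, so a long interval leaves fewer screen-detected cases to inform the estimates.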

We have concerns about the quality of the data collection for some of the screening trials. If the data associated with periodic examinations are incomplete because of poor follow-up, the statistical methods will not yield reliable estimates. We have performed some simulation studies to assess the impact of missing interval cases on the estimated parameters (unpublished data). Our results show that the estimates can be seriously biased or may not converge with missing interval cases. Hence, it is absolutely necessary to accurately report the interval cases. Moreover, to obtain unbiased estimates of the screening sensitivity and mean sojourn time, it is also crucial to have complete follow-up data for interval cases from the last screening examinations up to the last follow-up time (T).
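The direction of the bias from unreported interval cases can be seen in a toy first-round calculation (one screen, its following interval, and a known incidence `w`) under a simplified model assumed for illustration: stable disease, exponential sojourn time, and constant sensitivity. Starting from made-up true values (sensitivity 0.8, mean sojourn time 3 years), the sketch feeds the exact expected counts back in after removing a fraction of the interval cases. The function name and all values are hypothetical; this is not the unpublished simulation referred to in the text.

```python
import math


def first_round_estimates(n, s, r, w, gap):
    """Solve s/n = beta*w*mst and r/n = w*gap - beta*w*mst*(1 - exp(-gap/mst)).

    Hypothetical simplified model; returns None when no solution with mst > 0 exists.
    """
    if s <= 0:
        return None
    a = s / (n * w)                     # equals beta * mst
    frac = (w * gap - r / n) / (w * a)  # equals 1 - exp(-gap / mst)
    if not 0.0 < frac < 1.0:
        return None
    mst = -gap / math.log(1.0 - frac)
    return a / mst, mst


# Made-up truth: beta = 0.8, mst = 3 years, w = 2 per 1,000 per year, 2-year interval.
n, w, gap, beta, mst = 10_000, 0.002, 2.0, 0.8, 3.0
s = n * beta * w * mst
r = n * (w * gap - beta * w * mst * (1.0 - math.exp(-gap / mst)))
for reported in (1.0, 0.85, 0.7):
    # Only the fraction `reported` of the true interval cases is recorded.
    print(reported, first_round_estimates(n, s, reported * r, w, gap))
```

With complete reporting the true values are recovered. With 15% of interval cases missing, the sensitivity estimate rises to about 0.93 and the mean sojourn time falls to about 2.6 years; with 30% missing, the estimated sensitivity exceeds 1, an impossible value.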

Another breast cancer screening trial, carried out in Gothenburg, Sweden, was not reported in the Results because of concerns about the available data. This trial focused on younger women aged 39 to 49 years. The data published by Bjurstam et al19 showed that the number of interval cases by screening round during the 4 years of follow-up appeared small. Using data from all five screening rounds, we estimated the screening sensitivity and mean sojourn time to be 0.99 and 2.3 years, respectively. It should be noted that the initial sample size (N = 9,921) is small for this younger age group; as a result, these estimates may be less reliable than those from the other studies, and we interpret them with caution. Using the method of Paci and Duffy,25 Bjurstam et al19 reported an estimated mean sojourn time of 2.2 years and a sensitivity of 87%.

Under the stable disease model assumption, the corresponding transition probabilities are also constant with respect to chronologic time. The stable disease model may be a reasonable approximation for some early detection clinical trials when changes in disease incidence over the examination period are relatively small. In this case, the prevalence and incidence represent averages over the entire age distribution. Under the more general nonstable disease model, information on age at examination for the observed breast cancer cases is needed to estimate the age-specific transition probabilities, as in the HIP study.8 Complete information was not available for all trials, especially on the age-specific percentages of women in the study and/or control groups. We have relatively complete data from the HIP, CNBS, and Edinburgh studies, whereas for the other studies we were limited to data in the published literature. Had age-specific data on interval cases, screening detection rates, and incidence been available for these studies, it would have been possible to estimate the age-specific screening sensitivity and mean sojourn time.
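
As background (a standard steady-state identity, assumed here rather than derived in this section), the stable disease model ties the average preclinical prevalence P to the clinical incidence rate I and the mean sojourn time m. For a single modality with sensitivity β, the expected yield of the first (prevalence) examination then depends on β and m only through their product, which is one reason interval cases from later rounds are needed to separate the two quantities.

```latex
P = I \, m, \qquad
\text{first-screen yield} \approx \beta P = \beta \, I \, m .
```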

We note that in none of the Swedish studies did the study group receive a physical examination. In fact, the data from the HIP, Edinburgh, and CNBS studies suggest that the combination of mammography and physical examination has substantially greater sensitivity than mammography alone, as would be expected. Moreover, physical examination may especially benefit younger women, because mammography is less sensitive in this group. On the other hand, our analyses also suggest that tumor progression in the preclinical stage is more rapid in younger women than in older women.

In this investigation, we have applied maximum likelihood methods to estimate the sensitivity and the mean sojourn time in the preclinical state for breast cancer early detection trials. It is not our intention to discuss directly the scientific evidence for the benefit of early detection of breast cancer. Recently, the debate on the benefits of using mammography to detect breast cancer has been renewed.31-39 Our findings may nonetheless supply additional evidence for that discussion. It is worth noting that not all eight randomized trials were designed or carried out to evaluate the benefit of mammography for detecting breast cancer. The earliest trial (HIP) showed a benefit, but its screening group received both mammography and physical examinations. Our calculations show sensitivities of 0.39 for mammography alone and 0.47 for physical examination alone, giving an overall sensitivity of 0.70. The HIP mammography sensitivity might be judged too low by today's standards, but when combined with a physical examination it resulted in an acceptable level of sensitivity.
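
One simple way to see how two modalities of modest sensitivity can combine to an acceptable overall level is to assume, purely for illustration, that mammography and physical examination miss a preclinical tumor independently of each other. This independence rule is not necessarily the model behind the estimates above, and `combined_sensitivity` is a hypothetical helper.

```python
def combined_sensitivity(*sensitivities: float) -> float:
    """Probability that at least one modality detects a preclinical tumour,
    assuming (for illustration only) that the modalities miss it independently."""
    miss_all = 1.0
    for s in sensitivities:
        miss_all *= 1.0 - s  # probability that every modality so far misses it
    return 1.0 - miss_all


# HIP values quoted in the text: mammography 0.39, physical examination 0.47.
print(round(combined_sensitivity(0.39, 0.47), 2))
```

With the HIP values the rule gives about 0.68, close to but not the same as the reported overall sensitivity of 0.70, so it should be read only as a rough consistency check.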

The Canadian trials have been severely criticized for having suboptimal mammography. Our calculations showed sensitivities (mammography only) of 0.61 and 0.66 for the trials involving women aged 40 to 49 and 50 to 59, respectively. These two values are comparable when the uncertainty of the estimates is taken into account. When combined with a physical examination, however, the overall sensitivity is 0.91 and 0.82, respectively, for these two groups. These values are certainly comparable to or exceed the overall sensitivity of screening in the HIP trial. On the other hand, the Canadian studies were not designed to evaluate mammography compared with no mammography. The trial in younger women addressed the comparative benefit of a group receiving an initial physical examination versus a group receiving an initial physical examination plus an annual program combining mammography and physical examinations. Breast cancer incidence was so low in this younger group that, once a proportion of the prevalent cases had been identified at the initial physical examination, the comparison of mammography versus no mammography had low statistical power. The power can be increased by removing from the analysis the prevalent cases found at the initial examination, but the scientific question then changes: the trial compares no examinations versus a combination of mammography and physical examinations after the prevalent cases are eliminated. The trial in older women addresses the benefit of periodic physical examination versus periodic combined examinations, and it has relatively low power to detect meaningful differences in breast cancer mortality rates. One way of informally evaluating benefit is to compare the proportion of node-negative cases diagnosed under physical examination alone with that under a program of combined examinations.
Reports in the literature indicate that approximately 80% of breast cancers found in an early detection program are node-negative, compared with 50% in an unscreened group. Therefore, for the CNBS study of women aged 50 to 59, one would expect 75% (0.75 ≈ [0.82][0.80] + [0.18][0.50]) of the women diagnosed in the screened group to be node-negative, compared with 62% (0.62 ≈ [0.39][0.80] + [0.61][0.50]) in the control group. A trial designed to detect such a difference in node-negative cases by diagnosis modality would require approximately 160 cases in each group for a one-sided test (significance level, 5%; power, 80%). If examination-detected and interval cases together average three per 1,000 women per year, then over a 3-year accrual such a trial would require approximately 18,000 participants in each group. If the end point were mortality, more participants would be needed. Calculating the exact power for a mortality end point is a fairly complicated statistical problem; the basic equations for this purpose have been derived by Hu and Zelen.40
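
The arithmetic in the preceding paragraph can be reproduced approximately. The sketch below assumes the usual normal-approximation sample-size formula for comparing two proportions with a one-sided test (pooled variance under the null hypothesis); the article does not state which formula was used, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def node_negative_fraction(sensitivity: float, p_exam: float = 0.80,
                           p_other: float = 0.50) -> float:
    """Expected node-negative share: examination-detected cases are node-negative
    with probability p_exam, the remaining cases with probability p_other."""
    return sensitivity * p_exam + (1.0 - sensitivity) * p_other


def cases_per_group(p1: float, p2: float, alpha: float = 0.05,
                    power: float = 0.80) -> int:
    """Cases per group for a one-sided comparison of two proportions
    (normal approximation, pooled variance under the null hypothesis)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2.0
    numerator = (z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
                 + z_beta * math.sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


def participants_per_group(cases: int, rate_per_1000_per_year: float = 3.0,
                           accrual_years: float = 3.0) -> int:
    """Women needed per group to accumulate `cases` diagnoses during accrual."""
    return math.ceil(cases / (rate_per_1000_per_year / 1000.0 * accrual_years))


p_screened = node_negative_fraction(0.82)  # screened group (value from the text)
p_control = node_negative_fraction(0.39)   # control group (value from the text)
n_cases = cases_per_group(p_screened, p_control)
print(round(p_screened, 3), round(p_control, 3), n_cases, participants_per_group(n_cases))
```

With the inputs above it gives about 160 cases and about 18,000 women per group, in line with the figures quoted in the text.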

The possibility of overdiagnosis in breast cancer screening has recently raised concern.41-43 Overdiagnosis in mammographic screening is defined as a histologically established diagnosis of invasive or intraductal breast cancer that would never have developed into a clinically manifest tumor during the patient's normal life expectancy if no screening examination had been carried out.42 This subset of cancers is nonprogressive or regressive in nature. The progressive disease model that we have considered in this article does not take overdiagnosis into account, and it is unclear how to estimate the proportion of overdiagnosed cases in mammographic screening for breast cancer. Prorok, Kramer, and Gohagan43 pointed out that one possible consequence of overdiagnosis in a screening trial is that the screened arm contains a higher proportion of early-stage cases even if screening has no effect on mortality, because such cases do not appear in the control group. If a substantial subset of breast cancers detected at screening examinations is nonprogressive, the estimates of screening sensitivity and mean sojourn time may be too optimistic (overestimated) as a result of overdiagnosis.

In summary, the early detection trials for breast cancer have been costly and have taken many years of effort. The key to better understanding the magnitude of potential benefit lies in making detailed clinical trial data publicly available. The methods needed to analyze these data are not routine. Complete data from the HIP trial are publicly available, and we strongly urge others who possess trial data to do the same. Perhaps then we may gain better insight into the entire process of early detection of breast cancer.


    ACKNOWLEDGMENTS
 
Supported by Public Health Service grant nos. CA79466 and CA78607 from the National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, and the Cancer Research Foundation of America, Alexandria, VA.

We thank two referees for helpful comments on earlier drafts.


    REFERENCES
1. Zelen M, Feinleib M: On the theory of screening for chronic diseases. Biometrika 56: 601-614, 1969

2. Prorok PC: The theory of periodic screening: I. Lead time and proportion detected. Adv Appl Prob 8: 127-143, 1976

3. Prorok PC: The theory of periodic screening: II. Doubly bounded recurrence times and mean lead time and detection probability estimation. Adv Appl Prob 8: 460-476, 1976

4. Louis TA, Albert A, Heghinian S: Screening for the early detection of cancer: III. Estimation of disease natural history. Math Biosci 40: 111-144, 1978

5. Walter SD, Day NE: Estimation of the duration of a preclinical disease state using screening data. Am J Epidemiol 118: 856-886, 1983

6. Day NE, Walter SD: Simplified models of screening for chronic disease: Estimation procedures from mass screening programmes. Biometrics 40: 1-13, 1984

7. Etzioni R, Shen Y: Estimating asymptomatic duration in cancer: The AIDS connection. Stat Med 16: 627-644, 1997

8. Shen Y, Zelen M: Parametric estimation procedures for screening programmes: Stable and nonstable disease models for multimodality case finding. Biometrika 86: 503-515, 1999

9. Shapiro S: Evidence on screening for breast cancer from a randomized trial, in Strax P (ed): Control of Breast Cancer Through Mass Screening. Littleton, MA, PSG Publishing, 1979, pp 19-36

10. Shapiro S, Venet W, Strax P, et al: Periodic Screening for Breast Cancer: The Health Insurance Plan Project and its Sequelae, 1963-1986. Baltimore, MD, The Johns Hopkins University Press, 1988

11. Tabar L, Akerlund E, Gad A: Five-year experience with single-view mammography randomized controlled screening in Sweden. Recent Results Cancer Res 90: 105-113, 1984

12. Tabar L, Fagerberg G, Day NE, et al: What is the optimum interval between mammographic screening examinations? An analysis based on the latest results of the Swedish two-county breast cancer screening trial. Br J Cancer 55: 547-551, 1987

13. Tabar L, Fagerberg G, Duffy SW, et al: Update of the Swedish two-county program of mammographic screening for breast cancer. Radiol Clin North Am 30: 187-210, 1992

14. Andersson I: What can we learn from interval carcinomas? Recent Results Cancer Res 90: 161-163, 1984

15. Andersson I, Aspegren K, Janzon L, et al: Mammographic screening and mortality from breast cancer: The Malmö mammographic screening trial. BMJ 297: 943-948, 1988

16. Frisell J, Glas U, Hellstrom L, et al: Randomized mammographic screening for breast cancer in Stockholm. Breast Cancer Res Treat 8: 45-54, 1986

17. Frisell J, Eklund G, Hellstrom L, et al: Analysis of interval breast carcinomas in a randomized screening trial in Stockholm. Breast Cancer Res Treat 9: 219-225, 1987

18. Frisell J, Eklund G, Hellstrom L: The Stockholm breast cancer screening trial: 5-year results and stage at discovery. Breast Cancer Res Treat 13: 79-87, 1989

19. Bjurstam N, Bjorneld L, Duffy SW, et al: The Gothenburg breast screening trial: First results on mortality, incidence, and mode of detection for women ages 39-49 years at randomization. Cancer 80: 2091-2099, 1997

20. First results on mortality reduction in the UK trial of early detection of breast cancer. UK Trial of Early Detection of Breast Cancer Group. Lancet 2: 411-416, 1988

21. Roberts MM, Alexander FE, Anderson TJ, et al: Edinburgh trial of screening for breast cancer: Mortality at seven years. Lancet 335: 241-246, 1990

22. Miller AB, Baines CJ, To T, et al: Canadian National Breast Screening Study: 1. Breast cancer detection and death rates among women aged 40 to 49 years. Can Med Assoc J 147: 1459-1476, 1992

23. Miller AB, Baines CJ, To T, et al: Canadian National Breast Screening Study: 2. Breast cancer detection and death rates among women aged 50 to 59 years. Can Med Assoc J 147: 1477-1488, 1992

24. Tabar L, Fagerberg G, Duffy SW, et al: Update of the Swedish two-county program of mammographic screening for breast cancer. Radiol Clin North Am 30: 187-210, 1992

25. Paci E, Duffy SW: Modeling the analysis of breast cancer screening programmes: Sensitivity, lead time, and predictive value in the Florence District programme (1975-1986). Int J Epidemiol 20: 852-858, 1991

26. Duffy SW, Chen HH, Tabar L, et al: Estimation of mean sojourn time in breast cancer screening using a Markov chain model of both entry to and exit from the preclinical detectable phase. Stat Med 14: 1531-1543, 1995

27. Tabar L, Fagerberg G, Chen HH, et al: Efficacy of breast cancer screening by age. Cancer 75: 2507-2517, 1995

28. Chen HH, Duffy SW, Tabar L: A Markov chain method to estimate the tumour progression rate from preclinical to clinical phase, sensitivity and positive predictive value for mammography in breast cancer screening. Statistician 45: 307-317, 1996

29. Tabar L, Duffy SW, Vitak B, et al: The natural history of breast carcinoma. Cancer 86: 449-462, 1999

30. Chen HH, Duffy SW, Tabar L: A mover-stayer mixture of Markov chain models for the assessment of dedifferentiation and tumour progression in breast cancer. J Appl Stat 24: 265-278, 1997

31. Berry DA: Benefits and risks of screening mammography for women in their forties: A statistical appraisal. J Natl Cancer Inst 90: 1431-1439, 1998

32. Sickles EA, Kopans DB: Deficiencies in the analysis of breast cancer screening data. J Natl Cancer Inst 85: 1621-1624, 1993

33. Gotzsche PC, Olsen O: Is screening for breast cancer with mammography justifiable? Lancet 355: 129-133, 2000

34. de Koning HJ: Assessment of nationwide cancer-screening programmes. Lancet 355: 80-81, 2000

35. Miller AB, Baines CJ, To T, et al: Screening mammography re-evaluated. Lancet 355: 747, 2000 (letter)

36. Law M, Hackshaw A, Wald N: Screening mammography re-evaluated. Lancet 355: 749-750, 2000

37. Cates C, Senn S: Screening mammography re-evaluated. Lancet 355: 750, 2000 (letter)

38. Hayes C, Fitzpatrick P, Daly L, et al: Screening mammography re-evaluated. Lancet 355: 749, 2000 (letter)

39. Moss S, Blanks R, Quinn MJ: Screening mammography re-evaluated. Lancet 355: 748, 2000 (letter)

40. Hu P, Zelen M: Planning clinical trials to evaluate early detection programmes. Biometrika 84: 817-830, 1997

41. Liberman L, LaTrenta LR, Samli B, et al: Overdiagnosis of medullary carcinoma: A mammographic-pathologic correlative study. Radiology 201: 443-446, 1996

42. Peeters PHM, Verbeek ALM, Straatman H, et al: Evaluation of overdiagnosis of breast cancer in screening with mammography. Int J Epidemiol 18: 295-299, 1989

43. Prorok PC, Kramer BS, Gohagan JK: Screening theory and study design: The basics, in Kramer BS, Gohagan JK, Prorok PC (eds): Cancer Screening: Theory and Practice. New York, NY, Marcel Dekker, 1999, pp 29-54

Submitted August 11, 2000; accepted May 3, 2001.





