© 2000 American Society for Clinical Oncology
Evaluation of Tumor Measurements in Oncology: Use of Film-Based and Electronic Techniques
From the Departments of Radiology and Medical Physics, Memorial Sloan-Kettering Cancer Center, and Weill Medical College at Cornell University, New York, NY, and Bioimaging Technologies, West Trenton, NJ. Address reprint requests to Lawrence H. Schwartz, MD, Department of Radiology, Memorial Sloan-Kettering Cancer Center, 1275 York Ave, New York, NY 10021-6007; email schwartz{at}msucc.org
PURPOSE: To evaluate the variability in bidimensional computed tomography (CT) measurements of actual tumors and of tumor phantoms obtained with three measurement techniques: hand-held calipers on film, electronic calipers on a workstation, and an autocontour technique on a workstation.
MATERIALS AND METHODS: Three radiologists measured 45 actual tumors (in the lung, liver, and lymph nodes) on CT images, using each of the three techniques. Bidimensional measurements were recorded, and their cross-products calculated. The coefficient of variation was calculated to assess interobserver variability. CT images of 48 phantoms were measured by three radiologists with each of the techniques. In addition to the coefficient of variation, the differences between the cross-product measurements of the tumor phantoms themselves and the measurements obtained with each of the techniques were calculated.
RESULTS: The coefficients of variation for the autocontour technique differed statistically significantly from those for the other techniques, both for actual tumors and for tumor phantoms. There was no statistically significant difference in the coefficient of variation between measurements obtained with hand-held calipers and electronic calipers. The cross-products for tumor phantoms were 12% less than the actual cross-product when calipers on film were used, 11% less using electronic calipers, and 1% greater using the autocontour technique.
CONCLUSION: Tumor size is obtained more accurately and consistently between readers using an automated autocontour technique than between those using hand-held or electronic calipers. This finding has substantial implications for monitoring tumor therapy in an individual patient, as well as for evaluating the effectiveness of new therapies under development.
THE RESPONSE OF tumors to chemotherapy and radiotherapy is commonly assessed by obtaining tumor measurements on radiologic images. Typically, such measurements consist of the cross-product of the two largest perpendicular diameters evident on cross-sectional images (such as ultrasound, computed tomography [CT], or magnetic resonance [MR] imaging).1-3 Various interpretations of the resultant measurements have been proposed to define tumor response (or lack thereof) to a given therapy. Currently, the most commonly used criteria are those of the Eastern Cooperative Oncology Group.1 For measurable lesions (ie, those with sharply defined borders on radiologic images), a complete response is considered to have been obtained when the tumor has completely disappeared on posttherapy images; a partial response is considered to have been obtained when the bidimensional cross-product has decreased at least 50% for at least 4 weeks; and disease progression is considered to have occurred when the cross-product of any tumor larger than 2 cm² in diameter increases 25% or more. New United States Food and Drug Administration initiatives allow for shorter approval times for cancer therapies in some patients by recognizing that tumor shrinkage, as measured on imaging studies, is often an indication of a therapy's potential effectiveness; in the past, demonstration of increased survival time or quality of life was required before Food and Drug Administration approval for marketing could be obtained (www.fda.gov/opacom/backgrounders/cancerbg.html).
Bidimensional measurements on radiographic (x-ray) film are usually performed manually with hand-held calipers (for cross-sectional images) or a standard ruler (for radiographs). However, intra- and interobserver variability is relatively high,3-8 presumably because of the subjectivity involved in defining the exact margins of a lesion and because determination of the lesion's largest diameter and its largest perpendicular diameter is not always intuitive.9 Such variability can have profound effects on the assessment of an individual patient's tumor response to a given therapy, as well as on the determination of efficacy of a new antitumor therapy.4,6,8
The purpose of this study was to determine whether electronic measurements and a semiautomated (computerized) technique of tumor measurement could result in more accurate and precise measurements with less observer variability.
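The response thresholds summarized above can be expressed as a small decision rule. The following is a minimal sketch, not a clinical tool: the function name, argument names, and the "stable" label are illustrative assumptions, the 2 cm² size condition is applied to the baseline measurement as an assumption, and the 4-week duration requirement for partial response is not modeled.

```python
def classify_response(baseline_cm2, followup_cm2, min_size_cm2=2.0):
    """Classify the change in a bidimensional cross-product (in cm^2).

    Thresholds follow the criteria as summarized in the text.
    All names and the "stable" label are hypothetical.
    """
    if baseline_cm2 <= 0:
        raise ValueError("baseline cross-product must be positive")
    if followup_cm2 == 0:
        # Tumor no longer visible on posttherapy images.
        return "complete response"
    change = (followup_cm2 - baseline_cm2) / baseline_cm2
    if change <= -0.50:
        # Must also persist for at least 4 weeks (not checked here).
        return "partial response"
    if change >= 0.25 and baseline_cm2 > min_size_cm2:
        return "progression"
    # Assumed catch-all label; the text does not name this category.
    return "stable"


print(classify_response(10.0, 5.0))   # 50% decrease
print(classify_response(10.0, 12.5))  # 25% increase
```

Because the categories depend on fixed percentage thresholds, a measurement that falls near a threshold can change category with a small change in either diameter, which is why reader variability matters.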
Tumor Measurements on Clinical CT Images
Clinical CT images of 45 tumor masses located in three sites (the lung, liver, and lymph nodes; n = 15 of each) were selected from 23 patients with a variety of proven cancers. Specified images of the 45 tumors were measured by each of three radiologists using three measurement techniques, each performed three times at separate sittings. First, hand-held calipers (as are used for measurements on ECGs) were used to measure the largest diameter of a tumor and its largest perpendicular diameter on standard radiographic film (15-on-1 format); the measurement scale on the images was used to calibrate the calipers. The images had been photographed with window and level settings commonly used in clinical practice. The electronic caliper tool, which allows the user to draw a thin electronic line on the computer monitor, was then used to measure the largest perpendicular diameters of a tumor on soft-copy images (a picture archiving and communications system [PACS] workstation) (Fig 1). Images could be magnified and window/level settings adjusted at the radiologist's discretion. Last, a custom, proprietary, automated autocontour ("shrink wrap") technique was used to measure the tumor on soft copy. After the radiologist drew an approximate outline around the lesion by using the mouse, the computer automatically adjusted the contour on the basis of density differences between the tumor and surrounding normal tissues (Fig 2). Alternatively, after the radiologist placed a cursor in the center of the lesion, the computer determined the borders of the lesion, again on the basis of density differences. Manual editing of the resultant borders could be performed with either type of autocontour technique. From the final contour, the computer automatically displayed the maximum perpendicular diameters, as well as the area of the lesion (Fig 2C).
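The autocontour technique described above is proprietary, and its algorithm is not given here. Purely to illustrate the general idea of finding a lesion border from density differences around a cursor position, the sketch below uses generic seeded region growing, a swapped-in textbook method and not the authors' technique. The function names, the tolerance parameter, and the toy image values are all assumptions.

```python
from collections import deque


def grow_region(image, seed, tolerance):
    """Return the set of (row, col) pixels 4-connected to `seed`
    whose density lies within `tolerance` of the seed density.

    Generic region growing; NOT the proprietary autocontour method.
    """
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    inside = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in inside
                    and abs(image[nr][nc] - seed_value) <= tolerance):
                inside.add((nr, nc))
                queue.append((nr, nc))
    return inside


def region_area_mm2(pixels, pixel_spacing_mm):
    """Area of the region, assuming square pixels."""
    return len(pixels) * pixel_spacing_mm ** 2


# Toy 7x7 image: low-density background with a 3x3 soft-tissue block.
image = [[-800] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        image[r][c] = 40

lesion = grow_region(image, seed=(3, 3), tolerance=100)
print(len(lesion), region_area_mm2(lesion, pixel_spacing_mm=0.5))
```

A simple threshold like this fails where a tumor abuts tissue of similar density, which is the situation in which manual editing of the contour is needed.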
The measurements with each technique were obtained at different times, separated by at least 2 weeks to minimize recall bias; the order in which each tumor was measured with the three techniques was rotated among the three readers as well. Bidimensional measurements were recorded, and their cross-products were calculated.
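As a rough illustration of the quantities recorded here, the sketch below computes cross-products from bidimensional measurements and a coefficient of variation across readers. It assumes the conventional definition (sample standard deviation divided by the mean); the exact formula used in the study is not reproduced in this text, and the reader measurements are invented for the example.

```python
from statistics import mean, stdev


def cross_product(longest_cm, perpendicular_cm):
    """Bidimensional cross-product in cm^2."""
    return longest_cm * perpendicular_cm


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed definition)."""
    return stdev(values) / mean(values)


# Hypothetical measurements of one lesion by three readers:
# (longest diameter, longest perpendicular diameter), in cm.
readers = [(3.0, 2.0), (3.2, 2.1), (2.9, 1.9)]
products = [cross_product(a, b) for a, b in readers]
cv = coefficient_of_variation(products)
print(products, round(cv, 3))
```

Because the cross-product multiplies two measured diameters, a small relative disagreement in each diameter produces a larger relative disagreement in the product.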
Tumor Phantom Measurements
The largest perpendicular diameters of each tumor phantom were obtained with precision measuring tools applied to the tumors immediately after scanning. The measurements were made in the same axial plane of the tumor phantom as that used during CT scanning. Three radiologists (two of whom had also measured tumors on the clinical images) each independently measured all 48 tumor phantoms once, using each of the three measurement techniques at separate sittings. The radiologists were blinded to the actual tumor-phantom dimensions. Bidimensional measurements were recorded, and their cross-products were calculated.
Statistical Analysis
Intraobserver variability was assessed for each of the measurement techniques for the actual tumors, using the formula
The mean sizes of the 45 actual tumors in the lung, liver, and lymph nodes measured on CT images are listed in Table 1. These lesions were not resected from the patients and thus could not be measured at pathologic examination.
For both the actual tumors and the tumor phantoms, the interobserver variability (assessed by coefficients of variation) was similar for hand-held caliper measurements and electronic caliper measurements, and both differed statistically significantly from that for the autocontour technique (Table 2) (P < .05). There was no statistically significant difference in the coefficient of variation for measurements obtained with the hand-held calipers and with the electronic calipers. The interobserver variability ranged from 0.19 for measurement of the actual tumors with hand-held calipers to 0.05 for tumor phantoms measured with the autocontour technique.
For the tumor phantoms, the cross-products, using calipers on film, were 12% less than the cross-products obtained with precision measuring tools applied directly to the tumor phantoms immediately after being scanned, 11% less using electronic calipers, and 1% greater using the autocontour technique. For two of the three radiologists (readers 2 and 3), the intraobserver variability in measuring the actual tumors was statistically significantly less for the autocontour technique than for the hand-held caliper technique (Table 3). Intraobserver variability was not measured for the tumor phantoms because they were measured only once by each observer.
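The percentage differences reported for the phantoms compare each measured cross-product with the reference cross-product obtained directly from the phantom. A minimal sketch of that comparison follows; the function names and the sample values are hypothetical, and averaging the per-phantom percentages is an assumption about how the summary figure could be formed.

```python
from statistics import mean


def percent_difference(measured, reference):
    """Signed difference of measured from reference, in percent."""
    return 100.0 * (measured - reference) / reference


def mean_percent_difference(pairs):
    """Average signed percent difference over (measured, reference) pairs."""
    return mean(percent_difference(m, r) for m, r in pairs)


# Hypothetical cross-products in cm^2: (measured on image, reference).
pairs = [(8.8, 10.0), (17.6, 20.0)]
print(round(mean_percent_difference(pairs), 1))
```

A negative value indicates underestimation relative to the reference, as reported for both caliper techniques.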
Radiologic images provide critical information about changes in tumor size on serial examinations performed before, during, and after chemotherapy or radiation-therapy regimens. Such an assessment cannot reliably be obtained from physical examination in most cases, yet is essential for determining whether or not the particular therapy is benefiting the patient. The Eastern Cooperative Oncology Group criterion for tumor response is a 50% or greater reduction in the sum of bidimensional products of index lesions, whereas a 25% or greater increase in the sum of bidimensional products indicates tumor progression.1 However, small errors in tumor measurement can lead to serious disagreements in classification of tumor response. Such disagreements are particularly serious when they cause an ineffective investigational therapy to be classified as effective (or the converse situation). In one study, major discrepancies in patient tumor response assessments were found in 40% of reviewed cases from a large, multicenter trial of cytokine therapy in metastatic renal cell cancer; the major cause of these discrepancies was differences in tumor measurements.8
For decades, plain radiographs were the only type of radiologic images available; tumor-size measurements obtained from them were limited by the ambiguities produced by the summation of overlapping and contiguous structures in the body region being imaged, and by the fact that those images were two-dimensional displays of three-dimensional objects. For example, it could be difficult to determine the margins of a pulmonary nodule located near the hilum on a chest radiograph; accurate measurement of mediastinal adenopathy on chest radiographs is generally not even possible.
Cross-sectional imaging techniques, such as ultrasound, CT, and MR imaging, allow more accurate tumor measurements to be made because the tumor can be more readily separated visually from other adjacent and overlying structures, both by the lack of superimposition of structures and the superior ability of these techniques to distinguish various types of soft tissues (such as fat and muscle). Although the definition of a tumor outline on cross-sectional images is affected by the partial volume averaging effect (which is minimized by producing thinner sections), this drawback is more than offset by the images' lack of superimposition of other structures. Moreover, these inherently digital imaging techniques offer the ability to perform measurements electronically and with semiautomated techniques that could potentially lead to more precise, accurate, and reproducible measurements. Semiautomated techniques can minimize or eliminate the subjectivity inherent in the determination of lesion borders when hand-held or electronic calipers are used.10,11
Despite the importance of tumor measurements, it is known that considerable variability exists in obtaining them.3-9,11 Several approaches to measurement have been advocated, including obtaining the maximum diameter,12 bidimensional product,1-3,7,8 tridimensional product,13 or helical CT volume14,15 and semiautomated edge delineation to obtain tumor slice areas.11 The bidimensional product, obtained by multiplying the longest tumor diameter by its longest perpendicular diameter, remains the most common measurement technique used at most institutions, despite the known inaccuracies in determining these diameters.9 The longest diameter of a tumor and the longest perpendicular diameter are not always intuitively obvious from visual inspection of the image. A common error, in our experience, is that two nonperpendicular diameters are chosen; also, the diameters are sometimes selected to parallel the sides of the film or image (possibly to facilitate reproducibility of the measurements on follow-up examinations), rather than to obtain the largest perpendicular diameters (Fig 4).
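Choosing the longest diameter and then a diameter perpendicular to it can be made mechanical once a lesion contour is available. The sketch below is one simple way to do this and is an assumption, not the software used in this study: it searches all pairs of contour points for the longest diameter, then takes the extent of the contour measured perpendicular to that diameter. For shapes such as ellipses this extent equals the longest perpendicular chord; for other shapes it can overestimate it. The function name and the elliptical test contour are hypothetical.

```python
import math
from itertools import combinations


def bidimensional_diameters(contour):
    """Return (longest, perpendicular) diameters for a list of (x, y) points.

    `perpendicular` is the contour's extent along the direction at right
    angles to the longest diameter (a simplification; see lead-in).
    """
    # Exhaustive search over every pair of contour points.
    p, q = max(combinations(contour, 2), key=lambda pair: math.dist(*pair))
    longest = math.dist(p, q)
    # Unit vector along the longest diameter, then rotate it 90 degrees.
    ux, uy = (q[0] - p[0]) / longest, (q[1] - p[1]) / longest
    nx, ny = -uy, ux
    projections = [x * nx + y * ny for x, y in contour]
    perpendicular = max(projections) - min(projections)
    return longest, perpendicular


# Hypothetical elliptical contour with semi-axes 3 cm and 2 cm.
contour = [(3 * math.cos(2 * math.pi * k / 360),
            2 * math.sin(2 * math.pi * k / 360)) for k in range(360)]
longest, perpendicular = bidimensional_diameters(contour)
print(round(longest, 3), round(perpendicular, 3),
      round(longest * perpendicular, 3))
```

The pairwise search is quadratic in the number of contour points, which is acceptable for the few hundred points of a single lesion outline.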
The automated autocontour technique used in our study allows for a calculation of every diameter of the lesion before the largest is chosen and subsequently allows every diameter perpendicular to the longest diameter to be drawn before the largest is chosen. In this manner, subjectivity in choosing the diameters is eliminated. Also, when an automated shrink-wrap technique based on density differences between the tumor and its surrounding tissues is used, the area of the tumor on the image section is obtained rapidly and with less error than that introduced by manual tracing. The time needed to measure a lesion with the autocontour technique is approximately the same as that required with electronic calipers or with hand-held calipers. Two of the three radiologists in our study had statistically significantly less intraobserver variation when measuring actual tumors with the autocontour technique than with hand-held calipers, indicating that the consistency of tumor measurement can be improved for at least some individuals. No statistically significant difference in interobserver variability was demonstrated for any of the three readers using hand-held calipers versus electronic calipers. This finding is not surprising, given that both those measurement techniques have inherent subjectivity in the determination of both the margins of the lesion and the largest perpendicular diameters. Intraobserver variability was also the lowest for the autocontour technique, although this difference was statistically significantly different only compared with hand-held caliper measurements. When measuring actual tumors, the radiologist often needs to manually edit the margins of the tumor (such as at the point where the tumor abuts a normal structure of similar density), thus introducing some subjectivity into the autocontour technique. 
Nevertheless, a given radiologist likely makes similar subjective decisions for the same tumor each time it is measured, resulting in lower intraobserver variability.
A major current trend in medical practice is the interpretation of radiologic images on soft copy (ie, monitors) rather than hard copy (ie, radiographic film). PACS represents the ultimate goal of "filmless radiology," with the resultant increase in the availability of previous radiologic images and the ability to electronically manipulate images to improve their interpretation. Many medical institutions worldwide have implemented PACS in at least some subset of their radiology departments. Thus it should become increasingly possible to replicate in routine clinical practice the superior results obtained with our autocontour measurement technique on a PACS workstation.
One issue not addressed in our study is that in the presence of multiple tumors on an imaging examination, more lesions are present than can practically be measured; importantly, however, the same index lesions are not reproducibly selected for measurement by different radiologists on serial examinations.7 Of 116 tumor deposits that were evident on thoracoabdominopelvic CT images of 24 patients, only 27 deposits (23%) were selected as indicator lesions by all three radiologists in one reported study7; 57 deposits (49%) were selected by only one of the three observers. The interobserver and intraobserver variabilities in measurements of tumor size in three dimensions on these CT images were 15% and 6%, respectively. If different lesions are measured each time, the usefulness of the formal radiologic reports of those examinations is limited. Given that different tumor deposits in the same patient may show different rates of growth, different responses to therapy, or both, the selection of which lesions are measured has obvious implications in an assessment of the patient's disease status.
Various factors, such as the regularity of lesion contours and lesion conspicuity, affect which lesions may be chosen by the radiologist.7 Other factors affecting measurement include partial-volume averaging, patient motion, slice misregistration, and differences in the timing of iodinated intravenous contrast administration. The critical issue of determining how to choose those lesions that are most appropriate to measure in a given patient is beyond the scope of our study. In an attempt to minimize interobserver differences in tumor lesion selection and measurement and to improve the accessibility of radiologic information, investigators at the University of California, Los Angeles, designed and implemented an integrated multimedia timeline of medical images and data for thoracic oncology patients.16,17 Images from serial radiologic examinations could be annotated with electronic measurements, allowing rapid assessment of serial tumor size and facilitating subsequent measurements of the same lesions in the same axes. This novel database is an example of the way in which data residing in PACS can be extracted and formatted to allow better assessments of tumor response. The composition of a tumor likely reflects the biologic status of the tumor more accurately than do the bidimensional tumor measurements.1 For example, a tumor that has become largely necrotic may enlarge rapidly and impressively because of internal hemorrhage18; consideration of tumor measurements alone could lead to the incorrect conclusion that the disease had progressed. Most imaging examinations to date have produced two-dimensional images. Recent improvements in imaging techniques and computer hardware have allowed three-dimensional images to be displayed with ultrasound, CT, and MR imaging. Further studies are needed to determine whether tumor measurements obtained from three-dimensional images provide significantly better results than do those from two-dimensional images. 
Measuring tumor phantoms was a less difficult task than measuring actual tumors in our study, given the lack of adjacent tissues with densities similar to those of the tumors in the phantoms we constructed. Nevertheless, our findings in measurements of actual tumors and of tumor phantoms were similar. Similarly, readers found it easier to measure lung tumors than retroperitoneal lymph nodes, and there was less variability in the lung measurements; however, this did not reach statistical significance, likely at least partly because of the small sample size. Because the selected tumors in our study were not resected, we could not assess the accuracy of the three measurement techniques for those lesions. However, even if pathologic specimens had been available, changes in lesion size due to shrinkage of pathologic specimens and changes in lesion shape due to lack of surrounding tissues after resection would yield measurements different from those obtained from the images. We were able to attain our goal of comparing variability in measurements with the three techniques; direct measurements of the tumor phantoms also allowed us to validate the accuracy of the autocontour technique.
In summary, accurate assessment of tumor size on radiologic images has critical implications for both the appropriate care of an individual patient and the correct assessment of the effectiveness of a particular therapy. Tumor size, expressed as cross-products, is significantly more consistent among readers using a semiautomated autocontour technique than among those using either hand-held or electronic calipers.
We are grateful to Ronald A. Castellino, MD, for insightful comments on this project and to Hyok-Hee Yoo, BS, for assistance in data management.
1. Oken MM, Creech RH, Tormey DC, et al: Toxicity and response criteria of the Eastern Cooperative Oncology Group. Am J Clin Oncol 5:649-655, 1982
2. Miller AB, Hoogstraten B, Staquet M, et al: Reporting results of cancer treatment. Cancer 47:207-214, 1981
3. Watson JV: What does "response" in cancer chemotherapy really mean? BMJ 283:34-37, 1981
4. Warr D, McKinney S, Tannock I: Influence of measurement error on assessment of response to anticancer chemotherapy: Proposal for new criteria of tumor response. J Clin Oncol 2:1040-1046, 1984
5. Quoix E, Wolkove N, Hanley J, et al: Problems in radiographic estimation of response to chemotherapy and radiotherapy in small cell lung cancer. Cancer 62:489-493, 1988
6. Lavin PT, Flowerdew G: Studies in variation associated with the measurement of solid tumors. Cancer 46:1286-1290, 1980
7. Hopper KD, Kasales CJ, Van Slyke MA, et al: Analysis of interobserver and intraobserver variability in CT tumor measurements. AJR Am J Roentgenol 167:851-854, 1996
8. Thiesse P, Ollivier L, Di Stefano-Louineau D, et al: Response rate accuracy in oncology trials: Reasons for interobserver variability. J Clin Oncol 15:3507-3514, 1997
9. Fornage BD: Measuring masses on cross-sectional images. Radiology 187:289, 1993 (letter)
10. Chaney EL, Pizer SM: Defining anatomical structures from medical images. Semin Radiat Oncol 2:215-225, 1992
11. Bellon E, Feron M, Maes F, et al: Evaluation of manual vs semi-automated delineation of liver lesions on CT images. Eur Radiol 7:432-438, 1997
12. Gurland J, Johnson RO: Case for using only maximum diameter in measuring tumors. Cancer Chemother Rep 50:119-124, 1966
13. Spears CP: Volume doubling measurements of spherical and ellipsoidal tumors. Med Pediatr Oncol 12:212-217, 1984
14. Hopper KD, Kasales CJ, Eggli KD, et al: The impact of 2D versus 3D quantitation of tumor bulk determination on current methods of assessing response to treatment. J Comput Assist Tomogr 20:930-937, 1996
15. Nawaratne S, Fabiny R, Brien JE, et al: Accuracy of volume measurement using helical CT. J Comput Assist Tomogr 21:481-486, 1997
16. Aberle DR, Dionisio JD, McNitt-Gray MF, et al: Integrated multimedia timeline of medical images and data for thoracic oncology patients. Radiographics 16:669-681, 1996
17. Bui AA, Aberle DR, McNitt-Gray MF, et al: The evolution of an integrated timeline for oncology patient healthcare. Proc Am Med Inf Assoc Symp, 1998, pp 165-169
18. Panicek DM, Casper ES, Brennan MF, et al: Hemorrhage simulating tumor growth in malignant fibrous histiocytoma at MR imaging. Radiology 181:398-400, 1991
Submitted September 29, 1999; accepted January 26, 2000.
Copyright © 2000 by the American Society of Clinical Oncology, Online ISSN: 1527-7755. Print ISSN: 0732-183X