An analysis of early studies released by the Lung Imaging Database Consortium (LIDC).

Ross JC, Miller JV, Turner WD, Kelliher TP. An analysis of early studies released by the Lung Imaging Database Consortium (LIDC). Acad Radiol. 2007;14(11):1382–8.

Abstract

RATIONALE AND OBJECTIVES: To analyze radiologists' lung nodule segmentations in the Lung Imaging Database Consortium (LIDC) database and to apply statistical tools to generate estimates of ground truth. This investigation expands on earlier work by considering a larger number of cases from the LIDC database and by generating results on a per-nodule basis rather than on the per-case basis used previously.

MATERIALS AND METHODS: We analyzed nodule data drawn from the 41 most recent computed tomography exams released by the LIDC. We combined the radiologists' segmentations of each nodule using three consensus schemes: union, intersection, and simultaneous truth and performance level estimation (STAPLE). We also generated three-dimensional models of the manual segmentations using discrete marching cubes to visualize features of the data.

RESULTS: The union consensus scheme produced the greatest number of nodule-positive voxels, and the intersection produced the fewest. For nodules on which all readers agreed about nodule presence, the average STAPLE-computed sensitivities for readers one, two, three, and four were 0.91, 0.83, 0.90, and 0.77, respectively, and the average specificities were 0.97, 0.98, 0.97, and 0.97. For cases in which readers disagreed about nodule presence, the average sensitivities were 0.67, 0.74, 0.60, and 0.37, and the average specificities were 0.95, 0.95, 0.95, and 0.98. STAPLE-generated probability maps (pmaps) contained values tightly grouped below 0.25 and above 0.75. Three-dimensional models of the manually segmented nodules revealed step artefacts in the segmentation data.

CONCLUSIONS: Radiologists often disagree about nodule presence. For optimal STAPLE results, each reader's sensitivity and specificity should ideally be known a priori. Knowing these values, and developing manual segmentation tools and imaging protocols that mitigate unwanted segmentation features such as step artefacts, can yield more accurate estimates of ground truth. Furthermore, the measured performance of a computer-aided detection algorithm depends on the ground truth estimate against which it is scored.
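The three consensus schemes named above can be illustrated with a minimal NumPy sketch. Union and intersection are voxel-wise OR and AND across readers. The `staple` function is a simplified binary STAPLE: an expectation-maximization loop that alternates between a per-voxel probability map of true foreground and per-reader sensitivity/specificity estimates. It is not the implementation used in the study. All function names (`union_consensus`, `intersection_consensus`, `sens_spec`, `staple`) are hypothetical. The sketch further assumes that each reader's mask is a boolean array of identical shape, that a single global foreground prior is estimated from the mean vote, and that sensitivity and specificity are initialized to 0.99. `sens_spec` scores one reader against a fixed binary reference, unlike STAPLE, which estimates reader performance jointly with the ground truth.

```python
import numpy as np


def union_consensus(masks):
    """Voxel is foreground if ANY reader marked it."""
    return np.any(np.asarray(masks, dtype=bool), axis=0)


def intersection_consensus(masks):
    """Voxel is foreground only if EVERY reader marked it."""
    return np.all(np.asarray(masks, dtype=bool), axis=0)


def sens_spec(mask, reference):
    """Sensitivity and specificity of one reader's mask against a binary reference."""
    mask = np.asarray(mask, dtype=bool)
    reference = np.asarray(reference, dtype=bool)
    tp = np.sum(mask & reference)    # true positives
    tn = np.sum(~mask & ~reference)  # true negatives
    return float(tp / reference.sum()), float(tn / (~reference).sum())


def staple(masks, prior=None, init=0.99, max_iter=100, tol=1e-8):
    """Simplified binary STAPLE estimated by expectation-maximization (a sketch).

    masks: (R, ...) boolean array, one segmentation per reader.
    Returns (W, p, q): W is the probability map that each voxel is truly
    foreground; p and q are per-reader sensitivity and specificity estimates.
    """
    eps = 1e-10
    D = np.asarray(masks, dtype=float)
    shape = D.shape[1:]
    D = D.reshape(D.shape[0], -1)              # (R readers, N voxels)
    # Assumption: one global foreground prior, taken as the mean vote.
    f = np.clip(D.mean() if prior is None else prior, eps, 1 - eps)
    p = np.full(D.shape[0], init)              # sensitivities
    q = np.full(D.shape[0], init)              # specificities
    W = np.zeros(D.shape[1])
    for _ in range(max_iter):
        log_p, log_1p = np.log(np.clip(p, eps, 1)), np.log(np.clip(1 - p, eps, 1))
        log_q, log_1q = np.log(np.clip(q, eps, 1)), np.log(np.clip(1 - q, eps, 1))
        # E-step: log-likelihood of each voxel's votes if truly fg (a) or bg (b).
        log_a = np.log(f) + D.T @ log_p + (1 - D).T @ log_1p
        log_b = np.log(1 - f) + D.T @ log_1q + (1 - D).T @ log_q
        W_new = 1.0 / (1.0 + np.exp(np.clip(log_b - log_a, -700, 700)))
        # M-step: update reader performance against the soft ground truth.
        p = np.clip((D @ W_new) / max(W_new.sum(), eps), 0.0, 1.0)
        q = np.clip(((1 - D) @ (1 - W_new)) / max((1 - W_new).sum(), eps), 0.0, 1.0)
        done = np.max(np.abs(W_new - W)) < tol
        W = W_new
        if done:
            break
    return W.reshape(shape), p, q


if __name__ == "__main__":
    # Toy 2-D example: a 16-voxel "nodule" and three readers.
    truth = np.zeros((10, 10), dtype=bool)
    truth[3:7, 3:7] = True
    over = np.zeros_like(truth)
    over[2:8, 2:8] = True                      # over-segmenting reader
    under = np.zeros_like(truth)
    under[4:6, 4:6] = True                     # under-segmenting reader
    masks = np.stack([truth, over, under])
    print(union_consensus(masks).sum())         # 36
    print(intersection_consensus(masks).sum())  # 4
    print(sens_spec(under, truth))              # (0.25, 1.0)
    W, p, q = staple(masks)
    print(np.round(p, 2), np.round(q, 2))
```

To obtain a binary ground-truth estimate, the probability map `W` can be thresholded (for example, `W >= 0.5`). If pmap values cluster below 0.25 and above 0.75, as reported above, such a threshold would likely be fairly insensitive to its exact value.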