AI Assistance Improves Lung Nodule Detection: Radiologist Experience Matters

Evaluating the Impact of AI-Assisted Lung Nodule Detection on Radiologists with Varying Experience Levels

This study aimed to assess the performance of two distinct deep learning (DL)-based computer-aided detection (CAD) systems for lung nodule detection and classification. The study utilized a large clinical cohort in a second-reader setup, with a specific focus on evaluating the impact of AI assistance on radiologists with varying levels of experience.

The study adhered to ethical guidelines, obtaining informed consent from all participants and receiving approval from the local ethics committee.

Primary Cohort

The retrospective study involved a primary cohort of 205 subjects with 228 index nodules: 66 subsolid and 162 solid. To reach a sufficient sample size, the cohort was expanded by reviewing chest CT scans acquired at the local institution over the preceding two years. This expansion included 53 patients with 68 nodules (17 subsolid and 51 solid).

Nodules were distributed roughly evenly across Lung Imaging Reporting and Data System (LungRADS) categories, with 53% classified as LungRADS 2/3 and 47% as LungRADS 4A/4B. The minimum nodule size was 3 mm.

A control group of 30 patients without any pulmonary nodules was also included.

CT Scan Protocol

CT examinations originated from over 20 institutions across the country, with 4 different CT vendors. All examinations were performed with standard settings and reconstruction parameters aligned with national guidelines. While the acquired minimum slice thicknesses varied from 0.5 to 4 mm, 94% of examinations had a minimum slice thickness of ≤ 2 mm and 70% had a thickness of ≤ 1 mm.

Reconstruction algorithms included filtered-back projection and iterative reconstruction (IR). Scan volumes varied based on clinical symptoms and indications. Out of the 205 CT scans, 137 (67%) were contrast-enhanced.

DL-Based CAD Systems

The study utilized two distinct commercially available DL-based CAD systems:

* **Software 1:** ClearRead-CT (Riverain Technologies), an advanced machine learning software capable of lung segmentation and of measuring nodule size and density. It isolates pulmonary nodules from vessels, bronchi, and fissures, creating a new CT dataset that highlights pulmonary consolidations and nodules.
* **Software 2:** AI-Rad Companion (Siemens Healthineers), a fully automated artificial intelligence (AI) system based on a multi-layered convolutional neural network (CNN).

Both systems have undergone extensive validation in numerous prior studies.

Readout

Five radiologists with varying experience levels (two board-certified, subspecialized chest radiologists and three trainees) evaluated all CT scans independently and blinded to the number of nodules. Two readers used Software 1, while three readers used Software 2.

The readout was completed in two steps:

1. **Independent Readout:** Each rater independently read all CT scans and documented the number, size group, density group, and location of all pulmonary nodules on a standardized form.
2. **AI-Aided Readout:** After a one-month hiatus, each rater performed a second readout with an overlay of the AI findings in the DICOM data. Readers were not allowed to alter the results from the initial independent readout.

Standard of Reference

Two board-certified radiologists with extensive experience in chest radiology established the standard of reference independently, reaching consensus in cases of disagreement. The LungRADS category of each nodule was determined based on its size and density following LungRADS v2022.

Nodule Detection Rate

The nodule detection rate was defined as the proportion of correctly identified nodules at the scan level. A nodule was considered correctly detected if the reported lobe and density matched the actual nodule characteristics.
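The matching rule above can be illustrated with a minimal Python sketch. It is not the study's implementation: the `Nodule` class, the lobe codes, and the rule that each reported finding can match at most one reference nodule are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nodule:
    lobe: str     # hypothetical lobe code, e.g. "RUL" or "LLL"
    density: str  # "solid" or "subsolid"


def detection_rate(reference, reported):
    """Fraction of reference nodules in one scan matched by a reader's findings.

    A reference nodule counts as detected when an unused reported finding
    has the same lobe and density (assumed one-to-one matching).
    """
    if not reference:
        return None  # undefined for scans without reference nodules
    remaining = list(reported)
    hits = 0
    for nodule in reference:
        if nodule in remaining:
            remaining.remove(nodule)  # each finding can match only once
            hits += 1
    return hits / len(reference)


# One scan with two reference nodules; the reader found one and misplaced another.
reference = [Nodule("RUL", "solid"), Nodule("LLL", "subsolid")]
reported = [Nodule("RUL", "solid"), Nodule("RML", "solid")]
rate = detection_rate(reference, reported)
```

In this example one of the two reference nodules is matched, so the scan-level rate is 0.5. Scans from the control group have no reference nodules, which is why the sketch returns `None` rather than a rate for them.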

Statistical Analysis

Statistical analyses were conducted using SPSS Statistics version 26.0 and GraphPad Prism version 8. McNemar tests were used for pairwise comparisons to assess the impact of AI software use at different experience levels. Fleiss' kappa was used to assess interreader agreement. A multilevel mixed-effects logistic regression model was also fitted to assess the interaction between AI software use and experience with respect to the detection rate.
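For readers unfamiliar with the first two tests, the following Python sketch shows how they work in principle. The study itself used SPSS and GraphPad Prism, so this is not the authors' code. The function names and the example counts are hypothetical, and the McNemar variant shown is the exact binomial form, which is an assumption because the article does not say which variant was used. The mixed-effects model is not sketched here.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant counts.

    b: nodules missed without AI but detected with AI
    c: nodules detected without AI but missed with AI
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def fleiss_kappa(table):
    """Fleiss' kappa for a subjects x categories table of rating counts.

    Every row must sum to the same number of raters (at least two).
    """
    n_subjects = len(table)
    n_raters = sum(table[0])
    n_categories = len(table[0])
    # Share of all ratings that fall into each category.
    p_cat = [
        sum(row[j] for row in table) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per subject, then averaged over subjects.
    p_subj = [
        (sum(x * x for x in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ]
    p_observed = sum(p_subj) / n_subjects
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical discordant counts for one reader: 9 nodules gained, 1 lost with AI.
p_value = mcnemar_exact(9, 1)

# Hypothetical table: 4 nodules, 5 readers, categories (detected, missed).
kappa = fleiss_kappa([[5, 0], [0, 5], [5, 0], [0, 5]])
```

McNemar's test looks only at the discordant pairs, which suits the paired design in which the same reader assesses the same nodules with and without AI. In the hypothetical table all five readers agree on every nodule, so kappa equals 1.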
