Pulmonary Embolism on CTPA: Where AI Triage Adds Value and Where It Doesn't


Pulmonary embolism detection on CT pulmonary angiography (CTPA) is one of the clearest use cases for AI-assisted triage in terms of basic computational feasibility: the finding has a discrete anatomical location, relatively consistent appearance (intraluminal filling defect in contrast-enhanced pulmonary arterial branches), and sufficient prevalence in the CTPA-ordered population to provide meaningful training and validation data. Published studies consistently report AUC values of 0.90–0.94 for central and lobar PE detection in retrospective series. That is a strong signal.

But PE on CTPA is also among the more nuanced indications for AI deployment because the clinical severity range of PE is so wide — from massive saddle embolus with right heart strain to incidental subsegmental PE discovered on a staging scan in a patient whose primary concern is a new lung nodule — and because CTPA image quality is highly variable in the community hospital setting. This article provides an honest assessment of where AI triage adds value in the PE detection workflow and where its limitations are most consequential.

The PE Detection Problem That AI Addresses Well

The strongest case for PE AI triage is the delayed-read scenario in a high-volume overnight queue. CTPA studies are ordered for a broad range of clinical indications: suspected acute PE with Wells score and D-dimer context, staging CT in oncology patients where PE is incidental, post-operative surveillance, and increasingly trauma or chest pain protocols where the ordering team has low pre-test probability but high clinical anxiety. In a community hospital overnight queue, a CTPA with central or lobar PE can sit behind twenty or thirty lower-acuity studies if arrival-time sorting is the only queue management mechanism.

AI triage addresses this by flagging the CTPA within seconds of PACS arrival — typically as part of a DICOM C-STORE workflow where the study is simultaneously routed to the PACS archive and to the AI inference engine — and moving it to a higher position in the worklist. For massive or submassive PE, where the clinical window for intervention (catheter-directed thrombolysis, anticoagulation initiation, hemodynamic support) is measured in hours, this advance in read timing can be clinically significant.
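The effect of the flag on queue position can be illustrated with a minimal Python sketch. The `Study` class, the `ai_flagged` field, and the accession numbers are hypothetical names chosen for illustration, not part of any PACS or vendor API; a real worklist would be driven by the PACS or RIS rather than an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Study:
    accession: str            # hypothetical identifier
    arrived: datetime         # PACS arrival time
    ai_flagged: bool = False  # assumed to be set by the AI inference engine


def sort_worklist(studies):
    """Flagged studies first; within each group, oldest arrival first."""
    return sorted(studies, key=lambda s: (not s.ai_flagged, s.arrived))


# Twenty-five routine studies arrive, then a CTPA that the AI flags.
t0 = datetime(2024, 1, 1, 2, 0)
queue = [Study(f"ACC{i:03d}", t0 + timedelta(minutes=i)) for i in range(25)]
queue.append(Study("CTPA-PE", t0 + timedelta(minutes=25), ai_flagged=True))

fifo_order = [s.accession for s in sorted(queue, key=lambda s: s.arrived)]
triaged_order = [s.accession for s in sort_worklist(queue)]

fifo_position = fifo_order.index("CTPA-PE")        # position under arrival-time sorting
triaged_position = triaged_order.index("CTPA-PE")  # position with the AI flag applied
print(fifo_position, triaged_position)
```

Under arrival-time sorting the flagged CTPA is last in the queue; with the flag applied it moves to the front while the unflagged studies keep their arrival order.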

Retrospective series published in journals such as Radiology and the European Respiratory Journal report that AI detection of central and lobar PE achieves AUC in the 0.90–0.94 range, with sensitivities of 88–94% and specificities of 85–92% depending on study cohort and operating threshold. These numbers are for the finding category where AI performs best: large, central filling defects with good contrast bolus timing.

Where the Evidence Gets Thinner: Subsegmental and Incidental PE

Subsegmental PE — filling defects limited to subsegmental pulmonary arteries — is where AI detection performance declines substantially, and where the clinical management question is also most uncertain. Published studies report sensitivity for isolated subsegmental PE in the range of 60–80% depending on the system and cohort, with high variability related to subsegmental branch visibility, motion artifact, and the intrinsically lower lesion conspicuity of small subsegmental filling defects.

The clinical management of isolated subsegmental PE without evidence of deep vein thrombosis is itself debated in the published literature — guidelines from the American College of Chest Physicians and the European Society of Cardiology allow for monitoring without anticoagulation in low-risk patients with incidental subsegmental PE. This means that a false-negative AI result for a subsegmental filling defect may not translate into patient harm if the clinical team's management would have been observation anyway. The consequence of a miss is most severe for central and lobar PE, which is exactly where AI detection performs best.

We're not saying subsegmental PE doesn't matter; it does in patients with recurrent thromboembolism, active malignancy, or coexisting DVT. We're saying that clinical risk stratification is relevant to how you weigh AI detection performance: the highest-risk disease is where the algorithm is strongest, and the lower-risk borderline cases are where it is weaker.

Artifact-Heavy Scans and the Community Hospital Scanner Reality

CTPA image quality is sensitive to a set of technical factors that vary considerably across community hospital CT environments: contrast bolus timing (poor timing produces suboptimal pulmonary arterial enhancement that mimics filling defects), respiratory motion (patients who cannot hold their breath produce motion artifact that compromises vessel conspicuity), and scanner generation (older MDCT scanners produce higher noise levels that affect subsegmental branch visibility).

AI CTPA detection models trained predominantly on scanner data from academic centers with consistent protocols and modern equipment may generalize less well to the variable technical quality of community hospital CT volumes. A study acquired on a 16-slice MDCT scanner at a small community hospital, with suboptimal contrast timing because the IV line wasn't pre-flushed, will challenge both the radiologist and the AI model.

Published validation studies rarely provide sufficient detail about the scanner and protocol mix in the training and evaluation cohorts to allow community hospital buyers to assess this generalizability directly. The appropriate question to ask vendors is: what percentage of your training data came from community hospitals with pre-2018 scanner equipment, and do you have stratified performance data by scanner generation or protocol quality?
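If a vendor cannot supply stratified figures, a site can compute them from its own pilot data. The sketch below is one minimal way to do that in Python; the stratum labels and the example rows are made up for illustration and are not real validation data.

```python
from collections import defaultdict


def sensitivity_by_stratum(cases):
    """cases: iterable of (stratum, pe_confirmed, ai_flagged) tuples.

    Returns {stratum: (sensitivity, n_confirmed_pe)}, computed only over
    studies where the radiologist confirmed PE.
    """
    true_pos = defaultdict(int)
    confirmed = defaultdict(int)
    for stratum, pe_confirmed, ai_flagged in cases:
        if pe_confirmed:
            confirmed[stratum] += 1
            if ai_flagged:
                true_pos[stratum] += 1
    return {s: (true_pos[s] / confirmed[s], confirmed[s]) for s in confirmed}


# Made-up illustrative rows: (scanner generation, PE confirmed, AI flagged).
cases = [
    ("64+ slice", True, True), ("64+ slice", True, True),
    ("64+ slice", True, True), ("64+ slice", True, False),
    ("16 slice", True, True), ("16 slice", True, False),
    ("16 slice", False, False), ("64+ slice", False, True),
]
result = sensitivity_by_stratum(cases)
for stratum, (sens, n) in result.items():
    print(f"{stratum}: sensitivity {sens:.2f} over {n} confirmed PE studies")
```

Reporting the count alongside each sensitivity matters: strata with only a handful of confirmed PE cases produce estimates too noisy to compare against published figures.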

Right Heart Strain and Beyond-Detection Outputs

The most clinically valuable AI outputs for CTPA PE go beyond binary detection to include findings that stratify hemodynamic severity. Right heart strain — assessed by right-to-left ventricular diameter ratio on axial CT images — is a validated prognostic marker for PE severity that correlates with short-term mortality risk and guides decisions about advanced intervention versus anticoagulation alone.

Several cleared and investigational AI systems for CTPA PE include right heart strain assessment as a companion output. A DICOM Structured Report (SR) object conforming to TID 1500 (Measurement Report) that reports both the PE detection confidence score and the RV/LV ratio provides more actionable information than a binary "PE detected" flag: it allows the triage system to differentiate a hemodynamically significant submassive PE from a low-acuity subsegmental finding, which has direct implications for worklist priority scoring.

Pacslens's PE triage component uses a combined filling-defect detection and right heart assessment approach to generate PE criticality scores rather than binary flags. Studies with both filling defect detection and elevated RV/LV ratio receive higher triage scores than studies with suspected filling defects alone. This multi-signal scoring is more clinically meaningful for worklist prioritization than single-signal detection.
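To make the idea of multi-signal scoring concrete, here is a minimal Python sketch. It is not Pacslens's actual scoring method: the function names, the RV/LV threshold of 1.0, and the 0.3 discount are all assumptions chosen only to show how a second signal can separate studies with the same detection confidence.

```python
def rv_lv_ratio(rv_diameter_mm, lv_diameter_mm):
    """Right-to-left ventricular diameter ratio from axial measurements."""
    if lv_diameter_mm <= 0:
        raise ValueError("LV diameter must be positive")
    return rv_diameter_mm / lv_diameter_mm


def criticality_score(pe_confidence, rv_lv, strain_threshold=1.0, no_strain_discount=0.3):
    """Combine detection confidence (0-1) with a right heart strain signal.

    strain_threshold and no_strain_discount are illustrative assumptions,
    not published or vendor values. The strain signal only changes the score
    when a filling defect is suspected, so it cannot raise a study on its own.
    """
    if not 0.0 <= pe_confidence <= 1.0:
        raise ValueError("pe_confidence must be between 0 and 1")
    weight = 1.0 if rv_lv >= strain_threshold else 1.0 - no_strain_discount
    return pe_confidence * weight


# Same detection confidence, different right heart findings.
with_strain = criticality_score(0.9, rv_lv_ratio(45.0, 38.0))
without_strain = criticality_score(0.9, rv_lv_ratio(32.0, 40.0))
print(round(with_strain, 2), round(without_strain, 2))
```

With these assumed parameters, a study with a suspected filling defect and an elevated RV/LV ratio outranks one with the same detection confidence and a normal ratio, which is the ordering described above.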

What This Means for CTPA AI Deployment Decisions

The AUC 0.90–0.94 headline for PE detection is accurate for the right clinical context: contrast-enhanced CTPA with central or lobar PE in a high-quality scan. Community hospitals should expect that performance on their actual CTPA volume — which includes suboptimal technical quality and a mix of clinical indications with varying PE prevalence — will differ from published benchmarks.
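One reason local results differ is that the positive predictive value of a flag depends on PE prevalence as well as on sensitivity and specificity. The Python sketch below applies Bayes' rule; the 90% sensitivity and specificity fall inside the ranges quoted earlier, while the prevalence values are assumptions for illustration only.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a flagged study truly contains PE (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Assumed prevalences of PE among CTPA studies; substitute local figures.
for prevalence in (0.05, 0.15, 0.30):
    ppv = positive_predictive_value(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}")
```

At a fixed operating point, a site whose CTPA population has low PE prevalence will see a larger share of false-positive flags than a site with high prevalence, even though the algorithm itself is unchanged.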

The appropriate deployment posture for PE triage AI in a community hospital is as a worklist prioritization tool that catches high-probability central and lobar PE cases and moves them earlier in the read order — not as a diagnostic decision support tool that validates the radiologist's interpretation. The latter requires cleared device status and a device description that supports that use case; the former is a workflow enhancement that operates in a pre-read context without replacing radiologist judgment.

For pilot evaluation, track: (1) how often the AI-flagged studies in your CTPA volume are subsequently confirmed as PE by radiologist read; (2) the distribution of triage scores for PE-confirmed versus PE-negative studies; and (3) whether high-score studies are being read before low-score studies in the overnight queue. These three metrics together tell you whether the system is calibrated appropriately for your environment.
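The three pilot metrics can be computed from a simple export of pilot studies. The following Python sketch shows one possible approach; the `PilotStudy` fields and the sample rows are hypothetical, and the pairwise concordance measure for read order is one reasonable choice among several.

```python
from dataclasses import dataclass
from itertools import combinations
from statistics import median


@dataclass
class PilotStudy:
    triage_score: float  # AI score, assumed to be on a 0-1 scale
    ai_flagged: bool
    pe_confirmed: bool   # radiologist final read
    read_order: int      # 1 = read first in the shift


def flag_ppv(studies):
    """Metric 1: fraction of AI-flagged studies confirmed as PE."""
    flagged = [s for s in studies if s.ai_flagged]
    if not flagged:
        return None
    return sum(s.pe_confirmed for s in flagged) / len(flagged)


def median_scores(studies):
    """Metric 2: median triage score for PE-confirmed vs PE-negative studies."""
    pos = [s.triage_score for s in studies if s.pe_confirmed]
    neg = [s.triage_score for s in studies if not s.pe_confirmed]
    return (median(pos) if pos else None, median(neg) if neg else None)


def read_order_concordance(studies):
    """Metric 3: among pairs with different scores, the fraction in which
    the higher-scored study was read first."""
    concordant = total = 0
    for a, b in combinations(studies, 2):
        if a.triage_score == b.triage_score:
            continue
        total += 1
        high, low = (a, b) if a.triage_score > b.triage_score else (b, a)
        if high.read_order < low.read_order:
            concordant += 1
    return concordant / total if total else None


# Made-up sample rows for illustration only.
studies = [
    PilotStudy(0.92, True, True, 1),
    PilotStudy(0.81, True, True, 2),
    PilotStudy(0.74, True, False, 4),
    PilotStudy(0.20, False, False, 3),
    PilotStudy(0.10, False, False, 5),
]
print(flag_ppv(studies), median_scores(studies), read_order_concordance(studies))
```

A concordance near 1.0 indicates the worklist is actually being read in score order; a value near 0.5 suggests the triage scores are not influencing read order at all.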
