Notification Fatigue in Radiology AI: Why Alert Volume Matters More Than Sensitivity

Abstract visualization of alert volume and notification fatigue in radiology AI workflow

There is a clinical analogy that radiologists who have worked in academic centers with early-generation AI tools understand intuitively: the nursing call bell that rings so often that everyone stops responding to it. The principle behind call bell fatigue is the same principle that makes over-alerting radiology AI worse than no alerting at all. When the system fires on anything that might be relevant, it stops being useful as a signal and becomes ambient noise.

Alert fatigue in radiology AI is not hypothetical. It has been documented in published literature examining clinical decision support systems more broadly, and it is a recurring topic in JACR (Journal of the American College of Radiology) discussions on AI deployment. The mechanism is straightforward: high-sensitivity thresholds reduce miss rates but increase false positive alert volume. Radiologists adapt — consciously or not — by treating alerts as background noise. At that point, the alert has zero clinical utility.

Sensitivity Is Not the Right Optimization Target for Deployed Systems

Academic benchmarks for radiology AI models are usually reported as sensitivity/specificity pairs, often at a single threshold selected to maximize AUC. These numbers are important for understanding model capability and for FDA 510(k) device characterization. But the deployment-phase question is different from the evaluation-phase question.

In deployment, the model doesn't operate in isolation. It operates within a clinical workflow where a radiologist is making decisions — decisions that include deciding whether to attend to each alert the system generates. The operationally relevant metric is not "what fraction of true LVO studies does the model flag?" but "what fraction of alerts the model generates actually correspond to a study requiring urgent action?" These are precision and recall framed at the workflow level, not the finding level.

A model with 95% sensitivity for LVO at a threshold that generates 40 alerts per 100 overnight CTA-head studies, 38 of which are ultimately read as normal, has a workflow-level positive predictive value of roughly 5%. The radiologist who receives 38 false alerts for every 2 true positives will, within a few weeks of deployment, stop treating those alerts as signals requiring immediate action. This is not a failure of the radiologist. It is a rational adaptive response to a miscalibrated information system.

The Alert Volume Problem in Community Hospitals

Community hospitals face an aggravated version of this problem because there is typically no dedicated AI operations team to monitor and recalibrate threshold settings post-deployment. In academic centers, a radiology AI governance committee or an IT team with clinical informatics support can track alert precision metrics, adjust thresholds, and intervene when volume spikes. Community hospitals — running overnight with one radiologist and a small IT team covering the entire hospital's systems — have no such infrastructure.

Consider a plausible scenario: a 200-bed community hospital in northern New England deploys an ICH detection tool from an early-stage vendor at default thresholds. Within the first 60 days, the tool generates alerts on 22% of overnight head CT studies. The overnight radiologist, covering an average of 45 head CTs per shift, receives roughly 10 alerts per night. Over six weeks of this pattern, the radiologist stops routing to the alert dashboard first. The ICH tool is still running, still generating alerts, still charging the hospital its per-alert or per-month fee — but it has been behaviorally disabled by its own over-alerting.

We're not saying high sensitivity is inherently bad. We're saying that deploying a high-sensitivity model without calibrating its alert volume to the workflow tolerance of the target environment is a predictable path to clinical non-use.

Worklist Re-Ranking as an Alert Volume Management Strategy

Worklist re-ranking approaches the problem differently. Instead of generating a separate alert — a push notification, a phone call, a pager event — for each flagged study, the system communicates priority through the worklist ordering itself. The radiologist opens their PACS viewer or RIS worklist and sees that certain studies have been moved toward the top, annotated with STAT or URGENT triage indicators. No additional notification channel is introduced. The signal is the queue position.

This approach changes the information-to-action ratio. The radiologist doesn't need to evaluate whether to respond to an alert — they're already looking at the worklist. The presence of a high-priority study at the top of the queue is not an interruption; it is a presentation of what they were going to look at anyway, in a more useful order. The cognitive load of alert response is replaced by the zero-overhead act of reading the first study on the list.

The tradeoff is that worklist re-ranking is a softer intervention than an alert. An alert can interrupt the radiologist mid-read and prompt action on a study they haven't reached yet. A re-ranked worklist assumes the radiologist is working through the queue sequentially — which is the common overnight workflow but is not universal. If the radiologist is cherry-picking studies by modality or if a critical study arrives while they're actively reading something else, worklist position may not provide sufficient urgency signaling for the most time-critical findings.

Pacslens addresses this by maintaining a configurable escalation threshold: studies scored above a configurable criticality ceiling generate a supplemental notification rather than relying on queue position alone. But the default for most community hospital deployments is worklist re-ranking without alert fire, because that default produces a lower-interruption workflow that sustains radiologist engagement over time.

Measuring Alert Fatigue in Practice

Quantifying alert fatigue is harder than it sounds because the effect manifests as behavior change, not system failure. A system can report 100% alert delivery and 0% radiologist response and the monitoring dashboard will show everything working nominally. The metric that actually captures fatigue is alert-to-action ratio: for each alert delivered, does the radiologist's reading behavior change in the expected direction within a clinically relevant timeframe?

Published literature on clinical decision support alert fatigue — primarily from the medication ordering context (CPOE systems in Epic and Cerner environments) but increasingly from radiology AI deployment analyses in JACR — suggests that alert override rates above 90% are a reliable indicator of fatigue-driven non-use. For radiology AI, the analogous signal is alert acknowledgment without worklist reordering: the radiologist clicks "acknowledged" to clear the notification but continues reading in arrival-time order.

Hospital AI procurement decisions should request this metric from any radiology AI vendor: what is the alert-to-behavioral-change conversion rate in deployed environments, measured at 30, 60, and 90 days post-go-live? If the vendor can't produce this number, it's a meaningful signal about whether they're tracking deployment outcomes in the way a clinically responsible vendor should.

Precision as a First-Class Design Requirement

The right design philosophy for radiology AI deployment, from Pacslens's perspective, is to treat workflow-level alert precision as a first-class design requirement alongside model sensitivity. Sensitivity is about finding the cases you shouldn't miss. Precision is about not crying wolf on the cases that don't require urgent action. Both matter in a deployed clinical system. In academic settings with subspecialty coverage and AI governance infrastructure, you can sacrifice precision for sensitivity and manage the resulting alert volume manually. In community hospital overnight settings, you cannot — and designing for that reality is not a concession, it's a requirement.

The published evidence on ICH detection, LVO detection, and PE detection consistently shows that models can achieve high sensitivity at the cost of specificity. That's inherent to the problem — critical findings are rare events in a general CT population, and finding them all requires a threshold low enough to catch borderline presentations, which generates false positives. The design question is not whether to tolerate false positives, but where to absorb them: in the alert delivery system (which creates fatigue) or in the worklist ranking (which creates a gentler signal that preserves behavioral response).

If you're evaluating radiology AI tools and want to understand how Pacslens manages alert volume in community hospital deployments, request a demo walkthrough that includes a discussion of threshold configuration and escalation settings for your study volume.