Training an AI model as described in this slide raises several notable issues that could undermine the model's generalizability, fairness, and clinical utility:
1. Limited Institutional Diversity
- Hospitals A and B are the only sources of training data. This can lead to:
  - Overfitting to site-specific biases, such as protocols, image acquisition techniques, labeling standards, or patient demographics.
  - Poor generalization to data from other hospitals, especially those with different equipment, patient populations, or clinical practices (a site-held-out validation sketch follows this list).
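One way to quantify this risk is to validate with an entire site held out, rather than using random splits. Below is a minimal sketch using scikit-learn's `LeaveOneGroupOut`; the arrays `X`, `y`, and `site` are synthetic stand-ins, not data from the slide:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut

# Hypothetical stand-ins: X = features, y = CA vs. control, site = source hospital
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))
y = rng.integers(0, 2, size=400)
site = np.array(["A"] * 200 + ["B"] * 200)

# Each fold trains on one hospital and tests on the other, so the test site
# is never seen during training; this proxies deployment at a new hospital
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=site):
    clf = RandomForestClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    auc = roc_auc_score(y[test_idx], clf.predict_proba(X[test_idx])[:, 1])
    print(f"held-out site {site[test_idx][0]}: AUC = {auc:.2f}")
```

With only two sites, and CA cases concentrated at one of them, this check mostly exposes the problem; it becomes a genuine generalization estimate only once more sites contribute both cases and controls.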
2. Population Imbalance and Selection Bias
- Hospital A: CA = 2241, LVH controls = 604
- Hospital B: General controls = 1265
- Cardiac amyloidosis (CA) is a rare condition; a cohort of 2241 CA cases suggests this is a tertiary or referral center, not representative of the general population.
- Hospital B's "general controls" may differ significantly in health status or referral reason from the controls in Hospital A.
- If patients are not matched across institutions by age, sex, comorbidities, etc., the model may learn to distinguish hospitals rather than pathology (see the covariate-balance check below).
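Before training, this confounding can be surfaced with a simple covariate-balance check across sites, for example via standardized mean differences (SMD). A minimal sketch with pandas; the `age` and `sex_male` columns and their distributions are hypothetical, not from the slide:

```python
import numpy as np
import pandas as pd

# Hypothetical patient table; column names and distributions are assumptions
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "site": ["A"] * 300 + ["B"] * 300,
    "age": np.concatenate([rng.normal(72, 8, 300), rng.normal(55, 15, 300)]),
    "sex_male": np.concatenate([rng.binomial(1, 0.7, 300), rng.binomial(1, 0.5, 300)]),
})

def smd(a: pd.Series, b: pd.Series) -> float:
    """Standardized mean difference; |SMD| > 0.1 is a common imbalance flag."""
    pooled_sd = np.sqrt((a.var() + b.var()) / 2)
    return (a.mean() - b.mean()) / pooled_sd

site_a, site_b = df[df.site == "A"], df[df.site == "B"]
for col in ["age", "sex_male"]:
    print(f"{col}: SMD = {smd(site_a[col], site_b[col]):+.2f}")
```

Large SMDs between hospitals mean any site-correlated label (here, CA vs. control) is confounded with demographics, so matching or covariate adjustment is needed before the model's discrimination can be attributed to pathology.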
3. Label Leakage or Proxy Learning
- If the data source (Hospital A vs. B) correlates strongly with the label (e.g., most CA from A, most controls from B), the model may exploit site-specific artifacts (e.g., ECG lead placement, pixel intensity patterns) instead of actual disease signals. A quick site-predictability check is sketched below.
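A practical red-flag test is to see how predictable the source site is from the model's input features: if a simple classifier can identify the hospital, a disease classifier trained on site-confounded labels can exploit the same artifacts. A minimal sketch on synthetic features, where the intensity offset is a made-up stand-in for scanner or acquisition differences:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic features with a deliberate site-specific offset,
# a stand-in for, e.g., scanner calibration or lead placement differences
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=0.0, size=(300, 20)),  # "Hospital A"
    rng.normal(loc=0.3, size=(300, 20)),  # "Hospital B"
])
site = np.array([0] * 300 + [1] * 300)

# Site-prediction accuracy well above chance flags pervasive site artifacts
acc = cross_val_score(LogisticRegression(max_iter=1000), X, site, cv=5).mean()
print(f"site prediction accuracy: {acc:.2f} (chance = 0.50)")
```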
4. Commercially Available Data Considerations
- The slide states that this is a "Commercially Available" model trained on non-public datasets.
- If the training data is not publicly available, independent validation and reproducibility are hindered.
- Lack of transparency can limit trust and regulatory approval.
5. Control Group Definitions Are Inconsistent
- “LVH controls” and “general controls” are used, but it's unclear:
  - Whether they are matched demographically.
  - Whether “LVH controls” are free of other cardiac conditions.
- This inconsistency can introduce label noise or heterogeneity, which is especially problematic for nuanced clinical distinctions.
6. Sample Size Imbalance
- CA cases (2241) outnumber LVH controls (604) by nearly 4:1, which could lead to:
  - Class imbalance during training (a weighting sketch follows this list).
  - A model that favors the dominant class, or whose aggregate metrics misrepresent performance on the smaller group.
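A standard first mitigation is inverse-frequency class weighting so the minority class is not drowned out. A sketch with scikit-learn, using the slide's counts (2241 CA vs. 604 LVH controls) only to illustrate the weights; the feature matrix is a random stand-in:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# Counts from the slide: 2241 CA cases (1) vs. 604 LVH controls (0)
y = np.array([1] * 2241 + [0] * 604)
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)
print(dict(zip([0, 1], weights)))  # the minority class gets the larger weight

# Equivalent shortcut at fit time; X here is hypothetical stand-in data
rng = np.random.default_rng(2)
X = rng.normal(size=(len(y), 10))
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
```

Weighting only rebalances the loss; it does not address the site confounding above, and reported metrics should still be broken out per class and per site.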
Summary of Problems:
| Problem Type | Description | 
|---|---|
| Sampling Bias | Site-specific, unrepresentative data | 
| Label Leakage | Model might learn site ID instead of disease | 
| Control Inconsistency | LVH vs General not standardized | 
| Limited Generalizability | Unknown performance in new settings | 
| Transparency Risk | Commercial model trained on inaccessible data | 
