Reinforcement Learning

GRPO vs. PPO


PPO (Proximal Policy Optimization)
- Generalized Advantage Estimation / GAE (helps the AI figure out which actions actually contributed to its success over time; relies on a learned value function)
- For tasks that require care and precision
- Ex: financial market algorithms, medical tasks

GRPO (Group Relative Policy Optimization)
- Group Computation (samples a group of outputs for the same input and scores each one relative to the rest of its group, so rewards are analyzed separately within each group and no separate value network is needed)
- For flexible and complex tasks
- Ex: self-driving cars
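
The difference between the two advantage estimates can be sketched in a few lines of plain Python. This is a minimal illustration, not code from these experiments: the function names, the default `gamma` and `lam` values, and the toy rewards are all assumptions chosen for the example. GAE needs per-step value estimates from a critic, while the group-relative version only needs the rewards of several outputs sampled for the same input.

```python
from statistics import mean, pstdev


def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for a single episode.

    rewards[t] is the reward at step t. values must have one more entry
    than rewards: the last entry is the bootstrap value of the final state.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the discounted sum after it.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def group_relative_advantages(group_rewards, eps=1e-8):
    """GRPO-style advantages: normalize rewards within one sampled group.

    group_rewards holds one scalar reward per output sampled for the
    same input. No value function is involved.
    """
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)
    return [(r - mu) / (sigma + eps) for r in group_rewards]


# Hypothetical toy inputs, for illustration only.
ppo_adv = gae_advantages([1.0, 0.0, 2.0], [0.5, 0.4, 0.9, 0.0])
grpo_adv = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

With `gamma=1`, `lam=1`, and all values set to zero, `gae_advantages` reduces to the plain reward-to-go, which is a quick way to sanity-check it.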

Task 2: Multi‑Abnormality Classification

Phase 1 (ResNet3D)

Phase 2 (CT-CLIP)

Trials:

  1. ResNet3D with BCELoss
    1. Epoch 2 Summary | Train Loss: 0.3383 | Val Loss: 0.3199 | AUROC: 0.8657 | Accuracy: 0.1503 | Sensitivity: 0.4198 | Specificity: 0.9541
  2. ResNet3D with FocalLoss and class weights
  3. CT-CLIP pre-trained weights (freeze encoders) with FocalLoss and class weights
    1. Epoch 10 Summary | Average Training Loss: 0.2971 Validation AUROC: 0.5869 | Accuracy: 0.0000 | Sensitivity: 0.5385 | Specificity: 0.5745
      1. Needs improved accuracy
  4. CT-CLIP pre-trained weights (freeze encoders) with FocalLoss
    1. Epoch 1 Summary | Average Training Loss: 0.1198 Validation AUROC: 0.6650 | Accuracy: 0.0969 | Sensitivity: 0.1429 | Specificity: 0.9620
    2. Epoch 9 Summary | Average Training Loss: 0.1176 Validation AUROC: 0.6978 | Accuracy: 0.0612 | Sensitivity: 0.0989 | Specificity: 0.9678
    3. Epoch 10 Summary | Average Training Loss: 0.1173 Validation AUROC: 0.6836 | Accuracy: 0.0867 | Sensitivity: 0.0251 | Specificity: 0.9938
  5. CT-CLIP pre-trained weights (freeze encoders) with BCELoss
    1. Epoch 2 Summary | Average Training Loss: 0.4419 Validation AUROC: 0.6819 | Accuracy: 0.1071 | Sensitivity: 0.0000 | Specificity: 1.0000
  6. CT-CLIP pre-trained weights (unfreeze encoders) with BCELoss and scheduler (ReduceLROnPlateau)
    1. Epoch 1 Summary | Avg Train Loss: 0.4577 | Avg Val Loss: 0.4447 | Val AUROC: 0.6777 | Accuracy: 0.1071 | Sensitivity: 0.0000 | Specificity: nan
    2. Epoch 5 Summary | Avg Train Loss: 0.3386 | Avg Val Loss: 0.4253 | Val AUROC: 0.7575 | Accuracy: 0.0918 | Sensitivity: 0.1962 | Specificity: 0.9602
  7. CT-CLIP pre-trained weights (unfreeze encoders) with BCELoss and class weights and scheduler (ReduceLROnPlateau)
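
The trials above switch between BCELoss and FocalLoss, with and without class weights. The sketch below shows, in plain Python, how the two losses treat a single label. It is an illustration under assumptions, not the training code behind these runs: the function names, the `pos_weight` form of class weighting, and the default `gamma=2.0` and `alpha=0.25` are hypothetical choices, and the notes do not say how class weights were actually applied.

```python
import math


def _clamp(prob, eps=1e-7):
    """Keep probabilities away from 0 and 1 so log() stays finite."""
    return min(max(prob, eps), 1.0 - eps)


def bce_loss(prob, target, pos_weight=1.0):
    """Binary cross-entropy for one label.

    pos_weight > 1 penalizes missed positives more heavily (one common
    way to apply a class weight; assumed here).
    """
    prob = _clamp(prob)
    return -(pos_weight * target * math.log(prob)
             + (1 - target) * math.log(1.0 - prob))


def focal_loss(prob, target, gamma=2.0, alpha=0.25):
    """Focal loss for one label: BCE scaled down on easy examples."""
    prob = _clamp(prob)
    p_t = prob if target == 1 else 1.0 - prob        # probability of the true class
    alpha_t = alpha if target == 1 else 1.0 - alpha  # class balancing factor
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def multilabel_loss(probs, targets, loss_fn, **kwargs):
    """Average a per-label loss over all abnormality labels of one scan."""
    return sum(loss_fn(p, t, **kwargs) for p, t in zip(probs, targets)) / len(probs)


# Hypothetical predictions for three labels, for illustration only.
probs, targets = [0.9, 0.2, 0.6], [1, 0, 1]
bce = multilabel_loss(probs, targets, bce_loss)
focal = multilabel_loss(probs, targets, focal_loss)
```

Because the focal term multiplies each log-loss by a factor no larger than 1, focal loss values are numerically smaller than BCE values for the same predictions, so the training losses of the FocalLoss and BCELoss trials are not directly comparable.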