
| PPO (Proximal Policy Optimization) | GRPO (Group Relative Policy Optimization) |
|---|---|
| Generalized Advantage Estimation / GAE (uses a learned value function to estimate how much each action contributed to the eventual reward over time) | Group-relative advantages (samples a group of outputs for the same input and scores each one against the group's average reward, so no separate value network is needed) |
| Suited to tasks that require careful, precise control | Suited to flexible, complex tasks |
| E.g., financial market algorithms, medical tasks | E.g., self-driving cars |
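
The two advantage computations named in the table can be sketched roughly as follows. This is a minimal illustration in plain Python, not a full PPO or GRPO training loop: the function names `gae_advantages` and `group_relative_advantages` are hypothetical, the default values for `gamma`, `lam`, and `eps` are assumptions, and the group version assumes one scalar reward per sampled output.

```python
from statistics import mean, pstdev


def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """GAE-style advantages for one trajectory (as used with PPO).

    `values` must hold len(rewards) + 1 entries: one value estimate per
    step plus a bootstrap value for the state after the last step.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the discounted sum after it.
    for t in reversed(range(len(rewards))):
        # One-step TD error: how much better the step went than predicted.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def group_relative_advantages(group_rewards, eps=1e-8):
    """Group-relative advantages (as used with GRPO).

    Each reward is compared with the mean of its own group and scaled by
    the group's standard deviation; no value function is involved.
    """
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)
    # eps (an assumed small constant) avoids division by zero when all
    # rewards in the group are identical.
    return [(r - mu) / (sigma + eps) for r in group_rewards]
```

For example, `gae_advantages([1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], gamma=1.0, lam=1.0)` reduces to the plain reward-to-go for each step, while `group_relative_advantages([1.0, 0.0, 1.0, 0.0])` gives positive advantages to the two higher-scoring outputs and negative advantages to the other two.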