180: Reinforcement Learning

March 17, 2025, 3 p.m.

                Intro topic: Grills
News/Links:
- You can’t call yourself a senior until you’ve worked on a legacy project
- Recraft might be the most powerful AI image platform I’ve ever used — here’s why
- NASA has a list of 10 rules for software development
- AMD Radeon RX 9070 XT performance estimates leaked: 42% to 66% faster than Radeon RX 7900 GRE
 
Book of the Show
- Patrick: The Player of Games (Iain M. Banks)
  - https://a.co/d/1ZpUhGl (non-affiliate)
- Jason: Basic Roleplaying Universal Game Engine
 
 
Patreon Plug https://www.patreon.com/programmingthrowdown?ty=h
Tool of the Show
- Patrick: Pokemon Sword and Shield
- Jason: Features and Labels (https://fal.ai)
 
 
Topic: Reinforcement Learning
- Three types of machine learning
  - Supervised Learning
  - Unsupervised Learning
  - Reinforcement Learning
- Online vs Offline RL
- Optimization algorithms
  - Value optimization
    - SARSA
    - Q-Learning
  - Policy optimization
    - Policy Gradients
    - Actor-Critic
    - Proximal Policy Optimization
 
 
- Value vs Policy Optimization
  - Value optimization is more intuitive (value loss)
  - Policy optimization is less intuitive at first (policy gradients)
  - Converting values to policies in deep learning is difficult
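To make the value-optimization side concrete, here is a minimal tabular Q-learning sketch on a made-up 5-state "chain" environment (move right to reach a reward). The environment, hyperparameters, and names are illustrative, not something discussed in the episode:

```python
import random

random.seed(0)
N_STATES = 5                      # states 0..4; reaching state 4 ends the episode
ACTIONS = (0, 1)                  # 0 = move left, 1 = move right
alpha, gamma, eps = 0.1, 0.9, 0.1 # learning rate, discount, exploration rate

# Q-table: one value per (state, action) pair
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """One environment transition: reward 1 only on reaching the last state."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    done = (s2 == N_STATES - 1)
    return s2, (1.0 if done else 0.0), done

def greedy(s):
    """Greedy action with random tie-breaking."""
    best = max(Q[(s, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(s, a)] == best])

for _ in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy behavior policy
        a = random.choice(ACTIONS) if random.random() < eps else greedy(s)
        s2, r, done = step(s, a)
        # Q-learning update: bootstrap off the best next action (off-policy)
        target = r + (0.0 if done else gamma * max(Q[(s2, a2)] for a2 in ACTIONS))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2

# The learned greedy policy should move right from every non-terminal state.
policy = {s: greedy(s) for s in range(N_STATES)}
```

SARSA differs only in the update target: it bootstraps off the action the behavior policy actually takes next, rather than the best next action.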
 
- Imitation Learning
  - Supervised policy learning
  - Often used to bootstrap reinforcement learning
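Imitation learning in its simplest form (behavior cloning) is just supervised learning on expert (state, action) pairs. A dependency-free sketch, where the "expert" is a hand-made threshold rule and all names and numbers are illustrative:

```python
import math
import random

random.seed(0)

# Hypothetical expert: take action 1 when the 1-D state is below 0.5
states = [random.random() for _ in range(200)]
expert_data = [(s, 1 if s < 0.5 else 0) for s in states]

# Logistic-regression policy pi(a=1|s) = sigmoid(w*s + b), fit by SGD
w, b, lr = 0.0, 0.0, 0.5

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

for _ in range(200):
    for s, a in expert_data:
        p = sigmoid(w * s + b)   # predicted probability of expert action 1
        grad = p - a             # d(cross-entropy loss)/d(logit)
        w -= lr * grad * s
        b -= lr * grad

# The cloned policy should reproduce the expert's threshold behavior.
def policy(s):
    return 1 if sigmoid(w * s + b) > 0.5 else 0
```

A policy bootstrapped this way can then be handed to an RL algorithm for further improvement, which is the bootstrapping use mentioned above.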
 
- Policy Evaluation
  - Propensity scoring versus model-based
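Propensity-score evaluation estimates a new policy's value from logs collected by an old policy, with no model of the environment: each logged reward is reweighted by the ratio of the two policies' action probabilities (inverse propensity scoring). A toy two-armed-bandit sketch, with all probabilities and rewards invented for illustration:

```python
import random

random.seed(0)
TRUE_REWARD = {0: 0.2, 1: 0.8}   # expected reward of each arm (unknown in practice)
behavior = {0: 0.5, 1: 0.5}      # logging policy (the propensities)
target = {0: 0.1, 1: 0.9}        # new policy we want to evaluate offline

# Collect logged (action, reward) pairs under the behavior policy
logs = []
for _ in range(20000):
    a = 1 if random.random() < behavior[1] else 0
    r = 1.0 if random.random() < TRUE_REWARD[a] else 0.0
    logs.append((a, r))

# IPS estimate: reweight each logged reward by target/behavior probability
ips = sum(target[a] / behavior[a] * r for a, r in logs) / len(logs)

# Ground truth for comparison (only computable because this is a simulation)
true_value = sum(target[a] * TRUE_REWARD[a] for a in (0, 1))  # 0.74
```

The model-based alternative instead fits a reward model to the logs and evaluates the target policy against that model; IPS trades that modeling bias for higher variance.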
 
- Challenges of training an RL model
  - Two optimization loops
    - Collecting feedback vs updating the model
  - Difficult optimization target
    - Policy evaluation
 
 
- RLHF & GRPO
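The policy-gradient family behind RLHF-style training (PPO, GRPO, and friends) builds on the basic REINFORCE update: nudge action logits in the direction of the log-probability gradient, scaled by an advantage (reward minus a baseline). A minimal sketch on a 3-armed bandit with a softmax policy; every number here is illustrative:

```python
import math
import random

random.seed(0)
TRUE_REWARD = [0.1, 0.5, 0.9]   # expected reward of each arm
logits = [0.0, 0.0, 0.0]
lr = 0.1

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

baseline = 0.0
for _ in range(5000):
    probs = softmax(logits)
    # sample an action from the current policy
    u, acc, a = random.random(), 0.0, len(probs) - 1
    for i, p in enumerate(probs):
        acc += p
        if u < acc:
            a = i
            break
    r = 1.0 if random.random() < TRUE_REWARD[a] else 0.0
    # REINFORCE: grad of log pi(a) w.r.t. logits is one_hot(a) - probs,
    # scaled by the advantage (reward minus baseline)
    adv = r - baseline
    for i in range(3):
        logits[i] += lr * adv * ((1.0 if i == a else 0.0) - probs[i])
    baseline += 0.01 * (r - baseline)   # running-mean baseline reduces variance

final_probs = softmax(logits)
best_arm = max(range(3), key=lambda i: final_probs[i])
```

The running-mean baseline here plays roughly the role that the group-mean reward plays in GRPO, which computes advantages relative to a group of sampled responses instead of a learned value function.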
 