Hands-on Modern RL - A Practical Guide to Modern Reinforcement Learning | OKHK 👀

19:40 · May 7, 2026 · Thursday

Hands-on Modern RL - A Practical Guide to Modern Reinforcement Learning
https://walkinglabs.github.io/hands-on-modern-rl
https://fxtwitter.com/sanbuphy/status/2052191088048558243

Sanbu 散步 (@sanbuphy)

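I spent some time writing an RL tutorial, Hands-On Modern RL. It starts with CartPole + PPO as an introduction, then moves on to LLM post-training (RLHF, DPO, GRPO) and Agentic RL. The approach is code-first, with formulas used to explain what the code shows. An English version is coming soon.
This is currently a draft; the RLHF and Agentic RL sections are still under local review.
PRs, Issues, and GPU support are welcome: https://github.com/walkinglabs/hands-on-modern-rl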