Tag: Self-Play RL

A 10,000-Word Deep Dive into the Self-Play RL Technical Roadmap Behind OpenAI o1

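o1, the new Self-Play RL model recently released by OpenAI, has achieved notable results in mathematical and scientific reasoning, and introduces two new RL scaling laws: one for train-time compute and one for test-time compute. o1 is a multi-mod...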