<ul class="dashed" data-apple-notes-indent-amount="0"><li><span style="font-family: '.PingFangUITextSC-Regular'">Title: </span>Flow-GRPO: Training Flow Matching Models via Online RL</li><li><span style="font-family: '.PingFangSC-Regular'">Link: </span><a href="https://arxiv.org/abs/2505.05470">https://arxiv.org/abs/2505.05470</a></li><li>NeurIPS 2025</li></ul> The authors introduce GRPO into flow matching models. The method has two main components: (1) converting the flow model's deterministic ODE sampler into an SDE, which injects stochasticity into the sampling process and thereby makes online RL possible; (2) using a reduced number of denoising steps during training while keeping the full number of steps at inference, which improves training efficiency.
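The ODE-to-SDE idea in point (1) can be illustrated with a toy sketch. The code below contrasts a deterministic Euler step along a flow's ODE with a stochastic Euler-Maruyama step along an SDE built from the same velocity field: the ODE always produces the same trajectory, while the SDE yields varied samples, which is what online RL needs to explore. The velocity field, step sizes, and noise scale here are all hypothetical stand-ins, not the paper's exact formulation (the paper derives a specific drift correction so the SDE matches the ODE's marginals).

```python
import numpy as np

def velocity(x, t):
    # Toy velocity field standing in for a learned flow-matching model
    # (hypothetical; in practice this is a trained network v_theta(x, t)).
    return -x / (1.0 - t + 1e-3)

def ode_step(x, t, dt):
    # Deterministic Euler step along the probability-flow ODE:
    # x_{t+dt} = x_t + v(x_t, t) * dt  -> no randomness, one trajectory.
    return x + velocity(x, t) * dt

def sde_step(x, t, dt, sigma, rng):
    # Euler-Maruyama step for an SDE driven by the same velocity field
    # (schematic; Flow-GRPO uses a corrected drift, omitted here).
    drift = velocity(x, t)
    noise = rng.standard_normal(x.shape)
    return x + drift * dt + sigma * np.sqrt(dt) * noise

x0 = np.ones(4)
# Two ODE steps from the same start are identical...
a = ode_step(x0, 0.0, 0.1)
b = ode_step(x0, 0.0, 0.1)
# ...while SDE steps with different noise draws differ, giving the
# diverse rollouts that an online RL objective like GRPO requires.
c = sde_step(x0, 0.0, 0.1, 0.5, np.random.default_rng(1))
d = sde_step(x0, 0.0, 0.1, 0.5, np.random.default_rng(2))
print(np.allclose(a, b), np.allclose(c, d))
```

Because each SDE step defines a Gaussian transition, its log-probability is tractable, which is what lets a policy-gradient method such as GRPO be applied to the sampler.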