<ul class="dashed" data-apple-notes-indent-amount="0"><li><span style="font-family: '.PingFangUITextSC-Regular'">文章标题:</span>Improving Video Generation with Human Feedback</li><li><span style="font-family: '.PingFangSC-Regular'">文章地址:</span><a href="https://arxiv.org/abs/2501.13918">https://arxiv.org/abs/2501.13918</a> </li><li>NeurIPS 2025</li></ul> <img src="https://imagedelivery.net/phxEHgsq3j8gSnfNAJVJSQ/node3_ec1bf282-f561-4554-be9c-7be88cccb23f/public" style="background-color:initial;max-width:min(100%,3356px);max-height:min(2502px);;background-image:url(https://imagedelivery.net/phxEHgsq3j8gSnfNAJVJSQ/node3_ec1bf282-f561-4554-be9c-7be88cccb23f/public);height:auto;width:100%;object-fit:cover;background-size:cover;display:block;" width="3356" height="2502"> 文章针对视频生成模型的RLHF进行创新。首先提出了一个大规模的偏好数据集,解决了之前的数据集分辨率低等缺点;提出了多维度的奖励模型,训练VLM来对视频进行偏好训练;随后利用训练好的奖励模型,设计了流模型的强化学习算法,训练时策略(Flow-DPO, FlowRWR),以及推理时策略(Flow-NRG)。 Flow-DPO伪代码: <a href="../../../../files/Accounts/C037F400-EC11-4FAB-ACA5-467EE47E1BD1/Media/4BF48050-2FFB-4633-B946-49B3DC718BB0/1_BCF60CA6-53A5-46BF-916F-EAF0EFF55991/Pasted%20Graphic%202.tiff" class="attr" data-apple-notes-zidentifier="5D28769A-1555-4474-B8E1-AFDA63F4529C"></a>