<ul class="dashed" data-apple-notes-indent-amount="0"><li>Title: MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation</li><li>Link: <a href="https://arxiv.org/abs/2503.14428">https://arxiv.org/abs/2503.14428</a></li><li>arXiv</li></ul>
<img src="https://res.cloudinary.com/montaigne-io/image/upload/v1748919937/B590DBF6-3E30-4313-B007-86B78E7ED967.png" style="background-color:initial;max-width:min(100%,1882px);max-height:min(1230px);;background-image:url(https://res.cloudinary.com/montaigne-io/image/upload/v1748919937/B590DBF6-3E30-4313-B007-86B78E7ED967.png);height:auto;width:100%;object-fit:cover;background-size:cover;display:block;" width="1882" height="1230">
The paper proposes a training-free fix for the failure modes of compositional (multi-subject) video generation in text-to-video models, such as attribute mixing and incorrect subject placement. The method works in two phases: (1) in the conditioning phase, it refines the subject tokens' embeddings, since after the prompt passes through the text encoder the embeddings of different subject tokens can become entangled with one another; (2) in the denoising phase, it masks the attention of specified tokens so that each attends only to its designated spatial region.
<img src="https://res.cloudinary.com/montaigne-io/image/upload/v1748920159/8AA0313F-6ABF-40E6-B896-E08898CB6AB0.png" style="background-color:initial;max-width:min(100%,1874px);max-height:min(1096px);;background-image:url(https://res.cloudinary.com/montaigne-io/image/upload/v1748920159/8AA0313F-6ABF-40E6-B896-E08898CB6AB0.png);height:auto;width:100%;object-fit:cover;background-size:cover;display:block;" width="1874" height="1096">
<ul class="dashed" data-apple-notes-indent-amount="0"><li>Data: no training data required</li><li>Benchmarks: T2V-CompBench; VBench</li><li>Hardware: 1&times; A100</li><li>Open source: <a href="https://hong-yu-zhang.github.io/MagicComp-Page/">https://hong-yu-zhang.github.io/MagicComp-Page/</a></li></ul>
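The conditioning-phase idea can be sketched as a simple embedding blend: each subject token's embedding from the full prompt is pulled toward an "anchor" embedding obtained by encoding that subject phrase on its own, which counteracts the cross-subject entanglement introduced by the encoder. Everything here (the function name `anchor_refine`, the blend weight `alpha`, the plain linear interpolation) is a hypothetical simplification, not the paper's exact update rule.

```python
import numpy as np

def anchor_refine(prompt_emb, subject_idx, anchor_emb, alpha=0.5):
    """Pull each subject token's embedding toward its standalone encoding.

    prompt_emb  : (T, d) token embeddings of the full prompt
    subject_idx : indices of the subject tokens inside the prompt
    anchor_emb  : (len(subject_idx), d) embeddings of each subject phrase
                  encoded in isolation (the disambiguating "anchor")
    alpha       : blend strength (hypothetical hyperparameter)
    """
    out = prompt_emb.copy()
    for i, idx in enumerate(subject_idx):
        # Interpolate between the entangled in-context embedding and the
        # clean standalone embedding of the same subject.
        out[idx] = (1.0 - alpha) * prompt_emb[idx] + alpha * anchor_emb[i]
    return out
```

Non-subject tokens are left untouched, so the rest of the prompt conditioning is unchanged.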
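The denoising-phase masking can likewise be sketched as region-restricted cross-attention: logits from spatial patches to a subject token are suppressed outside that subject's assigned region, so each subject only influences its own area of the latent. The shapes, the large negative masking constant, and the name `region_masked_cross_attention` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def region_masked_cross_attention(q, k, v, region_mask, neg=-1e9):
    """Cross-attention with per-token spatial masks.

    q           : (P, d) queries from the P spatial latent patches
    k, v        : (T, d), (T, d_v) keys/values from the T text tokens
    region_mask : (P, T) boolean; True where patch p may attend to token t
                  (subject tokens get their region, others attend everywhere)
    """
    logits = q @ k.T / np.sqrt(q.shape[-1])          # (P, T)
    logits = np.where(region_mask, logits, neg)      # kill out-of-region links
    return softmax(logits, axis=-1) @ v
```

In practice the mask for a subject token would come from a layout (e.g. a bounding box rasterized over the patch grid), while padding and background tokens keep a fully True column.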