<ul class="dashed" data-apple-notes-indent-amount="0"><li><span style="font-family: '.PingFangUITextSC-Regular'">Paper title: </span>PEEKABOO: Interactive Video Generation via Masked-Diffusion</li><li><span style="font-family: '.PingFangSC-Regular'">Paper link: </span><a href="https://arxiv.org/abs/2312.07509">https://arxiv.org/abs/2312.07509</a> </li><li>CVPR 2024</li></ul> <img src="https://res.cloudinary.com/montaigne-io/image/upload/v1752471098/A06139E6-F961-4777-A108-E943117F75C3.png" style="background-color:initial;max-width:min(100%,1928px);max-height:min(720px);;background-image:url(https://res.cloudinary.com/montaigne-io/image/upload/v1752471098/A06139E6-F961-4777-A108-E943117F75C3.png);height:auto;width:100%;object-fit:cover;background-size:cover;display:block;" width="1928" height="720"> This paper is the first training-free method for video generation guided by bounding-box (bbox) trajectories. Specifically, it uses the bbox trajectory to construct masks for the attention layers: the spatial and temporal self-attention masks in the U-Net, plus the text cross-attention mask. With these masks, the box region attends only to itself, and only the target token guides the box region. <img src="https://res.cloudinary.com/montaigne-io/image/upload/v1752471229/307CEC04-EA2E-4E7C-963E-F85A21529FC4.png" style="background-color:initial;max-width:min(100%,1902px);max-height:min(870px);;background-image:url(https://res.cloudinary.com/montaigne-io/image/upload/v1752471229/307CEC04-EA2E-4E7C-963E-F85A21529FC4.png);height:auto;width:100%;object-fit:cover;background-size:cover;display:block;" width="1902" height="870"> <ul class="dashed" data-apple-notes-indent-amount="0"><li>Data: the authors built their own benchmark</li><li>Metrics: mIoU; CD (distance between box centers); AP50; Cov; FVD</li><li>Hardware: not mentioned</li><li>Open source: <a href="https://yash-jain.com/projects/Peekaboo/">https://yash-jain.com/projects/Peekaboo/</a> </li></ul>
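The masking idea summarized in these notes (the box region attends only to itself, and only the target token guides it) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `box_to_mask`, `build_masks`, and `to_additive_bias`, the box format, the latent grid size, and the handling of the background (background attends only to background and to non-target tokens) are all assumptions made for illustration.

```python
import numpy as np

def box_to_mask(box, height, width):
    """Rasterize one (x0, y0, x1, y1) box on the latent grid into a flat boolean mask.

    Hypothetical helper: coordinates are integer grid cells, x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = box
    mask = np.zeros((height, width), dtype=bool)
    mask[y0:y1, x0:x1] = True
    return mask.reshape(-1)  # row-major, index = y * width + x

def build_masks(boxes, height, width, token_is_target):
    """Build boolean attention masks (True = attention allowed) from a box trajectory.

    boxes: one box per frame. token_is_target: one flag per prompt token.
    Assumption: foreground and background are kept fully separate.
    """
    fg = np.stack([box_to_mask(b, height, width) for b in boxes])  # (F, N)
    tok = np.asarray(token_is_target, dtype=bool)                  # (T,)

    # Spatial self-attention, per frame: pixel i may attend to pixel j
    # only if both are inside the box or both are outside it.
    spatial = fg[:, :, None] == fg[:, None, :]                     # (F, N, N)

    # Text cross-attention: box pixels see only target tokens,
    # background pixels see only the remaining tokens.
    cross = fg[:, :, None] == tok[None, None, :]                   # (F, N, T)

    # Temporal self-attention, per pixel: frame f may attend to frame k
    # only if the pixel has the same foreground status in both frames.
    fg_t = fg.T                                                    # (N, F)
    temporal = fg_t[:, :, None] == fg_t[:, None, :]                # (N, F, F)
    return spatial, cross, temporal

def to_additive_bias(mask, neg=-1e9):
    """Turn a boolean mask into a bias added to attention logits before softmax."""
    return np.where(mask, 0.0, neg)
```

In a real pipeline the resulting biases would be added to the attention logits of the corresponding U-Net layers; how and for how many denoising steps this is applied is left out of the sketch.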