<ul class="dashed" data-apple-notes-indent-amount="0"><li><span style="font-family: '.PingFangUITextSC-Regular'">文章标题:</span>REAL-TIME VIDEO GENERATION WITH PYRAMID ATTENTION BROADCAST</li><li><span style="font-family: '.PingFangSC-Regular'">文章地址:</span><a href="https://arxiv.org/abs/2408.12588">https://arxiv.org/abs/2408.12588</a> </li><li>ICLR 2025</li></ul> <img src="https://imagedelivery.net/phxEHgsq3j8gSnfNAJVJSQ/node3_024c263f-e263-4e28-87b0-4d37bf5ed864/public" style="background-color:initial;max-width:min(100%,1544px);max-height:min(682px);;background-image:url(https://imagedelivery.net/phxEHgsq3j8gSnfNAJVJSQ/node3_024c263f-e263-4e28-87b0-4d37bf5ed864/public);height:auto;width:100%;object-fit:cover;background-size:cover;display:block;" width="1544" height="682"> 作者发现在推理过程中,attention的输出差(相邻步数)呈现一个U型的曲线,如下图所示,具有一个稳定的过程,因此可以复用这个过程中的attention输出来加速模型的推理,其复用的步数由输出差的大小来决定,呈‘金字塔’状,即差别越大,复用的步数越少;差别越小,复用的步数越多。值得注意的是,PAB复用的步数是固定的,这也启发了后续工作在这方面进行优化。 <img src="https://imagedelivery.net/phxEHgsq3j8gSnfNAJVJSQ/node3_673f6986-ae95-40ea-a248-a77ae9e96503/public" style="background-color:initial;max-width:min(100%,862px);max-height:min(728px);;background-image:url(https://imagedelivery.net/phxEHgsq3j8gSnfNAJVJSQ/node3_673f6986-ae95-40ea-a248-a77ae9e96503/public);height:auto;width:100%;object-fit:cover;background-size:cover;display:block;" width="862" height="728"> <ul class="dashed" data-apple-notes-indent-amount="0"><li>数据:training-free</li><li>指标:计算量;延迟;视频质量</li><li>硬件:未提及</li><li>开源:<a href="https://github.com/NUS-HPC-AI-Lab/VideoSys">https://github.com/NUS-HPC-AI-Lab/VideoSys</a> </li></ul>