<ul class="dashed" data-apple-notes-indent-amount="0"><li>Paper title: eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers</li><li>Paper link: <a href="https://arxiv.org/abs/2211.01324">https://arxiv.org/abs/2211.01324</a></li><li>arXiv</li></ul>
<img src="https://res.cloudinary.com/montaigne-io/image/upload/v1734936044/E38289F0-87A3-4DDE-B4A6-3A9BC767124F.png" width="1576" height="1814" style="max-width:100%;height:auto;display:block;">
The paper observes that in text-to-image diffusion, the model behaves differently at different stages of the sampling process: early in sampling, generation relies heavily on the text condition, while in the later steps the text condition is largely ignored and the model focuses on producing high-quality visual detail. The authors therefore propose using a different expert denoiser at each stage of sampling, improving the quality of the generated images without adding any extra inference compute.
<img src="https://res.cloudinary.com/montaigne-io/image/upload/v1734936248/71577924-389C-4C78-B02E-A33A04E97DE8.png" width="2346" height="2206" style="max-width:100%;height:auto;display:block;">
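To make the stage-dependent routing concrete, here is a minimal sketch in Python/PyTorch of a denoising loop that switches experts by timestep. The names <code>text_expert</code>, <code>visual_expert</code>, the split point <code>t_split</code>, and the DDIM-style update are illustrative assumptions for this sketch, not the authors' actual implementation.
<pre><code class="language-python"># Sketch only: route each denoising step to a different expert denoiser
# depending on the noise level (timestep). Names and scheduler details
# are hypothetical, not taken from the eDiff-I codebase.
import torch

def sample_with_experts(text_expert, visual_expert, text_emb, alphas_cumprod,
                        num_steps=1000, t_split=500,
                        shape=(1, 3, 64, 64), device="cpu"):
    """DDIM-style deterministic sampling with per-step expert selection."""
    x = torch.randn(shape, device=device)  # start from pure noise
    for t in reversed(range(num_steps)):
        # Early (high-noise) steps: expert specialised in text alignment.
        # Late (low-noise) steps: expert specialised in visual detail.
        expert = text_expert if t >= t_split else visual_expert

        t_batch = torch.full((shape[0],), t, device=device, dtype=torch.long)
        eps = expert(x, t_batch, text_emb)  # predicted noise

        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[t - 1] if t > 0 else alphas_cumprod.new_tensor(1.0)
        # Estimate x0 from the noise prediction, then step to the previous noise level.
        x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps
    return x
</code></pre>
Because the routing only decides which network is called at each step, the number of network evaluations per sample stays the same as with a single denoiser, which is why the capacity increase comes at no extra inference cost.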