<ul class="dashed" data-apple-notes-indent-amount="0"><li><span style="font-family: '.PingFangUITextSC-Regular'">Paper title: </span>ViCo: Plug-and-Play Visual Condition for Personalized Text-to-Image Generation</li><li><span style="font-family: '.PingFangSC-Regular'">Paper link: </span><a href="https://arxiv.org/abs/2306.00971">https://arxiv.org/abs/2306.00971</a> </li><li>arXiv</li></ul> <img src="https://res.cloudinary.com/montaigne-io/image/upload/v1727964803/61420992-A555-4B69-8A99-BA6C972E29E9.png" style="background-color:initial;max-width:min(100%,1932px);max-height:min(1020px);;background-image:url(https://res.cloudinary.com/montaigne-io/image/upload/v1727964803/61420992-A555-4B69-8A99-BA6C972E29E9.png);height:auto;width:100%;object-fit:cover;background-size:cover;display:block;" width="1932" height="1020"> One open question: where does that CI come from? The paper proposes ViCo, a new lightweight, plug-and-play method for seamlessly injecting visual conditions into personalized text-to-image generation. In character it is very similar to LoRA (it feels essentially the same): no fine-tuning of the original model parameters is required, which gives it high flexibility and scalability. Concretely, ViCo uses an image attention module to fuse the semantic information of the reference image into the diffusion process, together with an attention-based mask that requires no extra computation. <img src="https://res.cloudinary.com/montaigne-io/image/upload/v1727965208/66E496A1-BF78-42B2-A8E2-B8FC2B0B652B.png" style="background-color:initial;max-width:min(100%,2528px);max-height:min(1218px);;background-image:url(https://res.cloudinary.com/montaigne-io/image/upload/v1727965208/66E496A1-BF78-42B2-A8E2-B8FC2B0B652B.png);height:auto;width:100%;object-fit:cover;background-size:cover;display:block;" width="2528" height="1218"> <ul class="dashed" data-apple-notes-indent-amount="0"><li>Data: trained on reference images</li><li>Metrics: DINO; CLIP; training/inference time</li><li>Hardware: not mentioned</li><li>Open source: <a href="https://github.com/haoosz/ViCo">https://github.com/haoosz/ViCo</a> </li></ul>
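To make the "image attention module + attention-derived mask" idea concrete, here is a minimal sketch of that pattern: queries come from the diffusion latent tokens, keys/values from reference-image patch embeddings, and the mask is recovered from the attention map itself rather than computed separately. All shapes, projection matrices, and the thresholding rule are illustrative assumptions, not ViCo's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def image_cross_attention(latent, image, Wq, Wk, Wv):
    """Sketch of image cross-attention with an attention-based mask.

    latent: (N, d)  -- diffusion U-Net latent tokens (queries)
    image:  (M, c)  -- reference-image patch embeddings (keys/values)
    Wq/Wk/Wv        -- illustrative projection matrices (assumptions)
    """
    q, k, v = latent @ Wq, image @ Wk, image @ Wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))     # (N, M)
    # Attention-based mask, "free" in that it reuses the attention map:
    # down-weight image patches that receive below-average attention,
    # a crude stand-in for separating the subject from the background.
    weight = attn.mean(axis=0)                          # (M,)
    mask = (weight > weight.mean()).astype(float)
    # Inject the masked visual condition back into the latents (residual).
    return latent + attn @ (v * mask[:, None])

rng = np.random.default_rng(0)
latent = rng.normal(size=(16, 64))    # 16 latent tokens, dim 64
image = rng.normal(size=(49, 32))     # 7x7 image patches, dim 32
Wq = rng.normal(size=(64, 64))
Wk = rng.normal(size=(32, 64))
Wv = rng.normal(size=(32, 64))
out = image_cross_attention(latent, image, Wq, Wk, Wv)
print(out.shape)  # (16, 64)
```

The residual form is what makes the module plug-and-play in the LoRA-like sense: the pretrained weights are untouched, and only the small projection matrices are trained.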