<ul class="dashed" data-apple-notes-indent-amount="0"><li><span style="font-family: '.PingFangUITextSC-Regular'">Title: </span>AnyDoor: Zero-shot Object-level Image Customization</li><li><span style="font-family: '.PingFangSC-Regular'">Link: </span><a href="https://arxiv.org/abs/2307.09481">https://arxiv.org/abs/2307.09481</a> </li><li>CVPR 2024</li></ul>
<img src="https://res.cloudinary.com/montaigne-io/image/upload/v1730969173/B8877895-68A7-46EE-BF5D-B69709F06975.png" style="background-color:initial;max-width:min(100%,3150px);max-height:min(1508px);;background-image:url(https://res.cloudinary.com/montaigne-io/image/upload/v1730969173/B8877895-68A7-46EE-BF5D-B69709F06975.png);height:auto;width:100%;object-fit:cover;background-size:cover;display:block;" width="3150" height="1508">
An "any door" indeed. The paper proposes AnyDoor, which can place an arbitrary object at a specified location in an arbitrary background. The model first passes the target object through an ID Extractor to obtain identity features; this module uses DINOv2, a self-supervised image encoder. The resulting feature tokens replace the text tokens of the underlying text-to-image model and thereby guide generation. To keep the background consistent and inject fine-grained detail, the model additionally concatenates a high-frequency map of the target with the background image and the mask, feeds the result into a ControlNet-style detail extractor, and fuses the extracted features into the UNet.
<img src="https://res.cloudinary.com/montaigne-io/image/upload/v1730971439/41DF193D-4D38-4851-BAF5-73A75229B2A5.png" style="background-color:initial;max-width:min(100%,3116px);max-height:min(2224px);;background-image:url(https://res.cloudinary.com/montaigne-io/image/upload/v1730971439/41DF193D-4D38-4851-BAF5-73A75229B2A5.png);height:auto;width:100%;object-fit:cover;background-size:cover;display:block;" width="3116" height="2224">
<ul class="dashed" data-apple-notes-indent-amount="0"><li>Data: video data</li><li>Metrics: CLIP-Score; DINO-Score</li><li>Hardware: not mentioned</li><li><a href="https://github.com/ali-vilab/AnyDoor">https://github.com/ali-vilab/AnyDoor</a></li></ul>
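The high-frequency map mentioned above can be illustrated with a minimal sketch. Assuming grayscale arrays and a simple box-blur residual as the high-pass filter (the exact filter AnyDoor uses is not specified in this note, so this only conveys the idea), the conditioning input for the detail-extractor branch would be the channel-wise stack of background, mask, and the map:

```python
import numpy as np

def high_frequency_map(image: np.ndarray, k: int = 3) -> np.ndarray:
    """Return a normalized high-frequency map of a grayscale image.

    Generic high-pass sketch: the residual of the image minus a k-by-k
    box blur keeps edges and texture while discarding smooth regions.
    (AnyDoor's actual filter may differ; this is only illustrative.)
    """
    img = image.astype(np.float64)
    h, w = img.shape
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    # box blur computed as a sum of shifted views, then averaged
    blurred = np.zeros((h, w), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            blurred += padded[dy:dy + h, dx:dx + w]
    blurred /= k * k
    hf = np.abs(img - blurred)
    return hf / (hf.max() + 1e-8)  # scale to [0, 1]

# Stack background, object mask, and high-frequency map channel-wise,
# mimicking the conditioning input of the detail-extractor branch.
bg = np.random.rand(64, 64)
mask = np.zeros((64, 64))
mask[16:48, 16:48] = 1.0
obj = np.random.rand(64, 64)
cond = np.stack([bg, mask, high_frequency_map(obj)], axis=0)
print(cond.shape)  # (3, 64, 64)
```

In the real model this stacked conditioning tensor is what the ControlNet-style branch consumes before its features are fused into the UNet.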