<ul class="dashed" data-apple-notes-indent-amount="0"><li><span style="font-family: '.PingFangSC-Regular'">Title: </span>Face0: Instantaneously Conditioning a Text-to-Image Model on a Face</li><li><span style="font-family: '.PingFangSC-Regular'">Paper URL: </span><a href="https://arxiv.org/abs/2306.06638">https://arxiv.org/abs/2306.06638</a> </li><li>SIGGRAPH Asia 2023</li></ul>
<img src="https://res.cloudinary.com/montaigne-io/image/upload/v1724930471/852DE296-422D-49FF-8AD4-57B5B1BB83E0.png" style="background-color:initial;max-width:min(100%,2472px);max-height:min(1250px);;background-image:url(https://res.cloudinary.com/montaigne-io/image/upload/v1724930471/852DE296-422D-49FF-8AD4-57B5B1BB83E0.png);height:auto;width:100%;object-fit:cover;background-size:cover;display:block;" width="2472" height="1250">
<span style="font-family: '.PingFangSC-Regular'"> The paper proposes </span>Face0, a new method for instantly conditioning a text-to-image model on a face. It needs no optimization procedure at sampling time, such as fine-tuning or inversion.
<span style="font-family: '.PingFangSC-Regular'"> The authors augment the dataset with </span>embeddings of the faces that appear in its images. After training, inference takes the same time as with the original model, and the output can be conditioned on a face image supplied by the user. <span style="font-family: '.PingFangSC-Regular'">Specifically, the pipeline first detects the face in the image, then extracts a face </span>embedding with a feature-extraction network, projects that embedding into CLIP space, places it in the last three tokens of the text prompt, and feeds the result to the model for generation. <span style="font-family: '.PingFangSC-Regular'">The idea is very simple: it relies mainly on the features produced by a face feature extractor.</span>
<span style="font-family: '.PingFangSC-Regular'"> The paper also notes two limitations. First, the face features may carry a fixed pose and facial expression, and how to disentangle these from identity remains to be solved. Second, the method does not support multiple faces.</span>
<img src="https://res.cloudinary.com/montaigne-io/image/upload/v1724930692/3C052925-90A5-4B8C-A874-59E0EA446502.png" style="background-color:initial;max-width:min(100%,2486px);max-height:min(1476px);;background-image:url(https://res.cloudinary.com/montaigne-io/image/upload/v1724930692/3C052925-90A5-4B8C-A874-59E0EA446502.png);height:auto;width:100%;object-fit:cover;background-size:cover;display:block;" width="2486" height="1476">
<ul class="dashed" data-apple-notes-indent-amount="0"><li><span style="font-family: '.PingFangSC-Regular'">Data: </span>LAION (filtered to images with an aesthetic score above 5.5 that contain a face larger than 20 pixels)</li><li><span style="font-family: '.PingFangSC-Regular'">Metrics: text alignment (</span>CLIP), face alignment (CLIP)</li><li><span style="font-family: '.PingFangSC-Regular'">Hardware: </span>64 TPU-v4s, batch size 256</li><li><span style="font-family: '.PingFangSC-Regular'">Open source: not released</span></li></ul>
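The conditioning step described in the notes (project the face embedding into CLIP space, then place it in the last three prompt tokens) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation, since the code is not released. The function names, the toy dimensions, the random weights, and the use of a single linear layer as the projection are all assumptions made here for illustration.

```python
import random

NUM_FACE_TOKENS = 3  # from the notes: the face occupies the last three prompt tokens


def project_face_embedding(face_emb, weight, bias):
    """Map a face embedding (length d_face) to NUM_FACE_TOKENS CLIP-sized vectors.

    Assumption: a single linear layer stands in for the learned projection.
    `weight` has NUM_FACE_TOKENS * d_clip rows of length d_face.
    """
    flat = [sum(w * x for w, x in zip(row, face_emb)) + b
            for row, b in zip(weight, bias)]
    d_clip = len(flat) // NUM_FACE_TOKENS
    return [flat[i * d_clip:(i + 1) * d_clip] for i in range(NUM_FACE_TOKENS)]


def inject_face_tokens(prompt_emb, face_tokens):
    """Return a copy of the prompt embeddings with the last rows replaced."""
    n = len(face_tokens)
    if len(prompt_emb) < n:
        raise ValueError("prompt is shorter than the number of face tokens")
    if any(len(tok) != len(prompt_emb[0]) for tok in face_tokens):
        raise ValueError("face tokens must match the prompt embedding width")
    # Keep the leading text tokens, overwrite only the trailing positions.
    return [row[:] for row in prompt_emb[:-n]] + [tok[:] for tok in face_tokens]


# Toy sizes (hypothetical); real encoders use much larger dimensions.
D_FACE, D_CLIP, SEQ_LEN = 8, 4, 10
rng = random.Random(0)

weight = [[rng.gauss(0.0, 0.02) for _ in range(D_FACE)]
          for _ in range(NUM_FACE_TOKENS * D_CLIP)]
bias = [0.0] * (NUM_FACE_TOKENS * D_CLIP)

face_emb = [rng.gauss(0.0, 1.0) for _ in range(D_FACE)]      # stand-in for a face extractor output
prompt_emb = [[rng.gauss(0.0, 1.0) for _ in range(D_CLIP)]   # stand-in for CLIP text embeddings
              for _ in range(SEQ_LEN)]

face_tokens = project_face_embedding(face_emb, weight, bias)
conditioned = inject_face_tokens(prompt_emb, face_tokens)
print(len(conditioned), len(conditioned[0]))
```

Because only the trailing token positions are overwritten and the sequence length is unchanged, the downstream model sees an input of the usual shape, which is consistent with the note that inference time matches the original model.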