<ul class="dashed" data-apple-notes-indent-amount="0"><li>Title: CapHuman: Capture Your Moments in Parallel Universes</li><li>Link: <a href="https://arxiv.org/abs/2402.00627">https://arxiv.org/abs/2402.00627</a> </li><li>CVPR 2024</li></ul> <img src="https://res.cloudinary.com/montaigne-io/image/upload/v1729567448/1A8A0914-5673-402E-B74D-FDBF2B98BA51.png" style="background-color:initial;max-width:min(100%,2386px);max-height:min(1142px);;background-image:url(https://res.cloudinary.com/montaigne-io/image/upload/v1729567448/1A8A0914-5673-402E-B74D-FDBF2B98BA51.png);height:auto;width:100%;object-fit:cover;background-size:cover;display:block;" width="2386" height="1142"> For feature extraction, the method uses a face recognition model and the CLIP image encoder to extract coarse-grained and fine-grained features, respectively; the two are projected to the same dimension and concatenated to form the face feature. In parallel, a 3D face reconstruction model produces a 3D head model whose pose, position, and expression can be edited, yielding three pixel-level conditions (surface normal, albedo, and Lambertian rendering) that play the same role as the landmark condition in ControlNet. These conditions are fed into CapFace, a ControlNet-like module that injects the face feature into the pretrained Stable Diffusion model via attention fusion. Like InstantID, this method relies on a ControlNet-style module for feature fusion; this kind of fusion seems better at reconstructing fine-grained facial detail. <img src="https://res.cloudinary.com/montaigne-io/image/upload/v1729567856/8F8DD2EC-36F3-4846-BA7D-69478704C41C.png" style="background-color:initial;max-width:min(100%,2342px);max-height:min(1452px);;background-image:url(https://res.cloudinary.com/montaigne-io/image/upload/v1729567856/8F8DD2EC-36F3-4846-BA7D-69478704C41C.png);height:auto;width:100%;object-fit:cover;background-size:cover;display:block;" width="2342" height="1452"> <ul class="dashed" data-apple-notes-indent-amount="0"><li>Data: CelebA, captioned with BLIP</li><li>Metrics: identity preservation (face recognition model); text-image alignment (CLIP); head-control accuracy</li><li>Hardware: not mentioned</li><li>Code: <a href="https://github.com/VamosC/CapHuman">https://github.com/VamosC/CapHuman</a> </li></ul>
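The "project to the same dimension, then concatenate" step for the face feature can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dimensions (512-d ID embedding, 1024-d CLIP patch tokens, 768-d shared width), the random projection matrices, and the function name `fuse_face_features` are all assumptions chosen for the example.

```python
import numpy as np

def fuse_face_features(id_embed, clip_tokens, W_id, W_clip):
    """Map both feature streams to a shared width, then concatenate
    along the token axis (hypothetical shapes, for illustration only)."""
    id_tok = (id_embed @ W_id)[None, :]                  # coarse ID feature -> (1, d)
    clip_toks = clip_tokens @ W_clip                     # fine-grained tokens -> (N, d)
    return np.concatenate([id_tok, clip_toks], axis=0)   # fused face feature -> (N+1, d)

rng = np.random.default_rng(0)
id_embed = rng.standard_normal(512)            # assumed face-recognition embedding
clip_tokens = rng.standard_normal((16, 1024))  # assumed CLIP image patch tokens
W_id = rng.standard_normal((512, 768))         # learned projection in the real model
W_clip = rng.standard_normal((1024, 768))

face_feat = fuse_face_features(id_embed, clip_tokens, W_id, W_clip)
print(face_feat.shape)  # (17, 768)
```

In the actual model the projections would be learned layers and the fused tokens would serve as keys/values for the attention fusion in the CapFace module; here they are random matrices just to show the shape bookkeeping.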