<ul class="dashed"><li>Paper title: Generative Multimodal Models are In-Context Learners</li><li>Paper link: <a href="https://arxiv.org/abs/2312.13286">https://arxiv.org/abs/2312.13286</a></li><li>CVPR 2024</li></ul>
<img src="https://res.cloudinary.com/montaigne-io/image/upload/v1746608499/85A8D3BD-30C5-4691-BC50-57D1593D6C4C.png" width="2150" height="956" style="max-width:100%;height:auto;display:block;">
Compared with the previous version, Emu, the main improvements in Emu2 are: a simplified model architecture (the Causal Transformer is removed); expanded training data (additional grounded image-text pairs); and task-specific second-stage training on different data, yielding different variants of the model (Emu2-Chat focuses on dialogue, Emu2-Gen on controllable image generation). A sketch of the unified training objective appears at the end of this note.
<img src="https://res.cloudinary.com/montaigne-io/image/upload/v1746609110/D2D99509-BC4C-46CE-AB44-18F21170A173.png" width="1656" height="2082" style="max-width:100%;height:auto;display:block;">
<ul class="dashed"><li>Data: same as Emu, plus additional grounded data and task-specific data for the Chat and Gen variants</li><li>Metrics: each variant is evaluated on its own set of benchmarks</li><li>Hardware: not mentioned</li><li>Open source: <a href="https://github.com/baaivision/Emu">https://github.com/baaivision/Emu</a></li></ul>
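The paper trains Emu2 with a unified autoregressive objective: predict the next element of an interleaved image-text sequence, where text positions are supervised with a classification loss and positions whose next element is a visual embedding are supervised with a regression loss. The PyTorch sketch below is a minimal illustration of that idea, not the released Emu2 code; the module names (<code>text_head</code>, <code>visual_head</code>) and all dimensions are my own assumptions for the sketch.
<pre><code>import torch
import torch.nn as nn
import torch.nn.functional as F

class InterleavedAutoregressiveLoss(nn.Module):
    """Toy sketch of the unified predict-the-next-element objective:
    cross-entropy on text-token positions, l2 regression on positions
    that predict visual embeddings. Heads and shapes are illustrative
    assumptions, not the released Emu2 implementation."""
    def __init__(self, hidden=64, vocab=100):
        super().__init__()
        self.text_head = nn.Linear(hidden, vocab)     # next-token classifier
        self.visual_head = nn.Linear(hidden, hidden)  # next-embedding regressor

    def forward(self, hidden_states, text_targets, visual_targets, is_image):
        # hidden_states: (B, T, hidden) outputs of the causal LLM backbone
        # is_image: (B, T) bool mask, True where the next element is an image embedding
        text_logits = self.text_head(hidden_states[~is_image])  # (N_text, vocab)
        ce = F.cross_entropy(text_logits, text_targets)
        pred_emb = self.visual_head(hidden_states[is_image])    # (N_img, hidden)
        l2 = F.mse_loss(pred_emb, visual_targets)
        return ce + l2

if __name__ == "__main__":
    B, T, H, V = 2, 8, 64, 100
    loss_fn = InterleavedAutoregressiveLoss(hidden=H, vocab=V)
    hs = torch.randn(B, T, H)
    mask = torch.zeros(B, T, dtype=torch.bool)
    mask[:, 3:5] = True  # pretend positions 3-4 predict image embeddings
    text_targets = torch.randint(0, V, ((~mask).sum().item(),))
    visual_targets = torch.randn(mask.sum().item(), H)
    print(loss_fn(hs, text_targets, visual_targets, mask))
</code></pre>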