<ul class="dashed" data-apple-notes-indent-amount="0"><li><span style="font-family: '.PingFangUITextSC-Regular'">Title: </span>Chameleon: Mixed-Modal Early-Fusion Foundation Models</li><li><span style="font-family: '.PingFangSC-Regular'">Paper URL: </span><a href="https://arxiv.org/abs/2405.09818">https://arxiv.org/abs/2405.09818</a> </li><li>Technical report</li></ul> <img src="https://res.cloudinary.com/montaigne-io/image/upload/v1746684502/2718DF1D-EA03-4300-B087-A0013EB7815B.png" style="background-color:initial;max-width:min(100%,1290px);max-height:min(850px);;background-image:url(https://res.cloudinary.com/montaigne-io/image/upload/v1746684502/2718DF1D-EA03-4300-B087-A0013EB7815B.png);height:auto;width:100%;object-fit:cover;background-size:cover;display:block;" width="1290" height="850"> The model uses an image tokenizer to discretize each image into a sequence of tokens, which are fed into the model together with the text tokens as a single sequence for unified understanding and generation.
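The early-fusion idea can be illustrated with a minimal sketch. This is not Chameleon's actual tokenizer: the codebook, the vocabulary size, the `BOI_ID`/`EOI_ID` marker tokens, and all function names below are hypothetical toy values chosen for illustration. The sketch only shows the general pattern of mapping image patches to discrete codebook indices and placing them in the same id space as the text tokens.

```python
from typing import List, Sequence, Tuple

# All constants below are toy assumptions, not values from the paper.
TEXT_VOCAB_SIZE = 1000                 # hypothetical text vocabulary size
BOI_ID = TEXT_VOCAB_SIZE               # hypothetical begin-of-image marker
EOI_ID = TEXT_VOCAB_SIZE + 1           # hypothetical end-of-image marker
IMAGE_OFFSET = TEXT_VOCAB_SIZE + 2     # image codes live after the text ids

# Toy codebook: four 2-dimensional code vectors.
CODEBOOK: List[Tuple[float, float]] = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
]


def quantize_patch(patch: Sequence[float]) -> int:
    """Return the index of the codebook vector nearest to the patch."""
    def sq_dist(code: Sequence[float]) -> float:
        return sum((p - c) ** 2 for p, c in zip(patch, code))

    return min(range(len(CODEBOOK)), key=lambda i: sq_dist(CODEBOOK[i]))


def tokenize_image(patches: Sequence[Sequence[float]]) -> List[int]:
    """Discretize patch features into token ids in the shared id space."""
    return [IMAGE_OFFSET + quantize_patch(p) for p in patches]


def build_sequence(text_ids: Sequence[int],
                   patches: Sequence[Sequence[float]]) -> List[int]:
    """Concatenate text tokens and image tokens into one model input."""
    return list(text_ids) + [BOI_ID] + tokenize_image(patches) + [EOI_ID]


if __name__ == "__main__":
    text_ids = [5, 7]                        # pretend these came from a text tokenizer
    patches = [(0.1, 0.9), (0.8, 0.2)]       # pretend these are encoded image patches
    print(build_sequence(text_ids, patches))
```

Because both modalities end up as integer ids in one sequence, a single autoregressive model can in principle read and emit either kind of token, which is what makes unified understanding and generation possible in this setup.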