- Title: VTimeLLM: Empower LLM to Grasp Video Moments
- Paper: https://arxiv.org/abs/2311.18445
- CVPR 2024


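- Data: training (LCS-558k, InternVid, ActivityNet Captions, DiDeMo, VideoInstruct100k); testing (ActivityNet Captions, Charades-STA)
- Metrics: temporal grounding (IoU between predicted and ground-truth time boundaries); dense video captioning (SODA_c, a conventional metric; CIDEr + METEOR, which score how well the captions of paired video events match?)
- Hardware: 1× 4090 GPU, batch size 128
- Code: https://github.com/huangb23/VTimeLLM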
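
The temporal grounding metric above is an IoU computed over time intervals. A minimal sketch of how it could be calculated is shown below, assuming each moment is a `(start, end)` pair in seconds. The function names `temporal_iou` and `recall_at_iou` and the thresholded recall helper are illustrative assumptions, not taken from the VTimeLLM repository.

```python
def temporal_iou(pred, gt):
    """IoU between two (start, end) time intervals, e.g. in seconds."""
    p_start, p_end = pred
    g_start, g_end = gt
    # Overlap is zero when the intervals do not intersect.
    inter = max(0.0, min(p_end, g_end) - max(p_start, g_start))
    union = (p_end - p_start) + (g_end - g_start) - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(preds, gts, threshold):
    """Fraction of predictions whose IoU with the paired ground truth
    meets the threshold (hypothetical helper for illustration)."""
    if not preds:
        return 0.0
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(preds)
```

For example, a prediction of `(0, 10)` against a ground truth of `(5, 15)` overlaps for 5 seconds out of a 15-second union, so its IoU is 1/3.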