Video-LLaVA: Learning United Visual Representation by Alignment Before Projection. B. Lin, B. Zhu, Y. Ye, M. Ning, P. Jin, L. Yuan. arXiv preprint arXiv:2311.10122, 2023. [Cited by 337]

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models. B. Lin, Z. Tang, Y. Ye, J. Cui, B. Zhu, P. Jin, J. Zhang, M. Ning, L. Yuan. arXiv preprint arXiv:2401.15947, 2024. [Cited by 131]

LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment. B. Zhu, B. Lin, M. Ning, Y. Yan, J. Cui, H. Wang, Y. Pang, W. Jiang, J. Zhang, ... arXiv preprint arXiv:2310.01852, 2023. [Cited by 125]

Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models. M. Ning, B. Zhu, Y. Xie, B. Lin, J. Cui, L. Yuan, D. Chen, L. Yuan. arXiv preprint arXiv:2311.16103, 2023. [Cited by 31]

TaiSu: A 166M Large-scale High-Quality Dataset for Chinese Vision-Language Pre-training. Y. Liu, G. Zhu, B. Zhu, Q. Song, G. Ge, H. Chen, G. H. Qiao, R. Peng, L. Wu, ... Advances in Neural Information Processing Systems 35, 16705-16717, 2022. [Cited by 21]

OD-VAE: An Omni-Dimensional Video Compressor for Improving Latent Video Diffusion Model. L. Chen, Z. Li, B. Lin, B. Zhu, Q. Wang, S. Yuan, X. Zhou, X. Cheng, L. Yuan. arXiv preprint arXiv:2409.01199, 2024. [Cited by 6]

LLMBind: A Unified Modality-Task Integration Framework. B. Zhu, P. Jin, M. Ning, B. Lin, J. Huang, Q. Song, M. Pan, L. Yuan. arXiv preprint arXiv:2402.14891, 2024. [Cited by 6]

Open-Sora Plan: Open-Source Large Video Generation Model. B. Lin, Y. Ge, X. Cheng, Z. Li, B. Zhu, S. Wang, X. He, Y. Ye, S. Yuan, L. Chen, ... arXiv preprint arXiv:2412.00131, 2024.