Bin Zhu
Verified email at stu.pku.edu.cn
Title
Cited by
Year
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
B Lin, B Zhu, Y Ye, M Ning, P Jin, L Yuan
arXiv preprint arXiv:2311.10122, 2023
Cited by: 337; Year: 2023
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
B Lin, Z Tang, Y Ye, J Cui, B Zhu, P Jin, J Zhang, M Ning, L Yuan
arXiv preprint arXiv:2401.15947, 2024
Cited by: 131; Year: 2024
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
B Zhu, B Lin, M Ning, Y Yan, J Cui, HF Wang, Y Pang, W Jiang, J Zhang, ...
arXiv preprint arXiv:2310.01852, 2023
Cited by: 125; Year: 2023
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models
M Ning, B Zhu, Y Xie, B Lin, J Cui, L Yuan, D Chen, L Yuan
arXiv preprint arXiv:2311.16103, 2023
Cited by: 31; Year: 2023
TaiSu: A 166M Large-scale High-Quality Dataset for Chinese Vision-Language Pre-training
Y Liu, G Zhu, B Zhu, Q Song, G Ge, H Chen, GH Qiao, R Peng, L Wu, ...
Advances in Neural Information Processing Systems 35, 16705-16717, 2022
Cited by: 21; Year: 2022
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model
L Chen, Z Li, B Lin, B Zhu, Q Wang, S Yuan, X Zhou, X Cheng, L Yuan
arXiv preprint arXiv:2409.01199, 2024
Cited by: 6; Year: 2024
LLMBind: A Unified Modality-Task Integration Framework
B Zhu, P Jin, M Ning, B Lin, J Huang, Q Song, M Pan, L Yuan
arXiv preprint arXiv:2402.14891, 2024
Cited by: 6; Year: 2024
Open-Sora Plan: Open-Source Large Video Generation Model
B Lin, Y Ge, X Cheng, Z Li, B Zhu, S Wang, X He, Y Ye, S Yuan, L Chen, ...
arXiv preprint arXiv:2412.00131, 2024
Year: 2024