Video-LLaVA: Learning United Visual Representation by Alignment Before Projection. B Lin, B Zhu, Y Ye, M Ning, P Jin, L Yuan. arXiv preprint arXiv:2311.10122, 2023. Cited by 49.

LanguageBind: Extending Video-Language Pretraining to N-Modality by Language-Based Semantic Alignment. B Zhu, B Lin, M Ning, Y Yan, J Cui, HF Wang, Y Pang, W Jiang, J Zhang, et al. arXiv preprint arXiv:2310.01852, 2023. Cited by 22.

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models. B Lin, Z Tang, Y Ye, J Cui, B Zhu, P Jin, J Zhang, M Ning, L Yuan. arXiv preprint arXiv:2401.15947, 2024. Cited by 14.

BASALT refines binning from metagenomic data and increases resolution of genome-resolved metagenomic analysis. Z Qiu, L Yuan, CA Lian, B Lin, J Chen, R Mu, X Qiao, L Zhang, Z Xu, L Fan, et al. Nature Communications 15 (1), 2179, 2024. Cited by 1.

UNIAA: A Unified Multi-modal Image Aesthetic Assessment Baseline and Benchmark. Z Zhou, Q Wang, B Lin, Y Su, R Chen, X Tao, A Zheng, L Yuan, P Wan, et al. arXiv preprint arXiv:2404.09619, 2024.

MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators. S Yuan, J Huang, Y Shi, Y Xu, R Zhu, B Lin, X Cheng, L Yuan, J Luo. arXiv preprint arXiv:2404.05014, 2024.

LLMBind: A Unified Modality-Task Integration Framework. B Zhu, P Jin, M Ning, B Lin, J Huang, Q Song, M Pan, L Yuan. arXiv preprint arXiv:2402.14891, 2024.

Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models. M Ning, B Zhu, Y Xie, B Lin, J Cui, L Yuan, D Chen, L Yuan. arXiv preprint arXiv:2311.16103, 2023.