The (R)Evolution of Multimodal Large Language Models: A Survey. D Caffagni, F Cocchi, L Barsellotti, N Moratelli, S Sarto, L Baraldi, ... arXiv preprint arXiv:2402.12451, 2024 | 18 | 2024 |
Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs. D Caffagni, F Cocchi, N Moratelli, S Sarto, M Cornia, L Baraldi, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 11 | 2024 |
SynthCap: Augmenting Transformers with Synthetic Data for Image Captioning. D Caffagni, M Barraco, M Cornia, L Baraldi, R Cucchiara. International Conference on Image Analysis and Processing, 112-123, 2023 | 6 | 2023 |
Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization. N Moratelli, D Caffagni, M Cornia, L Baraldi, R Cucchiara. arXiv preprint arXiv:2408.14547, 2024 | 2 | 2024 |