Difei Gao

Cited by

	All	Since 2019
Citations	560	557
h-index	13	13
i10-index	15	15

Citations per year
Year	Citations
2018	2
2019	3
2020	3
2021	28
2022	69
2023	280
2024	172

Public access

12 articles available, 1 article not available (based on funding mandates)

Co-authors

Mike Z. SHOU	National U. of Singapore; Facebook AI; Columbia University	Verified email at columbia.edu
Kevin Qinghong Lin	National University of Singapore	Verified email at u.nus.edu
Ruiping Wang	Professor, Institute of Computing Technology, Chinese Academy of Sciences	Verified email at ict.ac.cn
Xilin Chen	Institute of Computing Technology, Chinese Academy of Sciences	Verified email at ict.ac.cn
Shiguang Shan	Professor of Institute of Computing Technology, Chinese Academy of Sciences	Verified email at ict.ac.cn
Joya Chen	National University of Singapore	Verified email at u.nus.edu
Yuxuan Wang	Nanyang Technological University; National U. of Singapore	Verified email at ntu.edu.sg
Mengmi Zhang	Assistant professor and PI of Deep NeuroCognition Lab, NTU and A*STAR	Verified email at ntu.edu.sg
Kenneth Li	Harvard University	Verified email at g.harvard.edu
Luowei Zhou	Research Scientist, Google Deepmind	Verified email at google.com
Lili Pan	Associate Professor, University of Electronic Science and Technology of China	Verified email at uestc.edu.cn
Rui Chen	University of Cambridge	Verified email at cam.ac.uk

Difei Gao

National U. of Singapore; Institute of Computing Technology, Chinese Academy of Sciences

Verified email at nus.edu.sg

Artificial Intelligence, Vision and Language


Title	Authors	Venue	Cited by	Year
Multi-modal graph neural network for joint reasoning on vision and scene text	D Gao, K Li, R Wang, S Shan, X Chen	IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 12746 …, 2020	115	2020
Egocentric video-language pretraining	KQ Lin, AJ Wang, M Soldan, M Wray, R Yan, EZ Xu, D Gao, R Tu, W Zhao	Neural Information Processing Systems (NeurIPS) 2 (3), 2022	100	2022
MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering	D Gao, L Zhou, L Ji, L Zhu, Y Yang, MZ Shou	IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 14773 …, 2023	45	2023
Show-1: Marrying pixel and latent diffusion models for text-to-video generation	DJ Zhang, JZ Wu, JW Liu, R Zhao, L Ran, Y Gu, D Gao, MZ Shou	arXiv preprint arXiv:2309.15818, 2023	42	2023
AssistGPT: A general multi-modal assistant that can plan, execute, inspect, and learn	D Gao, L Ji, L Zhou, KQ Lin, J Chen, Z Fan, MZ Shou	arXiv preprint arXiv:2306.08640, 2023	30	2023
UniVTG: Towards Unified Video-Language Temporal Grounding	KQ Lin, P Zhang, J Chen, S Pramanick, D Gao, AJ Wang, R Yan, MZ Shou	IEEE/CVF International Conference on Computer Vision (ICCV), 2023	26	2023
CRIC: A VQA dataset for compositional reasoning on vision and commonsense	D Gao, R Wang, S Shan, X Chen	IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022	23*	2022
Env-QA: A Video Question Answering Benchmark for Comprehensive Understanding of Dynamic Environments	D Gao, R Wang, Z Bai, X Chen	IEEE/CVF International Conference on Computer Vision (ICCV), 1675-1685, 2021	21	2021
AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant	B Wong, J Chen, Y Wu, SW Lei, D Mao, D Gao, MZ Shou	European Conference on Computer Vision (ECCV), 2022	19	2022
Symbolic replay: Scene graph as prompt for continual learning on VQA task	SW Lei, D Gao, JZ Wu, Y Wang, W Liu, M Zhang, MZ Shou	The AAAI Conference on Artificial Intelligence (AAAI), 2023	17	2023
Weijie Kong, et al	KQ Lin, AJ Wang, M Soldan, M Wray, R Yan, EZ Xu, D Gao, R Tu, W Zhao	Egocentric video-language pretraining. NeurIPS 35 (7575-7586), 26, 2022	17	2022
GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval	Y Wang, D Gao, L Yu, W Lei, M Feiszli, MZ Shou	European Conference on Computer Vision (ECCV), 2022	16	2022
CONE: An efficient coarse-to-fine alignment framework for long video temporal grounding	Z Hou, W Zhong, L Ji, D Gao, K Yan, WK Chan, CW Ngo, Z Shou, N Duan	Annual Meeting of the Association for Computational Linguistics (ACL), 2022	15	2022
Learning to recognize visual concepts for visual question answering with structural label space	D Gao, R Wang, S Shan, X Chen	IEEE Journal of Selected Topics in Signal Processing (JSTSP) 14 (3), 494-505, 2020	12	2020
Affordance grounding from demonstration video to target image	J Chen, D Gao, KQ Lin, MZ Shou	IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 6799-6808, 2023	11	2023
CVPR 2023 text guided video editing competition	JZ Wu, X Li, D Gao, Z Dong, J Bai, A Singh, X Xiang, Y Li, Z Huang, Y Sun, ...	arXiv preprint arXiv:2310.16003, 2023	9	2023
KMIR: A benchmark for evaluating knowledge memorization, identification and reasoning abilities of language models	D Gao, Y Jia, L Li, C Fu, Z Dou, H Jiang, X Zhang, L Chen, Z Cao	arXiv preprint arXiv:2202.13529, 2022	7	2022
An efficient coarse-to-fine alignment framework @ Ego4D natural language queries challenge 2022	Z Hou, W Zhong, L Ji, D Gao, K Yan, WK Chan, CW Ngo, Z Shou, N Duan	arXiv preprint arXiv:2211.08776, 2022	6	2022
AssistSR: Task-oriented video segment retrieval for personal AI assistant	SW Lei, D Gao, Y Wang, D Mao, Z Liang, L Ran, MZ Shou	Findings of Empirical Methods in Natural Language Processing (EMNLP), 2021	6*	2021
AssistGUI: Task-oriented desktop graphical user interface automation	D Gao, L Ji, Z Bai, M Ouyang, P Li, D Mao, Q Wu, W Zhang, P Wang, ...	arXiv preprint arXiv:2312.13108, 2023	4	2023
