Zhou Yu (余宙)

Cited by

	All	Since 2019
Citations	3847	3565
h-index	19	18
i10-index	25	23

Citations per year
Year	Citations
2014	10
2015	21
2016	64
2017	57
2018	123
2019	293
2020	535
2021	652
2022	758
2023	986
2024	341

Public access

19 articles

10 articles

available

not available

Based on funding mandates

Professor, Hangzhou Dianzi University

Verified email at hdu.edu.cn

multimodal learning, vision and language, cross-media analysis, visual question answering


Title	Cited by	Year
Deep modular co-attention networks for visual question answering. Z Yu, J Yu, Y Cui, D Tao, Q Tian. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 6281-6290, 2019	866	2019
Multi-modal factorized bilinear pooling with co-attention learning for visual question answering. Z Yu, J Yu, J Fan, D Tao. IEEE International Conference on Computer Vision (ICCV), 1821-1830, 2017	740	2017
Beyond bilinear: Generalized multimodal factorized high-order pooling for visual question answering. Z Yu, J Yu, C Xiang, J Fan, D Tao. IEEE Transactions on Neural Networks and Learning Systems 29 (12), 5947-5959, 2018	494	2018
Multimodal transformer with multi-view visual representation for image captioning. J Yu, J Li, Z Yu, Q Huang. IEEE Transactions on Circuits and Systems for Video Technology 30 (12), 4467 …, 2020	371	2020
ActivityNet-QA: A dataset for understanding complex web videos via question answering. Z Yu, D Xu, J Yu, T Yu, Z Zhao, Y Zhuang, D Tao. Proceedings of the AAAI Conference on Artificial Intelligence, 9127-9134, 2019	225	2019
Sparse multi-modal hashing. F Wu, Z Yu, Y Yang, S Tang, Y Zhang, Y Zhuang. IEEE Transactions on Multimedia 16 (2), 427-439, 2014	147	2014
Discriminative coupled dictionary hashing for fast cross-media retrieval. Z Yu, F Wu, Y Yang, Q Tian, J Luo, Y Zhuang. Proceedings of the 37th international ACM SIGIR conference on Research …, 2014	131	2014
Rethinking diversified and discriminative proposal generation for visual grounding. Z Yu, J Yu, C Xiang, Z Zhao, Q Tian, D Tao. International Joint Conference on Artificial Intelligence (IJCAI), 1114-1120, 2018	121	2018
Prompting large language models with answer heuristics for knowledge-based visual question answering. Z Shao, Z Yu, M Wang, J Yu. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 14974-14983, 2023	101	2023
Deep multimodal neural architecture search. Z Yu, Y Cui, J Yu, M Wang, D Tao, Q Tian. Proceedings of the 28th ACM International Conference on Multimedia, 3743-3752, 2020	81	2020
SPRNet: Single pixel reconstruction for one-stage instance segmentation. J Yu, J Yao, J Zhang, Z Yu, D Tao. IEEE Transactions on Cybernetics 51 (4), 1731-1742, 2021	78	2021
Open-ended long-form video question answering via adaptive hierarchical reinforced networks. Z Zhao, Z Zhang, S Xiao, Z Yu, J Yu, D Cai, F Wu, Y Zhuang. International Joint Conference on Artificial Intelligence (IJCAI), 3683-3689, 2018	63	2018
ROSITA: Enhancing vision-and-language semantic alignments via cross-and intra-modal knowledge integration. Y Cui, Z Yu, C Wang, Z Zhao, J Zhang, M Wang, J Yu. Proceedings of the 29th ACM International Conference on Multimedia, 797-806, 2021	53	2021
MARN: Multi-level attentional reconstruction networks for weakly supervised video temporal grounding. Y Song, J Wang, L Ma, J Yu, J Liang, L Yuan, Z Yu. Neurocomputing 554, 126625, 2023	49*	2023
Long-term video question answering via multimodal hierarchical memory attentive networks. T Yu, J Yu, Z Yu, Q Huang, Q Tian. IEEE Transactions on Circuits and Systems for Video Technology 31 (3), 931-944, 2020	46	2020
Compositional attention networks with two-stream fusion for video question answering. T Yu, J Yu, Z Yu, D Tao. IEEE Transactions on Image Processing 29, 1204-1218, 2019	41	2019
Multimodal unified attention networks for vision-and-language interactions. Z Yu, Y Cui, J Yu, D Tao, Q Tian. arXiv preprint arXiv:1908.04107, 2019	40	2019
Cross-media hashing with neural networks. Y Zhuang, Z Yu, W Wang, F Wu, S Tang, J Shao. Proceedings of the 22nd ACM international conference on Multimedia, 901-904, 2014	36	2014
Comprehensive distance-preserving autoencoders for cross-modal retrieval. Y Zhan, J Yu, Z Yu, R Zhang, D Tao, Q Tian. Proceedings of the 26th ACM international conference on Multimedia, 1137-1145, 2018	33	2018
Accelerated masked transformer for dense video captioning. Z Yu, N Han. Neurocomputing 445, 72-80, 2021	18	2021
