Zhou Yu (余宙)
Title
Cited by
Year
Deep modular co-attention networks for visual question answering
Z Yu, J Yu, Y Cui, D Tao, Q Tian
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 6281-6290, 2019
Cited by: 866 | Year: 2019
Multi-modal factorized bilinear pooling with co-attention learning for visual question answering
Z Yu, J Yu, J Fan, D Tao
IEEE International Conference on Computer Vision (ICCV), 1821-1830, 2017
Cited by: 740 | Year: 2017
Beyond bilinear: Generalized multimodal factorized high-order pooling for visual question answering
Z Yu, J Yu, C Xiang, J Fan, D Tao
IEEE Transactions on Neural Networks and Learning Systems 29 (12), 5947-5959, 2018
Cited by: 494 | Year: 2018
Multimodal transformer with multi-view visual representation for image captioning
J Yu, J Li, Z Yu, Q Huang
IEEE Transactions on Circuits and Systems for Video Technology 30 (12), 4467 …, 2020
Cited by: 371 | Year: 2020
ActivityNet-QA: A dataset for understanding complex web videos via question answering
Z Yu, D Xu, J Yu, T Yu, Z Zhao, Y Zhuang, D Tao
Proceedings of the AAAI Conference on Artificial Intelligence, 9127-9134, 2019
Cited by: 225 | Year: 2019
Sparse multi-modal hashing
F Wu, Z Yu, Y Yang, S Tang, Y Zhang, Y Zhuang
IEEE Transactions on Multimedia 16 (2), 427-439, 2014
Cited by: 147 | Year: 2014
Discriminative coupled dictionary hashing for fast cross-media retrieval
Z Yu, F Wu, Y Yang, Q Tian, J Luo, Y Zhuang
Proceedings of the 37th international ACM SIGIR conference on Research …, 2014
Cited by: 131 | Year: 2014
Rethinking diversified and discriminative proposal generation for visual grounding
Z Yu, J Yu, C Xiang, Z Zhao, Q Tian, D Tao
International Joint Conference on Artificial Intelligence (IJCAI), 1114-1120, 2018
Cited by: 121 | Year: 2018
Prompting large language models with answer heuristics for knowledge-based visual question answering
Z Shao, Z Yu, M Wang, J Yu
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 14974-14983, 2023
Cited by: 101 | Year: 2023
Deep multimodal neural architecture search
Z Yu, Y Cui, J Yu, M Wang, D Tao, Q Tian
Proceedings of the 28th ACM International Conference on Multimedia, 3743-3752, 2020
Cited by: 81 | Year: 2020
SPRNet: Single pixel reconstruction for one-stage instance segmentation
J Yu, J Yao, J Zhang, Z Yu, D Tao
IEEE Transactions on Cybernetics 51 (4), 1731-1742, 2021
Cited by: 78 | Year: 2021
Open-ended long-form video question answering via adaptive hierarchical reinforced networks
Z Zhao, Z Zhang, S Xiao, Z Yu, J Yu, D Cai, F Wu, Y Zhuang
International Joint Conference on Artificial Intelligence (IJCAI), 3683-3689, 2018
Cited by: 63 | Year: 2018
ROSITA: Enhancing vision-and-language semantic alignments via cross- and intra-modal knowledge integration
Y Cui, Z Yu, C Wang, Z Zhao, J Zhang, M Wang, J Yu
Proceedings of the 29th ACM International Conference on Multimedia, 797-806, 2021
Cited by: 53 | Year: 2021
MARN: Multi-level attentional reconstruction networks for weakly supervised video temporal grounding
Y Song, J Wang, L Ma, J Yu, J Liang, L Yuan, Z Yu
Neurocomputing 554, 126625, 2023
Cited by: 49* | Year: 2023
Long-term video question answering via multimodal hierarchical memory attentive networks
T Yu, J Yu, Z Yu, Q Huang, Q Tian
IEEE Transactions on Circuits and Systems for Video Technology 31 (3), 931-944, 2020
Cited by: 46 | Year: 2020
Compositional attention networks with two-stream fusion for video question answering
T Yu, J Yu, Z Yu, D Tao
IEEE Transactions on Image Processing 29, 1204-1218, 2019
Cited by: 41 | Year: 2019
Multimodal unified attention networks for vision-and-language interactions
Z Yu, Y Cui, J Yu, D Tao, Q Tian
arXiv preprint arXiv:1908.04107, 2019
Cited by: 40 | Year: 2019
Cross-media hashing with neural networks
Y Zhuang, Z Yu, W Wang, F Wu, S Tang, J Shao
Proceedings of the 22nd ACM international conference on Multimedia, 901-904, 2014
Cited by: 36 | Year: 2014
Comprehensive distance-preserving autoencoders for cross-modal retrieval
Y Zhan, J Yu, Z Yu, R Zhang, D Tao, Q Tian
Proceedings of the 26th ACM international conference on Multimedia, 1137-1145, 2018
Cited by: 33 | Year: 2018
Accelerated masked transformer for dense video captioning
Z Yu, N Han
Neurocomputing 445, 72-80, 2021
Cited by: 18 | Year: 2021