Publications

Preprint

Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Project Page, Awesome List, GitHub
Yake Wei, Di Hu^, Yapeng Tian, Xuelong Li (^Corresponding Author)

Towards Long Form Audio-visual Video Understanding
Project Page
Wenxuan Hou*,†, Guangyao Li*, Yapeng Tian, Di Hu^

Not All Knowledge Is Created Equal
Ziyun Li, Xinshao Wang, Haojin Yang, Di Hu, Neil M Robertson, David A Clifton, Christoph Meinel

Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement
Xingjian Li, Di Hu, Xuhong Li, Haoyi Xiong, Zhi Ye, Zhipeng Wang, Chengzhong Xu, Dejing Dou

Ambient Sound Helps: Audiovisual Crowd Counting in Extreme Conditions
Code, Dataset
Di Hu^, Lichao Mou*, Qingzhong Wang*, Junyu Gao, Yuansheng Hua, Dejing Dou, Xiao Xiang Zhu

Curriculum Audiovisual Learning
Di Hu, Zheng Wang, Haoyi Xiong, Dong Wang, Feiping Nie, Dejing Dou

Conference Papers

Progressive Spatio-temporal Perception for Audio-Visual Question Answering
Guangyao Li, Wenxuan Hou, Di Hu^
ACM MM 2023

TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World
Hongpeng Lin, Ludan Ruan, Wenke Xia, Peiyu Liu, Jingyuan Wen, Yixin Xu, Di Hu, Ruihua Song, Wayne Xin Zhao, Qin Jin, Zhiwu Lu
ACM MM 2023. Oral Presentation

Towards Inadequately Pre-trained Models in Transfer Learning
Andong Deng, Xingjian Li, Di Hu^, Tianyang Wang, Haoyi Xiong, Chengzhong Xu
ICCV 2023

Multi-Scale Attention for Audio Question Answering
Guangyao Li, Yixin Xu, Di Hu^
Interspeech 2023. Oral Presentation

Robust Cross-Modal Knowledge Distillation for Unconstrained Videos
Wenke Xia, Xingjian Li, Andong Deng, Haoyi Xiong, Dejing Dou, Di Hu^
ICME 2023

MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning
Ruize Xu, Ruoxuan Feng, Shi-xiong Zhang, Di Hu^
ICASSP 2023

Exploiting Visual Context Semantics for Sound Source Localization
Xinchi Zhou, Dongzhan Zhou, Di Hu, Hang Zhou, Wanli Ouyang
WACV 2022

SeCo: Separating Unknown Musical Visual Sounds with Consistency Guidance
Xinchi Zhou, Dongzhan Zhou, Wanli Ouyang, Hang Zhou, Di Hu
WACV 2022

Balanced Multimodal Learning via On-the-fly Gradient Modulation
Code
Xiaokang Peng*, Yake Wei*, Andong Deng, Dong Wang, Di Hu^
CVPR 2022. Oral Presentation

Learning to Answer Questions in Dynamic Audio-Visual Scenarios
Project Page, Code
Guangyao Li*, Yake Wei*, Yapeng Tian*, Chenliang Xu, Ji-Rong Wen, Di Hu^
CVPR 2022. Oral Presentation

Visual Sound Localization in-the-Wild by Cross-Modal Interference Erasing
Code
Xian Liu, Rui Qian, Hang Zhou, Di Hu, Weiyao Lin, Ziwei Liu, Bolei Zhou, Xiaowei Zhou
AAAI 2022

SepFusion: Finding Optimal Fusion Structures for Visual Sound Separation
Dongzhan Zhou, Xinchi Zhou, Di Hu^, Hang Zhou, Lei Bai, Ziwei Liu, Wanli Ouyang
AAAI 2022

Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation
Code
Yapeng Tian, Di Hu^, Chenliang Xu
CVPR 2021

Unsupervised Multi-Source Domain Adaptation for Person Re-Identification
Code
Zechen Bai, Zhigang Wang, Jian Wang, Di Hu^, Errui Ding^
CVPR 2021. Oral Presentation

Temporal Relational Modeling with Self-Supervision for Action Segmentation
Code
Dong Wang, Di Hu^, Xingjian Li, Dejing Dou
AAAI 2021

Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching
Code, Dataset, Demo
Di Hu, Rui Qian, Minyue Jiang, Xiao Tan, Shilei Wen, Errui Ding, Weiyao Lin, Dejing Dou
NeurIPS 2020

Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition
Code, Dataset
Di Hu, Xuhong Li, Lichao Mou, Pu Jin, Dong Chen, Liping Jing, Xiaoxiang Zhu, Dejing Dou
ECCV 2020

Multiple Sound Sources Localization from Coarse to Fine
Code
Rui Qian, Di Hu, Heinrich Dinkel, Mengyue Wu, Ning Xu, Weiyao Lin
ECCV 2020

Listen to the Image
Supplemental Material, Code, Project page
Di Hu, Dong Wang, Xuelong Li, Feiping Nie, Qi Wang
CVPR 2019

Deep Multimodal Clustering for Unsupervised Audiovisual Learning
Supplemental Material, Demo, Code
Di Hu, Feiping Nie, Xuelong Li
CVPR 2019

Dense Multimodal Fusion for Hierarchically Joint Representation
Di Hu, Chengze Wang, Feiping Nie, Xuelong Li
ICASSP 2019. Lecture Presentation

Large Graph Hashing with Spectral Rotation
Code
Xuelong Li, Di Hu, Feiping Nie
AAAI 2017

Deep Binary Reconstruction for Cross-modal Hashing
Code
Xuelong Li, Di Hu, Feiping Nie
ACM MM 2017

Image2song: Song Retrieval via Bridging Image Content and Lyric Words
Supplemental Material, Demo, Dataset
Xuelong Li, Di Hu, Xiaoqiang Lu
ICCV 2017

Multimodal Learning via Exploring Deep Semantic Similarity
Di Hu, Xiaoqiang Lu, Xuelong Li
ACM MM 2016

Temporal Multimodal Learning in Audiovisual Speech Recognition
Di Hu, Xuelong Li, Xiaoqiang Lu
CVPR 2016

Journal Papers

Geometric-Inspired Graph-based Incomplete Multi-view Clustering
Zequn Yang, Han Zhang, Yake Wei, Zheng Wang, Feiping Nie, Di Hu^
Pattern Recognition 2023

Supervised Knowledge May Hurt Novel Class Discovery Performance
ZiYun Li, Jona Otholt, Ben Dai, Di Hu, Christoph Meinel, Haojin Yang
TMLR 2023

Self-supervised Audiovisual Representation Learning for Remote Sensing Data
Demo (Look to Hear Our Planet)
Konrad Heidler, Lichao Mou, Di Hu^, Pu Jin, Guangyao Li, Chuang Gan, Ji-Rong Wen, Xiao Xiang Zhu
International Journal of Applied Earth Observation and Geoinformation 2022

Self-supervised Learning for Heterogeneous Audiovisual Scene Analysis
Di Hu, Zheng Wang, Feiping Nie, Rong Wang, Xuelong Li
IEEE Transactions on Multimedia (TMM) 2022

Class-aware Sounding Objects Localization via Audiovisual Correspondence
Project Page, Code
Di Hu, Yake Wei, Rui Qian, Weiyao Lin, Ruihua Song, Ji-Rong Wen
Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2021

Generalising Combinatorial Discriminant Analysis through Conditioning Truncated Rayleigh Flow
Sijia Yang, Haoyi Xiong, Di Hu, Kaibo Xu, Licheng Wang, Peizhen Zhu, Zeyi Sun
Knowledge and Information Systems (KAIS) 2021

Deep Linear Discriminant Analysis Hashing
Supplemental Material, Code
Di Hu, Feiping Nie, Xuelong Li
SCIENTIA SINICA Informationis 2019

Discrete Spectral Hashing for Efficient Similarity Retrieval
Di Hu, Feiping Nie, Xuelong Li
IEEE Transactions on Image Processing (TIP) 2019

Deep Binary Reconstruction for Cross-modal Hashing
Di Hu, Feiping Nie, Xuelong Li
IEEE Transactions on Multimedia (TMM) 2019

Workshop Papers

Heterogeneous Scene Analysis via Self-supervised Audiovisual Learning
Demo, Video
Di Hu, Zheng Wang, Haoyi Xiong, Dong Wang, Feiping Nie, Dejing Dou
CVPR Sight and Sound Workshop 2020

Does Ambient Sound Help? - Audiovisual Crowd Counting
Code, Dataset, Video
Di Hu*, Lichao Mou*, Qingzhong Wang*, Junyu Gao, Yuansheng Hua, Dejing Dou, Xiaoxiang Zhu
CVPR Sight and Sound Workshop 2020

Co-Learn Sounding Object Visual Grounding and Visually Indicated Sound Separation in A Cycle
Video
Yapeng Tian*, Di Hu*, Chenliang Xu
CVPR Sight and Sound Workshop 2020

A Two-Stage Framework for Multiple Sound-Source Localization
Code, Video
Rui Qian, Di Hu, Heinrich Dinkel, Mengyue Wu, Ning Xu, Weiyao Lin
CVPR Sight and Sound Workshop 2020