I am Zhenghao Liu (刘正皓), an Associate Professor at Northeastern University and a visiting researcher at Tsinghua University’s Natural Language Processing Laboratory (THUNLP). I received my Bachelor’s degree in Engineering from Northeastern University in 2016 and was directly admitted as a Ph.D. student to Tsinghua University’s State Key Laboratory of Intelligent Technology and Systems, under the supervision of Professor Maosong Sun. I earned my Ph.D. degree from Tsinghua University in 2021 (thesis). My main research interests are natural language processing and information retrieval.

I graduated from Northeastern University with a bachelor’s degree and from the Department of Computer Science and Technology, Tsinghua University (THUNLP Lab) with a Ph.D. degree, advised by Maosong Sun (孙茂松). I now work with Ge Yu (于戈) and Yu Gu (谷峪). I also collaborate closely with Zhiyuan Liu (刘知远), Chenyan Xiong (熊辰炎), Shuo Wang (王硕), Yukun Yan (闫宇坤), Shi Yu (于是), Liner Yang (杨麟儿), and Zulong Chen (陈祖龙) from Tsinghua University, Carnegie Mellon University, Beijing Language and Culture University, and Alibaba. I have published more than 30 papers at top international AI conferences, with a total of 1800 Google Scholar citations. We are now concentrating on building the OpenMatch toolkit and on research into LLMs and data twinning.

🔥 News

  • 2024.05:  🎉🎉 We have five papers accepted by ACL 2024.
  • 2024.04:  🎉🎉 We have one paper accepted by WebConf 2024.

📖 Education

  • Aug, 2021 - Now, Dept. of Computer Science and Technology (NEUIR Lab), Northeastern University, Shenyang, China.
  • Aug, 2016 - Jun, 2021, Dept. of Computer Science and Technology (THUNLP Lab), Tsinghua University, Beijing, China.
  • Sep, 2012 - Jul, 2016, Dept. of Computer Science and Technology, Northeastern University, Shenyang, China.

💬 Invited Talks

  • 2023.08, “Semantic Matching based Retrieval for Multimodal Documents” at The 17th China Conference on Knowledge Graph and Semantic Computing (CCKS 2023) [slides].
  • 2021.04, “From Exact Matching to Semantic Matching: Using neural models for ranking” at Hong Kong University of Science and Technology (HKUST) [slides].

🧑‍🎨 Academic Services

  • Session Chair of MLNLP 2022-2023.
  • Web Chair of CCL 2024.
  • Area Chair of ACL ARR, COLING, ICTIR.
  • Conference Review: NeurIPS, ICML, ICLR, ACL, EMNLP, NAACL, COLING, WebConf, ECAI, AAAI, EACL.
  • Journal Review: TPAMI, TKDE, FCS, IEEE Transactions on Big Data, AI Open, TOIS.

📝 Publications

    * indicates equal contribution.
    # indicates corresponding author.


  • Zhipeng Xu, Zhenghao Liu#, Yukun Yan, Zhiyuan Liu, Ge Yu, Chenyan Xiong. Cleaner Pretraining Corpus Curation with Neural Web Scraping. ACL 2024. [pdf][codes].

  • Tianshuo Zhou, Sen Mei, Xinze Li, Zhenghao Liu#, Chenyan Xiong, Zhiyuan Liu, Yu Gu, Ge Yu. MARVEL: Unlocking the Multi-Modal Capability of Dense Retrieval via Visual Module Plugin. ACL 2024. [pdf][codes].

  • Hanbin Wang, Zhenghao Liu#, Shuo Wang, Ganqu Cui, Ning Ding, Zhiyuan Liu, Ge Yu. INTERVENOR: Prompting the Coding Ability of Large Language Models with the Interactive Chain of Repair. ACL 2024: Findings. [pdf][codes].

  • Haoyu Wang, Shuo Wang, Yukun Yan, Xujia Wang, Zhiyu Yang, Yuzhuang Xu, Zhenghao Liu, Liner Yang, Ning Ding, Xu Han, Zhiyuan Liu, Maosong Sun. UltraLink: An Open-Source Knowledge-Enhanced Multilingual Supervised Fine-tuning Dataset. ACL 2024. [pdf][codes].

  • Zhiyu Yang, Zihan Zhou, Shuo Wang, Xin Cong, Xu Han, Yukun Yan, Zhenghao Liu, Zhixing Tan, Pengyuan Liu, Dong Yu, Zhiyuan Liu, Xiaodong Shi, Maosong Sun. MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization. ACL 2024: Findings. [pdf][codes].

  • Zhenghao Liu, Zulong Chen*, Moufeng Zhang*, Shaoyang Duan, Hong Wen, Liangyue Li, Nan Li, Yu Gu, Ge Yu. Modeling User Viewing Flow Using Large Language Models for Article Recommendation. WebConf 2024. [pdf].

  • Shi Yu, Chenghao Fan, Chenyan Xiong, David Jin, Zhiyuan Liu, Zhenghao Liu#. Fusion-in-T5: Unifying Variant Signals for Simple and Effective Document Ranking with Attention Fusion. The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (COLING 2024). [pdf][codes].

  • Ruining Chong, Luming Lu, Liner Yang, Jinran Nie, Zhenghao Liu, Shuo Wang, Shuhan Zhou, Yaoxin Li, Erhong Yang. MCTS: A Multi-Reference Chinese Text Simplification Dataset. The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (COLING 2024). [pdf][codes].

  • Cheng Qian, Chenyan Xiong, Zhenghao Liu, Zhiyuan Liu. Toolink: Linking toolkit creation and using through chain-of-solving on open-source model. 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024). [pdf][codes].

  • Yumeng Song, Yu Gu, Tianyi Li, Jianzhong Qi, Zhenghao Liu, Christian S Jensen, Ge Yu. CHGNN: A Semi-Supervised Contrastive Hypergraph Learning Network. IEEE Transactions on Knowledge and Data Engineering (TKDE). [pdf][code].

  • Yuqing Lan, Zhenghao Liu#, Yu Gu, Xiaoyuan Yi, Xiaohua Li, Liner Yang, Ge Yu. Multi-Evidence based Fact Verification via A Confidential Graph Neural Network. IEEE Transactions on Big Data (TBD). [pdf][code].


  • Zhenghao Liu, Chenyan Xiong, Yuanhuiyi Lv, Zhiyuan Liu, Ge Yu. Universal Multi-Modal Retrieval: Learning A Unified Representation Space for Vision Language Retrieval. The Eleventh International Conference on Learning Representations (ICLR 2023). [pdf][codes].

  • Zhenghao Liu*#, Sen Mei, Chenyan Xiong, Xiaohua Li, Shi Yu, Zhiyuan Liu, Yu Gu, Ge Yu. Text Matching Improves Sequential Recommendation by Reducing Popularity Biases. The 32nd ACM International Conference on Information and Knowledge Management (CIKM 2023). [pdf][codes].

  • Shi Yu, Zhenghao Liu, Chenyan Xiong, Zhiyuan Liu. OpenMatch-v2: An All-in-one Multi-Modality PLM-based Information Retrieval Toolkit. The 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023). [pdf][codes].

  • Xinze Li, Zhenghao Liu#, Chenyan Xiong, Shi Yu, Yu Gu, Zhiyuan Liu, Ge Yu. Structure-Aware Language Model Pretraining Improves Dense Retrieval on Structured Data. Findings of the Association for Computational Linguistics: ACL 2023 (ACL 2023). [pdf][codes].

  • Ruining Chong, Cunliang Kong, Liu Wu, Zhenghao Liu, Ziye Jin, Liner Yang, Yange Fan, Hanghang Fan, Erhong Yang. Leveraging Prefix Transfer for Multi-Intent Text Revision. The 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). [pdf].


  • Zhenghao Liu, Han Zhang, Chenyan Xiong, Zhiyuan Liu, Yu Gu, Xiaohua Li. Dimension Reduction for Efficient Dense Retrieval via Conditional Autoencoder. The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022). [pdf][codes].

  • Xiaomeng Hu, Shi Yu, Chenyan Xiong, Zhenghao Liu#, Zhiyuan Liu, Ge Yu. P3 Ranker: Mitigating the Gaps between Pre-training and Ranking Fine-tuning with Prompt-based Learning and Pre-finetuning. The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2022). [pdf][codes].


  • Zhenghao Liu, Xiaoyuan Yi, Maosong Sun, Liner Yang, Tat-Seng Chua. Neural Quality Estimation with Multiple Hypotheses for Grammatical Error Correction. The 2021 Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL-HLT 2021). [pdf][codes].

  • Zhenghao Liu*, Kaitao Zhang*, Chenyan Xiong, Zhiyuan Liu, Maosong Sun. OpenMatch: An Open Source Library for Neu-IR Research. The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021). [pdf][codes].

  • Shi Yu*, Zhenghao Liu*, Chenyan Xiong, Tao Feng, Zhiyuan Liu. Few-Shot Conversational Dense Retrieval. The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021). [pdf][codes].

  • Yizhi Li*, Zhenghao Liu*, Chenyan Xiong, Zhiyuan Liu. More Robust Dense Retrieval with Contrastive Dual Learning. The 2021 ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR 2021). [pdf][codes].

  • Si Sun*, Zhenghao Liu*, Chenyan Xiong, Zhiyuan Liu, Jie Bao. Capturing Global Informativeness in Open Domain Keyphrase Extraction. The CCF Conference on Natural Language Processing and Chinese Computing (NLPCC 2021). [pdf][codes].

  • Si Sun, Yingzhuo Qian, Zhenghao Liu, Chenyan Xiong, Kaitao Zhang, Jie Bao, Zhiyuan Liu, Paul Bennett. Few-Shot Text Ranking with Meta Adapted Synthetic Weak Supervision. The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021). [pdf][codes].

  • Huiyuan Xie, Zhenghao Liu, Chenyan Xiong, Zhiyuan Liu, Ann Copestake. TIAGE: A Benchmark for Topic-Shift Aware Dialog Modeling. Findings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021). [pdf][codes].


  • Zhenghao Liu, Chenyan Xiong, Maosong Sun, Zhiyuan Liu. Fine-grained Fact Verification with Kernel Graph Attention Network. ACL 2020. [pdf][codes].

  • Zhenghao Liu, Chenyan Xiong, Zhuyun Dai, Si Sun, Maosong Sun, Zhiyuan Liu. Adapting Open Domain Fact Extraction and Verification to COVID-FACT through In-Domain Language Modeling. EMNLP 2020: Findings. [pdf][codes].

  • Houyu Zhang*, Zhenghao Liu*, Chenyan Xiong, Zhiyuan Liu. Grounded Conversation Generation as Guided Traverses in Commonsense Knowledge Graphs. ACL 2020. [pdf][codes].

  • Chenyan Xiong*, Zhenghao Liu*, Si Sun*, Zhuyun Dai*, Kaitao Zhang*, Shi Yu*, Zhiyuan Liu, Hoifung Poon, Jianfeng Gao, Paul Bennett. CMT in TREC-COVID Round 2: Mitigating the Generalization Gaps from Web to Special Domain Search. [pdf][codes].

  • Xiaoyuan Yi, Zhenghao Liu, Wenhao Li, Maosong Sun. Text Style Transfer via Learning Style Instance Supported Latent Space. IJCAI 2020. [pdf].

  • Kaitao Zhang, Chenyan Xiong, Zhenghao Liu, Zhiyuan Liu. Selective Weak Supervision for Neural Information Retrieval. WebConf 2020. [pdf][codes].

  • Deming Ye, Yankai Lin, Jiaju Du, Zhenghao Liu, Peng Li, Maosong Sun, Zhiyuan Liu. Coreferential Reasoning Learning for Language Representation. EMNLP 2020. [pdf][codes].


  • Zhenghao Liu, Chenyan Xiong, Maosong Sun, Zhiyuan Liu. Explore Entity Embedding Effectiveness in Entity Retrieval. CCL 2019. [pdf][codes].

  • Yifan Qiao, Chenyan Xiong, Zhenghao Liu, Zhiyuan Liu. Understanding the Behaviors of BERT in Ranking. [pdf].

  • Yuan Yao, Deming Ye, Peng Li, Xu Han, Yankai Lin, Zhenghao Liu, Zhiyuan Liu, Lixin Huang, Jie Zhou, Maosong Sun. DocRED: A Large-Scale Document-Level Relation Extraction Dataset. ACL 2019. [pdf][codes].


  • Zhenghao Liu, Chenyan Xiong, Maosong Sun, Zhiyuan Liu. Entity-Duet Neural Ranking: Understanding the Role of Knowledge Graph Semantics in Neural Information Retrieval. ACL 2018. [pdf][codes].


  • Liner Yang, Maosong Sun, Jiacheng Zhang, Zhenghao Liu, Huanbo Luan, Yang Liu. Neural Parse Combination. Journal of Computer Science and Technology, 2017. [pdf].