I am Jinhui Ye (叶劲辉), an incoming PhD student at HKUST, supervised by Prof. Jiaya Jia. My research interests lie at the intersection of action perception, manipulation, and language-based human-robot interaction.
Previously, I obtained my bachelor's degree from South China University of Technology and pursued an MPhil at The Hong Kong University of Science and Technology (Guangzhou) under the guidance of Prof. Hui Xiong and Prof. Junwei Liang.
I have also had the opportunity to intern at several prestigious institutions:
- At Tencent AI Lab, I worked on sign language translation in collaboration with Xing Wang and Wenxiang Jiao.
- At Stanford Vision and Learning Lab, I collaborated with Prof. Manling Li, Prof. Jiajun Wu, and Prof. Li Fei-Fei.
- At CMU LTI, I worked with Prof. Yonatan Bisk on Embodied-RAG.
Recently, I have been interning at Shanghai AI Lab with Dr. Yilun Chen, where we focus on aligning cognition and action within a Vision-Language-Action (VLA) framework.
For my PhD, I plan to explore this topic further by developing a unified VLA model, a concept captured by the Chinese phrase “知行合一” (the unity of knowledge and action: a robot that does what it knows and knows what it is doing). I am deeply passionate about this direction and believe it holds great promise both for advancing research and for enriching my personal philosophy. I welcome discussions and collaboration on these topics, so please feel free to reach out via email if you’re interested.
News
• 09/28: Excited to announce that our two projects clinched 1st and 2nd places in the Human-Machine Interaction track of the Pazhou Algorithm Contest! We’ve been awarded a total of 120,000 RMB and look forward to a big celebratory dinner. The related work will be documented in upcoming papers. [link]
• Before 08/30/2023: […]
Publications
LongvideoHaystack: Re-thinking Temporal Search for Long-Form Video Understanding
Jinhui Ye, Zihan Wang, Haosen Sun, Keshigeyan Chandrasegaran, Zane Durante, Cristobal Eyzaguirre, Yonatan Bisk, Juan Carlos Niebles, Ehsan Adeli, Li Fei-Fei, Jiajun Wu, Manling Li
CVPR 2025. [Project Page] [Dataset] Paper & code coming soon
SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction
Lu Dai, Yijie Xu, Jinhui Ye, Hao Liu, Hui Xiong
ICLR 2025 (Spotlight). [OpenReview]
Improving Gloss-free Sign Language Translation by Reducing Representation Density
Jinhui Ye, Xing Wang, Wenxiang Jiao, Junwei Liang, Hui Xiong
NeurIPS 2024. [arXiv] [code]
Cross-modality Data Augmentation for End-to-End Sign Language Translation
Jinhui Ye, Wenxiang Jiao, Xing Wang, Zhaopeng Tu, Hui Xiong
EMNLP 2023. [arXiv] [code]
Scaling Back-Translation with Domain Text Generation for Sign Language Gloss Translation
Jinhui Ye, Wenxiang Jiao, Xing Wang, Zhaopeng Tu
EACL 2023. [paper] [code]
Aspect-Opinion Correlation Aware and Knowledge-Expansion Few Shot Cross-Domain Sentiment Classification
Haopeng Ren, Yi Cai, Yushi Zeng, Jinhui Ye, Ho-fung Leung, Qing Li
IEEE Transactions on Affective Computing 2022. [paper] [code]
Preprints
GeoDeformer: Geometric Deformable Transformer for Action Recognition
Jinhui Ye, Jiaming Zhou, Hui Xiong, Junwei Liang
Preprint 2023. [arXiv]
Spatial-Temporal Alignment Network for Action Recognition
Jinhui Ye, Junwei Liang
Preprint 2023. [arXiv]
Internships
• CMU LTI | Visiting Student | Sep. 2024 ‑ Present
• Stanford AI Lab | Intern | Dec. 2023 ‑ Oct. 2024
• Tencent AI Lab | Summer Intern | Aug. 2021 ‑ Aug. 2022
Awards
• Outstanding Undergraduate Thesis at SCUT
• National Scholarship (China)