Google Scholar · GitHub · Email: kifish.pro@gmail.com
Experience
- 2023.04-present LLM Researcher at ByteDance Seed LLM
- 2021.07-2023.03 NLP Algorithm Engineer at Kuaishou MMU
Publications
2025
Kai Hua, Steven Wu, Ge Zhang. AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection. arXiv:2505.07293, 2025.05
- LLM Pretraining Data Selection (Idea Originator & Project Leader)
- We propose AttentionInfluence, a training-free, supervision-free method for reasoning-centric pretraining data selection. By masking selected attention heads in a small pretrained model and measuring the resulting loss differences, we identify reasoning-intensive data that substantially improves the performance of larger models. Applied to a 7B model, the approach yields consistent gains on benchmarks such as MMLU, GSM8K, and HumanEval, demonstrating an effective weak-to-strong scaling path for reasoning-focused pretraining (an illustrative sketch follows below).
- Coverage: Twitter, QbitAI (量子位); community reproduction
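For illustration, here is a minimal sketch of the loss-difference scoring idea, not the paper's implementation: it uses GPT-2 as a stand-in small model and the `head_mask` argument of Hugging Face `transformers` to zero out attention heads, then scores each document by how much the language-modeling loss increases. The choice of which heads to mask (a fixed placeholder below) and all names are assumptions, not the selection heuristic used in the paper.

```python
# Hypothetical sketch of loss-difference scoring for pretraining data selection
# (illustrative only; not the AttentionInfluence implementation).
import torch
from transformers import AutoTokenizer, GPT2LMHeadModel

device = "cuda" if torch.cuda.is_available() else "cpu"
tok = AutoTokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device).eval()

num_layers = model.config.n_layer
num_heads = model.config.n_head

# Placeholder: mask the last head of every layer. In practice the heads to
# mask would be chosen by an importance criterion (assumption, not the
# paper's recipe).
head_mask = torch.ones(num_layers, num_heads, device=device)
head_mask[:, -1] = 0.0

@torch.no_grad()
def lm_loss(text: str, mask=None) -> float:
    ids = tok(text, return_tensors="pt", truncation=True,
              max_length=512).input_ids.to(device)
    out = model(ids, labels=ids, head_mask=mask)
    return out.loss.item()

def influence_score(text: str) -> float:
    # A larger loss increase after masking means the masked heads matter more
    # for this sample; treat that as a relative selection score.
    return lm_loss(text, head_mask) - lm_loss(text)

docs = ["To solve 17 * 24, first compute 17 * 20 ...",
        "The weather today is sunny and warm."]
scores = {d[:30]: influence_score(d) for d in docs}
print(scores)  # keep documents with the highest scores
```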
Seed LLM Team. Seed OSS 36B, open-source model release, 2025.08
- LLM Code / Pretraining (Team Collaboration)
- Led the text mid-training and long-context (128K/512K) CT
- Coverage: QbitAI (量子位)
Seed LLM & VLM Team. Seed-1.6, Technical Blog, 2025.06
- LLM & VLM Pretraining (Team Collaboration)
- Led the multimodal long-context (128K/512K) CT
- Coverage: Synced (机器之心)
Seed VLM & LLM Team. Seed1.5-VL Technical Report. arXiv:2505.07062, 2025.05
- LLM & VLM Pretraining (Team Collaboration)
- Led the text long-context (128K/512K) CT
- Coverage: Synced (机器之心)
Seed LLM Team. Seed-Thinking-v1.5: Advancing Superb Reasoning Models with Reinforcement Learning. arXiv:2504.13914, 2025.04
- LLM Pretraining (Team Collaboration)
- Core contributor to the pretraining data
- Coverage: QbitAI (量子位)
2024
Chongyang Tao, Tao Shen, Shen Gao, Junshuo Zhang, Zhen Li, Kai Hua, Zhengwei Tao, and Shuai Ma. LLMs Are Also Effective Embedding Models: An In-depth Overview. arXiv:2412.12591, 2024.12
2020
Kai Hua, Zhiyuan Feng, Chongyang Tao, Rui Yan, Lu Zhang. Learning to Detect Relevant Contexts and Knowledge for Response Selection in Retrieval-based Dialogue Systems. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM 2020), 2020.10