Google Scholar GitHub Email: kifish.pro@gmail.com
Experience
- 2023.04-present LLM Researcher at ByteDance Seed LLM
- 2021.07-2023.03 NLP Algorithm Engineer at Kuaishou MMU
Publications
2025
Rui-Jie Zhu, Zixuan Wang, Kai Hua, Tianyu Zhang, Ziniu Li, Haoran Que, Boyi Wei, Zixin Wen, Fan Yin, He Xing, Lu Li, Jiajun Shi, Kaijing Ma, Shanda Li, Taylor Kergan, Andrew Smith, Xingwei Qu, Mude Hui, Bohong Wu, Qiyang Min, Hongzhi Huang, Xun Zhou, Wei Ye, Jiaheng Liu, Jian Yang, Yunfeng Shi, Chenghua Lin, Enduo Zhao, Tianle Cai, Ge Zhang, Wenhao Huang, Yoshua Bengio, Jason Eshraghian. Scaling Latent Reasoning via Looped Language Models. arXiv:2510.25741, 2025.10
- Team Collaboration
- We scale Looped Language Models up to 2.6 billion parameters and complete pretraining on 7.7 trillion open-source tokens, following a multi-stage data recipe encompassing Pretraining, Continual Training (CT), Long-CT, and Mid-Training. The resulting model is on par with SOTA language models 2–3× its size. We open-source all model weights and the data recipe.
- Designed and curated all pretraining data mixtures from open-source data and provided key insights throughout the pretraining process.
- Project Page arXiv Twitter Hugging Face 机器之心
Kai Hua, Steven Wu, Ge Zhang. AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection. arXiv:2505.07293, 2025.05
- LLM Pretraining Data Selection (Idea Originator & Project Leader)
- We propose AttentionInfluence, a training-free and supervision-free method for reasoning-centric data selection. By masking attention heads in a small pretrained model and measuring the resulting loss differences, we identify reasoning-intensive data that significantly improves the performance of larger models. Applied to a 7B model, our approach yields consistent gains on benchmarks such as MMLU, GSM8K, and HumanEval, demonstrating an effective weak-to-strong scaling path for reasoning-focused pretraining (see the sketch after this entry).
- arXiv Twitter 量子位 Community Reproduction
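A minimal sketch of the head-masking and loss-gap scoring idea, assuming a LLaMA-style HuggingFace causal LM whose decoder layers expose `self_attn.o_proj`. The model name, the set of masked heads, and the raw loss-gap score are illustrative placeholders; the actual head-selection procedure and scoring formula follow the paper.

```python
# Sketch: score a document by how much the loss degrades when selected
# attention heads of a small proxy model are masked (AttentionInfluence-style).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"   # hypothetical small proxy model
HEADS_TO_MASK = {(3, 5), (10, 2)}         # hypothetical (layer, head) pairs

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

head_dim = model.config.hidden_size // model.config.num_attention_heads

def make_pre_hook(heads):
    # Zero the per-head slices of the attention output right before o_proj,
    # which effectively removes those heads' contribution.
    def pre_hook(module, args):
        hidden = args[0].clone()
        for h in heads:
            hidden[..., h * head_dim:(h + 1) * head_dim] = 0
        return (hidden,) + args[1:]
    return pre_hook

@torch.no_grad()
def doc_loss(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
    return model(**inputs, labels=inputs["input_ids"]).loss.item()

def attention_influence(text):
    base = doc_loss(text)                 # loss of the intact proxy model
    by_layer = {}
    for layer, head in HEADS_TO_MASK:
        by_layer.setdefault(layer, []).append(head)
    handles = [
        model.model.layers[layer].self_attn.o_proj.register_forward_pre_hook(
            make_pre_hook(heads))
        for layer, heads in by_layer.items()
    ]
    masked = doc_loss(text)               # loss with the chosen heads masked
    for h in handles:
        h.remove()
    # Larger gap -> the document relies more on the masked heads; such
    # documents are kept as reasoning-intensive pretraining data.
    return masked - base
```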
Seed Model&LLM&VLM Team. Seed-VWN, Technical Report, 2025.11
- Model&LLM&VLM (Team Collaboration)
- Provided long-context (128K/512K) CT data and long-context evaluation
- arXiv
Seed LLM Team. Seed OSS 36B, Open Source Model, 2025.08
- LLM Code/Pretrain (Team Collaboration)
- Led the text mid-training and long-context (128K/512K) CT
- Hugging Face 量子位
Seed LLM&VLM Team. Seed-1.6, Technical Blog, 2025.06
- LLM&VLM Pretrain (Team Collaboration)
- Led the multimodal long-context (128K/512K) CT
- Technical Blog 机器之心
Seed VLM&LLM Team. Seed1.5-VL Technical Report. arXiv:2505.07062, 2025.05
Seed LLM Team. Seed-Thinking-v1.5: Advancing Superb Reasoning Models with Reinforcement Learning. arXiv:2504.13914. 2025.04
2024
Chongyang Tao, Tao Shen, Shen Gao, Junshuo Zhang, Zhen Li, Kai Hua, Zhengwei Tao, Shuai Ma. LLMs are Also Effective Embedding Models: An In-depth Overview. arXiv:2412.12591, 2024.12
- arXiv
2020
Kai Hua, Zhiyuan Feng, Chongyang Tao, Rui Yan, Lu Zhang. Learning to Detect Relevant Contexts and Knowledge for Response Selection in Retrieval-based Dialogue Systems. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM 2020), 2020.10
- arXiv