Senior LLM Evaluation Researcher - TikTok
TikTok · Salary not specified · San Jose, California, United States of America · Posted June 2, 2026
Job Description
About the Team:
We are looking for a passionate and detail-oriented specialist to join our AI experience team. In this role, you will be responsible for defining and driving the evaluation framework for Tako's AI-powered features, ensuring our large language model (LLM) responses meet the highest standards of quality, relevance, and user satisfaction.
Responsibilities:
- Develop a deep understanding of LLM capabilities and stay current with the latest research paradigms in model evaluation; apply both qualitative and quantitative user research methods to explore and define ideal response-quality standards for AI across diverse use cases.
- Own the end-to-end quality of Tako's online user experience; design and build a comprehensive evaluation framework that integrates internal expert assessments, crowdsourced testing, and LLM-based automated evaluation; identify experience gaps and translate findings into prioritized, actionable improvement recommendations for the team.
- Collaborate with international operations teams to drive the execution of evaluation programs, including the maintenance and curation of evaluation datasets, as well as the routine execution and analysis of benchmark assessments.
Requirements:
Minimum Qualification(s):
- 3+ years of hands-on experience in LLM evaluation or related research; proven track record of designing AI quality assessment frameworks.
- Highly organized and detail-oriented with strong logical thinking; excellent communication and collaboration skills with the ability to work effectively across cross-functional teams.
Preferred Qualification(s):
- Master's degree or higher; a background in Cognitive Science, Educational Measurement, or quantitative Social Science research.
- Experience in user research and data analysis projects is a plus.
- High level of self-motivation and ability to work in a fast-paced environment.