ML Evaluation & Insights Engineer - ASE

Cellular Vehicles

This job is no longer accepting applications

Software Engineering, Data Science

Singapore

Posted on Apr 2, 2026

Summary

Imagine what you could do here. At Apple, great new ideas have a way of becoming extraordinary products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish!

Apple Services Engineering (ASE) powers many AI features across App Store, Music, Video and more. We build deeply personal products with the goal of representing users around the globe authentically. We work continuously to avoid perpetuating systemic biases and maintain safe and trustworthy experiences across our AI tools and models.

Description

Our team, part of Apple Services Engineering, is looking for an ML Research Engineer to lead the design and continuous development of automated benchmarking methodologies. In this role, you will investigate the behavior of media-related agents, craft rigorous evaluation frameworks and techniques, and establish scientific standards for assessing quality features. This role supports the development of scalable evaluation techniques that ensure our engineers have the right tools to assess candidate models and product features for optimal performance.

The capabilities you build will allow for the generation of benchmark datasets and evaluation methodologies for model and application outputs, at scale, to enable engineering teams to translate insights into actionable engineering and product improvements. This role blends deep technical expertise with strong analytical judgment to develop tools and capabilities for assessing and improving the behavior of advanced AI/ML models. You will work cross-functionally with Engineering and Project Managers, Product, Safety, and Editorial teams to develop a suite of technologies to ensure that AI experiences are reliable, safe, and aligned with human expectations.

The successful candidate will take a proactive approach to working independently and collaboratively on a wide range of projects. In this role, you will work alongside an impactful team, collaborating with ML and data scientists, software developers, project managers, and other teams at Apple to understand requirements and translate them into scalable, reliable, and efficient evaluation frameworks.

Responsibilities

Design scientifically-based benchmarking methodologies that evaluate multiple dimensions of feature quality across various media and application marketplace use cases
Develop automated evaluation pipelines that collect, automatically assess, and analyze model outputs on a large scale
Create and curate datasets, tasks, and feature usage scenarios that represent realistic and adversarial use cases across multiple languages, markets, and domains
Define and validate new metrics for complex phenomena such as multi-turn agentic interaction patterns
Apply statistical rigor and reproducibility to the above-mentioned objectives
Work closely with engineering and research teams to translate experimental findings into actionable model improvements and mitigations
Publish internal reports and external papers
Monitor evolving industry practices and academic work to ensure benchmarks remain relevant
Communicate clearly with partnering teams

Minimum Qualifications

Advanced degree (MS or PhD) in Computer Science, Software Engineering, or equivalent research/work experience
At least 1 year of work experience, either as a postdoc or in industry
Strong research background in empirical evaluation, experimental design, or benchmarking
Strong proficiency in Python (pandas, NumPy, Jupyter, PyTorch, etc.)
Deep familiarity with software engineering workflows and developer tools
Experience working with or evaluating AI/ML models, preferably LLMs or program synthesis systems
Strong analytical and communication skills, including the ability to write clear reports
Technical Skills:
Experience working with large datasets, annotation tools, and model evaluation pipelines
Familiarity with evaluations specific to responsible AI and safety, hallucination detection, and/or model alignment concerns
Ability to design taxonomies, categorization schemes, and structured labeling frameworks
Analytical Strength: Ability to interpret unstructured data (text, transcripts, user sessions) and derive meaningful insights
Communication: Strong ability to stitch together qualitative and quantitative insights into actionable guidance; strong ability to communicate complex architectures and systems to a variety of stakeholders
Education in Data Science, Linguistics, Cognitive Science, HCI, Psychology, Social Science, or a related field
Fluent in English and one of the following: Korean, Chinese, Japanese, French, Spanish, Portuguese, Hindi, or Tamil

Preferred Qualifications

Publications in AI/ML evaluation or related fields
Experience with automated testing frameworks
Experience constructing human-in-the-loop or multi-turn evaluation setups
Intermediate or advanced proficiency in Swift
Familiarity with RAG systems, reinforcement learning, agentic architectures, and model fine-tuning
Expertise in designing annotation guidelines and validation instruments and techniques
Background in human factors, social science, and/or safety assessment methodologies

Apple is an equal opportunity employer that is committed to inclusion and diversity. Apple provides reasonable accommodations to applicants with disabilities and in accordance with local requirements. Apple is a drug-free workplace.
