Job Title: QA Automation Engineer
Experience Level: 4+ Years
About the Role
We are looking for a Quality Assurance Engineer specializing in Large Language Models
(LLMs) to ensure the accuracy, reliability, and performance of AI-driven applications.
The ideal candidate has a strong understanding of how LLMs interact with data
pipelines, covering indexing, chunking, embeddings, cosine similarity, and keyword
search, along with hands-on experience in LLM observability, prompt evaluation, and
QA automation.
Key Responsibilities
• Design and execute QA strategies for LLM-based and search-driven products.
• Validate data pipelines involving indexing, chunking, embeddings, cosine
similarity, and keyword search.
• Evaluate retrieval-augmented generation (RAG) and recommendation system
quality using precision, recall, and relevance metrics (see the sketch after this list).
• Develop prompt test suites to measure LLM accuracy, consistency, and bias.
• Monitor LLM observability metrics such as latency, token usage, hallucination
rate, and cost.
• Automate end-to-end test scenarios using Playwright and integrate with CI/CD
pipelines.
• Collaborate with ML engineers and developers to improve model responses and
user experience.
• Contribute to test frameworks and datasets for LLM regression and benchmark
testing.
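
To illustrate the retrieval-validation and RAG-evaluation work above, here is a
minimal sketch in Python of the kind of check this role involves. It is
illustrative only, not our internal framework; all chunk IDs, gold labels, and
thresholds are hypothetical.

# Illustrative retrieval-quality check for a RAG pipeline.
# All names, IDs, and thresholds below are hypothetical examples.
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def precision_recall_at_k(retrieved, relevant, k):
    # Precision@k and recall@k for one query's retrieved chunk IDs.
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Embedding similarity sanity check (toy 3-dimensional vectors).
assert cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]) == 1.0

# One labeled query: retrieved chunk IDs vs. a hand-labeled gold set.
retrieved = ["chunk_12", "chunk_07", "chunk_33", "chunk_01", "chunk_19"]
relevant = {"chunk_12", "chunk_01", "chunk_44"}

precision, recall = precision_recall_at_k(retrieved, relevant, k=5)
assert precision >= 0.3, f"precision@5 regressed: {precision:.2f}"
assert recall >= 0.5, f"recall@5 regressed: {recall:.2f}"

In practice, checks like this run against a labeled query set in CI so that
retrieval regressions surface before release.
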
Required Skills & Experience
• 4+ years of experience in QA engineering, with at least one year in GenAI or
LLM-based systems.
• Strong understanding of indexing, chunking, embeddings, similarity search,
and retrieval workflows.
• Experience with prompt engineering, LLM evaluation, and output validation
techniques (see the test sketch at the end of this posting).
• Proficiency with Playwright, API automation, and modern QA frameworks.
• Knowledge of observability tools for LLMs.
• Solid scripting experience in Python.
• Knowledge of different LLM providers (OpenAI, Gemini, Anthropic, Mistral, etc.).
• Exposure to RAG pipelines, recommendation systems, or model
performance benchmarking.
• Strong analytical and debugging skills, with a detail-oriented mindset.
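
Finally, a short sketch of the prompt regression testing mentioned above, in
pytest style. This is hypothetical and illustrative: call_model is a stub
standing in for any provider client, and the prompts and keyword checks are
examples only.

# Hypothetical prompt regression test (pytest style).
# call_model is a stub; in practice it would wrap a provider SDK
# (OpenAI, Gemini, Anthropic, Mistral, ...). Cases are illustrative.

GOLDEN_CASES = [
    {
        "prompt": "Summarize our policy: refunds are accepted within 30 days.",
        "must_contain": ["refund", "30 days"],  # facts the summary must keep
    },
]

def call_model(prompt: str) -> str:
    # Deterministic stub so the sketch runs standalone; swap in a real call.
    return "Refunds are accepted within 30 days of purchase."

def test_golden_prompts_keep_key_facts():
    for case in GOLDEN_CASES:
        answer = call_model(case["prompt"]).lower()
        for keyword in case["must_contain"]:
            assert keyword.lower() in answer, (
                f"answer dropped {keyword!r} for prompt {case['prompt']!r}"
            )

Suites like this typically run in CI on every model or prompt change,
alongside consistency and bias checks.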