Job Title: QA Automation Engineer
Experience Level: 4+ Years
About the Role
We are looking for a Quality Assurance Engineer specializing in Large Language Models
(LLMs) to ensure the accuracy, reliability, and performance of AI-driven applications.
The ideal candidate has a strong understanding of how LLMs interact with data
pipelines, covering indexing, chunking, embeddings, cosine similarity, and keyword
search, along with hands-on experience in LLM observability, prompt evaluation, and
QA automation.
Key Responsibilities
• Design and execute QA strategies for LLM-based and search-driven products.
• Validate data pipelines involving indexing, chunking, embeddings, cosine
similarity, and keyword search.
• Evaluate retrieval-augmented generation (RAG) and recommendation system
quality using precision, recall, and relevance metrics (see the sketch after this list).
• Develop prompt test suites to measure LLM accuracy, consistency, and bias.
• Monitor LLM observability metrics such as latency, token usage, hallucination
rate, and cost.
• Automate end-to-end test scenarios using Playwright and integrate with CI/CD
pipelines.
• Collaborate with ML engineers and developers to improve model responses and
user experience.
• Contribute to test frameworks and datasets for LLM regression and benchmark
testing.
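
To illustrate the retrieval-validation and RAG-evaluation work above, here is a
minimal sketch in Python of the kind of check this role involves. It is
illustrative only, not our internal framework; all chunk IDs, gold labels, and
thresholds are hypothetical.

# Illustrative retrieval-quality check for a RAG pipeline.
# All names, IDs, and thresholds below are hypothetical examples.
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def precision_recall_at_k(retrieved, relevant, k):
    # Precision@k and recall@k for one query's retrieved chunk IDs.
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Embedding similarity sanity check (toy 3-dimensional vectors).
assert cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]) == 1.0

# One labeled query: retrieved chunk IDs vs. a hand-labeled gold set.
retrieved = ["chunk_12", "chunk_07", "chunk_33", "chunk_01", "chunk_19"]
relevant = {"chunk_12", "chunk_01", "chunk_44"}

precision, recall = precision_recall_at_k(retrieved, relevant, k=5)
assert precision >= 0.3, f"precision@5 regressed: {precision:.2f}"
assert recall >= 0.5, f"recall@5 regressed: {recall:.2f}"

In practice, checks like this run against a labeled query set in CI so that
retrieval regressions surface before release.
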
Required Skills & Experience
• 4+ years of experience in QA engineering, with at least one year in GenAI or
LLM-based systems.
• Strong understanding of indexing, chunking, embeddings, similarity search,
and retrieval workflows.
• Experience with prompt engineering, LLM evaluation, and output validation
techniques (see the test sketch at the end of this posting).
• Proficiency with Playwright, API automation, and modern QA frameworks.
• Knowledge of observability tools for LLMs.
• Solid scripting experience in Python.
• Knowledge of different LLM providers (OpenAI, Gemini, Anthropic, Mistral, etc.).
• Exposure to RAG pipelines, recommendation systems, or model
performance benchmarking.
• Strong analytical and debugging skills, with a detail-oriented mindset.
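
Finally, a short sketch of the prompt regression testing mentioned above, in
pytest style. This is hypothetical and illustrative: call_model is a stub
standing in for any provider client, and the prompts and keyword checks are
examples only.

# Hypothetical prompt regression test (pytest style).
# call_model is a stub; in practice it would wrap a provider SDK
# (OpenAI, Gemini, Anthropic, Mistral, ...). Cases are illustrative.

GOLDEN_CASES = [
    {
        "prompt": "Summarize our policy: refunds are accepted within 30 days.",
        "must_contain": ["refund", "30 days"],  # facts the summary must keep
    },
]

def call_model(prompt: str) -> str:
    # Deterministic stub so the sketch runs standalone; swap in a real call.
    return "Refunds are accepted within 30 days of purchase."

def test_golden_prompts_keep_key_facts():
    for case in GOLDEN_CASES:
        answer = call_model(case["prompt"]).lower()
        for keyword in case["must_contain"]:
            assert keyword.lower() in answer, (
                f"answer dropped {keyword!r} for prompt {case['prompt']!r}"
            )

Suites like this typically run in CI on every model or prompt change,
alongside consistency and bias checks.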