AI/LLM Data Engineer

Posted on February 26, 2025


Job Description

Experience: 3-5 Years
Remote
Role Overview:
We are seeking an AI/LLM Data Engineer to build and maintain data pipelines for our Generative AI platform. This position requires expertise in Large Language Model (LLM) technologies and a strong background in data engineering with a focus on Retrieval-Augmented Generation (RAG) and knowledge base techniques. The role involves collaborating with cross-functional teams and working on high-impact AI projects.
Key Responsibilities:
● Design, implement, and maintain an end-to-end multi-stage data pipeline for LLMs, including:
○ Supervised Fine-Tuning (SFT) processes
○ Reinforcement Learning from Human Feedback (RLHF)
● Evaluate and integrate diverse data sources to support Generative AI platforms
● Develop and optimize workflows for:
○ Chunking, indexing, ingestion, and vectorization of text and non-text data
● Benchmark and implement various vector stores, embedding techniques, and retrieval methods
● Build a flexible pipeline that supports multiple embedding algorithms, vector stores, and search types (vector search, hybrid search)
● Implement and maintain auto-tagging systems and data preparation processes
● Develop tools for text and image data crawling, cleaning, and refinement
● Collaborate with teams to ensure data quality and relevance for AI/ML models
● Work with data lakehouse architectures to optimize data storage and processing
● Integrate Snowflake and vector store technologies to optimize workflows
Required Qualifications:
● Education: Master's degree in Computer Science, Data Science, or a related field
● Experience:
○ 3-5 years of work experience in data engineering, with a focus on AI/ML
○ Hands-on experience with data cleaning, tagging, annotation, and data crawling
● Skills:
○ Proficiency in Python, JSON, HTTP, and related tools
○ Strong understanding of LLM architectures, training processes, and data requirements
○ Experience with RAG systems, knowledge base construction, and vector databases
○ Familiarity with embedding techniques, similarity search algorithms, and information retrieval
○ Experience with data lakehouse concepts and architectures
○ Knowledge of Snowflake and its integration in AI/ML pipelines
○ Hands-on experience with vector store technologies and their applications in AI
○ Collaborative communication skills, with the ability to work in a cross-functional team environment
○ Ability to translate business needs into technical solutions
○ Passion for innovation and ethical AI development
Preferred Qualifications:
● Experience with LLM/RAG frameworks such as LangChain, LlamaIndex, Semantic Kernel, or OpenAI Functions
● Familiarity with distributed computing platforms (e.g., Apache Spark, Dask)
● Knowledge of data versioning and experiment tracking tools
● Experience with cloud platforms (AWS, GCP, Azure) for large-scale data processing
● Understanding of data privacy and security best practices
● Experience implementing data lakehouse solutions
● Proficiency in optimizing queries and data processes in Snowflake or Databricks
● Experience tuning LLM generation parameters (temperature, top-k, repeat penalty) and working with evaluation metrics

Required Skills

AI/ML engineer

Recruiter: Divyang Yadav

Company: The AI Matters


Key Details

Job Type: full-time

Location Type: remote

Location: N/A

Experience: 4+ years

Salary Range: INR 75,000 - 80,000 / monthly

Application Deadline: February 28, 2025