Senior AI Engineer, Document Intelligence & LLM Infrastructure

Posted on February 24, 2026

Job Description

About the Role:
Duration: 6 months
Location: Remote
Timings: Full time (40 hours per week), 5:30 pm to 2:30 am
Notice Period: Immediate joiners only
Role Overview
We are hiring a senior individual contributor to own and scale OCR-driven document intelligence systems powered by self-hosted LLMs.
This is a deeply hands-on engineering role focused on production systems that process long-form documents (200+ pages), extract structured data deterministically, and run on optimized GPU-backed inference infrastructure.
You will work closely with the AI leadership team but will independently own the architecture, performance, and reliability of the document processing pipelines.
Core Responsibilities
1. Large-Scale Document Intelligence Pipelines
Design and build end-to-end pipelines for processing long-form, OCR-heavy documents
Own PDF ingestion, layout-aware parsing, and multi-page document assembly
Implement robust chunking, segmentation, and metadata tracking across long documents
Implement exception detection, retries, and deterministic failure handling
Optimize systems to reliably process 200+ page documents at scale


2. OCR & Structured Extraction Systems
Work with OCR engines (Tesseract, PaddleOCR, layout-aware models, vision-language models)
Build layout-aware extraction systems using bounding boxes and structural metadata
Implement deterministic schema validation and cross-field consistency checks
Reduce reliance on manual QA through rule-based validation layers
Ensure traceability from each extracted field back to its source span


3. Self-Hosted LLM Inference (Production Ownership)
Deploy and operate open-source LLMs using vLLM, Hugging Face TGI, and GPU-backed serving stacks
Tune inference performance: KV cache management, batching, context window control, and throughput vs. latency trade-offs
Monitor and optimize GPU utilization and cost per request
Own production reliability of LLM serving infrastructure


4. Deterministic Validation & Control Systems
Design validation layers outside the LLM
Implement schema enforcement, rule engines, invariants, and rejection logic
Build automated exception routing that does not default to human review
Ensure auditability and reproducibility of extraction results
Create measurable correctness guarantees for high-stakes use cases


5. Production Engineering & Scale
Design systems that handle large document volumes, concurrency, and failure states, with observability and monitoring built in
Build logging, tracing, and metrics around document processing pipelines
Collaborate with cross-functional teams to ship production-grade AI systems
Required Experience
6+ years of hands-on Python engineering
Proven production experience building OCR-driven document pipelines
Experience handling long-form PDFs (100+ pages)
Strong experience with vLLM or Hugging Face TGI, GPU-based LLM serving, and open-source LLMs (LLaMA, Qwen, Mistral, etc.)
Experience building deterministic validation systems (schema + rule enforcement)
Strong debugging and systems-level thinking
Ability to clearly articulate system trade-offs and business impact
Strongly Preferred
Experience with layout-aware models (LayoutLM, DocFormer, vision-language models)
Experience optimizing GPU cost and inference performance
Experience in regulated domains (healthcare, finance, compliance)
Familiarity with document-heavy workflows such as loan processing, underwriting, or claims


Notes:
Rates are calculated based on working hours.

Required Skills

OCR, document intelligence, OCR & structured, LLM
