Hitesh Sahu
AI & Machine Learning

AI Infrastructure & LLM Platform

Thoughtworks (Multi-Client Engagement)

2026 – Present

Senior Consultant

AI Infrastructure & LLM

Tech Stack
LangChain / LangGraph
LLM API Integration
AWS EKS / Kubernetes
Terraform
Kafka / SQS
Prometheus / Grafana / Loki
SLO/SLA Alerting
GitHub Actions

Summary

Integrated LLM and agentic workflows into production microservices while building the cloud, observability, and CI/CD infrastructure underneath them.


What I Built

Project Overview

As part of Thoughtworks' AI and Platform Engineering practice, I help enterprise clients design, deploy, and operationalize Large Language Model (LLM) applications and agentic systems in production environments.

My role bridges platform engineering, cloud infrastructure, and AI application development. I work directly with engineering teams, architects, and business stakeholders to transform experimental AI initiatives into scalable, secure, and observable production systems.

The work spans infrastructure provisioning, Kubernetes operations, agent orchestration, model deployment, observability, CI/CD automation, and enterprise AI adoption across multiple industries.


Key Features

Enterprise LLM Deployments

Designed and deployed production-ready LLM applications capable of supporting internal copilots, knowledge assistants, and workflow automation platforms.

Agentic AI Systems

Implemented LangChain and LangGraph workflows that orchestrate tools, APIs, retrieval systems, and multi-step reasoning processes.
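As an illustration of the orchestration pattern described above, here is a minimal sketch in plain Python. It does not use the LangChain or LangGraph APIs; it only mimics the general idea of nodes that update a shared state and edges that choose the next node. The node names, the tiny knowledge base, and the `run` helper are all hypothetical.

```python
from typing import Callable, Dict, Optional

State = Dict[str, object]
Node = Callable[[State], State]


def retrieve(state: State) -> State:
    # Hypothetical retrieval step: keyword lookup in a tiny in-memory knowledge base.
    kb = {"refund": "Refunds are processed within 14 days."}
    question = str(state["question"]).lower()
    hits = [text for key, text in kb.items() if key in question]
    return {**state, "context": hits}


def call_tool(state: State) -> State:
    # Hypothetical fallback tool, used when retrieval finds nothing.
    return {**state, "context": ["No document found; escalated to a human."]}


def answer(state: State) -> State:
    # Stand-in for an LLM call: it simply joins the gathered context.
    return {**state, "answer": " ".join(state["context"])}


def route_after_retrieve(state: State) -> str:
    # Conditional edge: pick the next node based on the current state.
    return "answer" if state["context"] else "call_tool"


NODES: Dict[str, Node] = {"retrieve": retrieve, "call_tool": call_tool, "answer": answer}
EDGES: Dict[str, Callable[[State], Optional[str]]] = {
    "retrieve": route_after_retrieve,
    "call_tool": lambda state: "answer",
    "answer": lambda state: None,  # None marks the end of the workflow
}


def run(question: str, entry: str = "retrieve", max_steps: int = 10) -> State:
    # Walk the graph from the entry node, recording which nodes were visited.
    state: State = {"question": question, "trace": []}
    current: Optional[str] = entry
    for _ in range(max_steps):  # hard cap guards against accidental cycles
        if current is None:
            break
        state = NODES[current](state)
        state["trace"] = [*state["trace"], current]
        current = EDGES[current](state)
    return state
```

Calling `run("How do I get a refund?")` follows the retrieve-then-answer path, while a question with no matching document is routed through the fallback tool first. The `trace` list makes the chosen path easy to log or assert on.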

AI Platform Engineering

Built reusable cloud-native infrastructure enabling rapid deployment and scaling of AI workloads across customer environments.

Observability for AI Systems

Established monitoring, logging, tracing, and SLO frameworks for AI applications to improve reliability, performance, and operational visibility.
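To make the SLO idea concrete, the sketch below computes an error-budget burn rate, one common way to drive SLO alerting. It is an assumption about how such a check could look, not the alerting rules used on these engagements. The `Slo` class, the function names, and the 14.4 paging threshold (a commonly cited fast-burn value) are illustrative choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    target: float  # e.g. 0.99 means 99% of requests should succeed

    @property
    def error_budget(self) -> float:
        # Fraction of requests that are allowed to fail.
        return 1.0 - self.target


def burn_rate(slo: Slo, failed: int, total: int) -> float:
    # Observed error rate divided by the allowed error rate.
    # 1.0 means the budget is being spent exactly as fast as the SLO permits.
    if total == 0:
        return 0.0
    return (failed / total) / slo.error_budget


def should_page(slo: Slo, failed: int, total: int, threshold: float = 14.4) -> bool:
    # Assumed policy: page only when the budget burns much faster than allowed.
    return burn_rate(slo, failed, total) >= threshold
```

With a 99% target, 20 failures in 1,000 requests gives a burn rate of about 2, which stays below the assumed paging threshold, while 200 failures in 1,000 gives about 20 and would page.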

MLOps & Model Lifecycle

Implemented deployment workflows, model versioning strategies, evaluation pipelines, and automated release processes for AI services.


My Contributions

  • Designed and deployed Kubernetes-based AI platforms supporting LLM inference and agentic workloads.
  • Integrated commercial and open-source LLM providers into enterprise applications.
  • Built LangChain and LangGraph agent workflows for retrieval, automation, and decision-support use cases.
  • Developed retrieval-augmented generation (RAG) architectures connecting enterprise knowledge sources to LLM applications.
  • Implemented observability frameworks using Prometheus, Grafana, Loki, and distributed tracing solutions.
  • Automated infrastructure provisioning through Terraform and GitHub Actions.
  • Built event-driven architectures using Kafka and AWS messaging services (SQS).
  • Established CI/CD pipelines for AI applications and supporting microservices.
  • Worked directly with client teams to evaluate AI adoption strategies and production-readiness requirements.
  • Supported security, governance, RBAC, and compliance controls for enterprise AI deployments.
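The retrieval-augmented generation item in the list above can be sketched in a few lines. This is a toy version under stated assumptions: word counts stand in for a real embedding model, an in-memory list stands in for a vector database, and `embed`, `top_k`, and `build_prompt` are hypothetical helper names.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    # Toy "embedding": lowercase word counts. A real system would call an embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, docs: List[str], k: int = 2) -> List[Tuple[float, str]]:
    # Score every document against the query and keep the k best matches.
    q = embed(query)
    scored = [(cosine(q, embed(doc)), doc) for doc in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def build_prompt(query: str, docs: List[str], k: int = 2) -> str:
    # Put only documents with a non-zero score into the prompt context.
    context = "\n".join(f"- {doc}" for score, doc in top_k(query, docs, k) if score > 0)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The resulting prompt string would then be sent to whichever LLM provider the application uses. Keeping retrieval separate from prompt construction makes it possible to evaluate retrieval quality on its own.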

Technical Highlights

Forward Deployed AI Engineering

Partnered directly with enterprise clients to design, implement, and operationalize AI systems tailored to real-world business workflows.

Agentic Workflow Orchestration

Built complex multi-step workflows capable of combining LLM reasoning, external tools, enterprise APIs, and retrieval systems.

Production AI Infrastructure

Designed cloud-native platforms supporting scalable inference workloads, deployment automation, and operational reliability.

AI Observability

Implemented monitoring and evaluation frameworks enabling teams to understand latency, token usage, system health, and application performance.
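A minimal sketch of how latency and token usage might be captured around an LLM call is shown below. It is an illustration only: the `LlmMetrics` class, the `instrumented` wrapper, the response shape with a `usage` field, and the `fake_client` are assumptions, and a production setup would more likely export these values to a metrics backend than keep them in memory.

```python
import math
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class LlmMetrics:
    latencies_s: List[float] = field(default_factory=list)
    prompt_tokens: int = 0
    completion_tokens: int = 0
    errors: int = 0

    def summary(self) -> Dict[str, float]:
        ordered = sorted(self.latencies_s)
        # Nearest-rank 95th percentile; 0.0 when nothing has been recorded yet.
        p95 = ordered[max(0, math.ceil(0.95 * len(ordered)) - 1)] if ordered else 0.0
        return {
            "calls": len(ordered),
            "p95_latency_s": p95,
            "total_tokens": self.prompt_tokens + self.completion_tokens,
            "errors": self.errors,
        }


def instrumented(client: Callable[[str], Dict[str, Any]], metrics: LlmMetrics) -> Callable[[str], Dict[str, Any]]:
    # Wrap any callable client so every call records latency, token usage, and errors.
    def wrapper(prompt: str) -> Dict[str, Any]:
        start = time.perf_counter()
        try:
            response = client(prompt)
        except Exception:
            metrics.errors += 1
            raise
        finally:
            # Latency is recorded for failed calls as well as successful ones.
            metrics.latencies_s.append(time.perf_counter() - start)
        usage = response["usage"]
        metrics.prompt_tokens += usage["prompt_tokens"]
        metrics.completion_tokens += usage["completion_tokens"]
        return response

    return wrapper


def fake_client(prompt: str) -> Dict[str, Any]:
    # Hypothetical stand-in for a provider SDK; tokens are approximated by word count.
    text = "ok"
    return {
        "text": text,
        "usage": {"prompt_tokens": len(prompt.split()), "completion_tokens": len(text.split())},
    }
```

Wrapping the client rather than editing each call site keeps the measurement in one place, so the same wrapper can sit in front of different providers.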

Developer Platform Automation

Created reusable infrastructure and deployment patterns that accelerated AI application onboarding across multiple customer engagements.


Challenges & Solutions

Challenge

Many organizations successfully prototype AI solutions but struggle to operationalize them due to infrastructure complexity, reliability concerns, observability gaps, and governance requirements.

Solution

Built standardized AI platform foundations combining Kubernetes, Infrastructure as Code, observability frameworks, agent orchestration, and automated deployment pipelines.

Outcome

Enabled enterprise teams to move from AI experimentation to production deployment faster while maintaining security, scalability, operational visibility, and engineering best practices.


Technology Stack

AI Frameworks: LangChain, LangGraph

LLMs: OpenAI, Anthropic, Mistral, Open-Source Models

Retrieval: Vector Databases, RAG Pipelines, Embeddings

Cloud: AWS, Kubernetes, EKS

Infrastructure: Terraform, Docker, Helm

Messaging: Kafka, SQS

Observability: Prometheus, Grafana, Loki, OpenTelemetry

Automation: GitHub Actions, CI/CD Pipelines

Domain: AI Infrastructure, Agentic AI, Enterprise AI, Forward Deployed Engineering
