AI Infrastructure & LLM Platform
Thoughtworks (Multi-Client Engagement)
2026 – Present
Senior Consultant
Summary
Integrated LLM and agentic workflows into production microservices while building the cloud, observability, and CI/CD infrastructure underneath them.
What I Built
Project Overview
As part of Thoughtworks' AI and Platform Engineering practice, I help enterprise clients design, deploy, and operationalize Large Language Model (LLM) applications and agentic systems in production environments.
My role bridges platform engineering, cloud infrastructure, and AI application development. I work directly with engineering teams, architects, and business stakeholders to transform experimental AI initiatives into scalable, secure, and observable production systems.
The work spans infrastructure provisioning, Kubernetes operations, agent orchestration, model deployment, observability, CI/CD automation, and enterprise AI adoption across multiple industries.
Key Features
Enterprise LLM Deployments
Designed and deployed production LLM applications that power internal copilots, knowledge assistants, and workflow automation platforms.
Agentic AI Systems
Implemented LangChain and LangGraph workflows that orchestrate tools, APIs, retrieval systems, and multi-step reasoning processes.
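To illustrate what orchestrating multi-step agent workflows means in practice, here is a minimal sketch of a graph-style workflow in plain Python. It is not client code and deliberately does not use the LangGraph API; the `State` fields, node names, and the tiny in-memory corpus are all hypothetical stand-ins for a real retriever and a real LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class State:
    """Shared state passed between workflow nodes (hypothetical shape)."""
    question: str
    documents: List[str] = field(default_factory=list)
    answer: Optional[str] = None
    trace: List[str] = field(default_factory=list)


# A node updates the state and returns the next node name, or None to stop.
Node = Callable[[State], Optional[str]]


def retrieve(state: State) -> Optional[str]:
    # Stand-in for a retrieval tool; a real system would query a vector store.
    corpus = {
        "vpn": "VPN access is requested through the IT portal.",
        "expenses": "Expenses are submitted monthly.",
    }
    question = state.question.lower()
    state.documents = [text for key, text in corpus.items() if key in question]
    state.trace.append("retrieve")
    # Conditional edge: route on whether retrieval found anything.
    return "answer" if state.documents else "fallback"


def answer(state: State) -> Optional[str]:
    # Stand-in for an LLM call that grounds its reply in the retrieved text.
    state.answer = "Based on internal docs: " + " ".join(state.documents)
    state.trace.append("answer")
    return None


def fallback(state: State) -> Optional[str]:
    state.answer = "No matching documents found; escalating to a human."
    state.trace.append("fallback")
    return None


NODES: Dict[str, Node] = {"retrieve": retrieve, "answer": answer, "fallback": fallback}


def run_graph(nodes: Dict[str, Node], entry: str, state: State, max_steps: int = 10) -> State:
    """Walk the graph from `entry` until a node returns None."""
    current: Optional[str] = entry
    steps = 0
    while current is not None:
        if steps >= max_steps:
            # Guard against cycles that never terminate.
            raise RuntimeError("workflow exceeded max_steps")
        current = nodes[current](state)
        steps += 1
    return state


if __name__ == "__main__":
    result = run_graph(NODES, "retrieve", State(question="How do I get VPN access?"))
    print(result.trace, result.answer)
```

The step limit is the important design choice here: agent graphs with conditional edges can loop, so a hard cap keeps a misbehaving workflow from running indefinitely.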
AI Platform Engineering
Built reusable cloud-native infrastructure enabling rapid deployment and scaling of AI workloads across customer environments.
Observability for AI Systems
Established monitoring, logging, tracing, and SLO frameworks for AI applications to improve reliability, performance, and operational visibility.
MLOps & Model Lifecycle
Implemented deployment workflows, model versioning strategies, evaluation pipelines, and automated release processes for AI services.
My Contributions
- Designed and deployed Kubernetes-based AI platforms supporting LLM inference and agentic workloads.
- Integrated commercial and open-source LLM providers into enterprise applications.
- Built LangChain and LangGraph agent workflows for retrieval, automation, and decision-support use cases.
- Developed retrieval-augmented generation (RAG) architectures connecting enterprise knowledge sources to LLM applications.
- Implemented observability frameworks using Prometheus, Grafana, Loki, and distributed tracing solutions.
- Automated infrastructure provisioning through Terraform and GitHub Actions.
- Built event-driven architectures using Kafka and AWS messaging services.
- Established CI/CD pipelines for AI applications and supporting microservices.
- Worked directly with client teams to evaluate AI adoption strategies and production-readiness requirements.
- Supported security, governance, RBAC, and compliance controls for enterprise AI deployments.
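The retrieval-augmented generation work mentioned in the list above follows a common pattern: rank knowledge-base passages against the user's question, then place the best matches into the prompt. The sketch below shows that pattern with a toy bag-of-words similarity in place of a real embedding model and vector database; the function names and the prompt wording are illustrative assumptions, not the production design.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a real system uses a model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Return the k documents most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(documents, key=lambda doc: cosine(query_vector, embed(doc)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: List[str], k: int = 2) -> str:
    """Assemble a grounded prompt from the top-k retrieved passages."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    docs = [
        "Terraform modules provision the EKS cluster.",
        "Grafana dashboards show token usage per request.",
        "Kafka topics carry document ingestion events.",
    ]
    print(build_prompt("Which dashboards show token usage?", docs, k=1))
```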
Technical Highlights
Forward Deployed AI Engineering
Partnered directly with enterprise clients to design, implement, and operationalize AI systems tailored to real-world business workflows.
Agentic Workflow Orchestration
Built multi-step workflows that combine LLM reasoning, external tools, enterprise APIs, and retrieval systems.
Production AI Infrastructure
Designed cloud-native platforms supporting scalable inference workloads, deployment automation, and operational reliability.
AI Observability
Implemented monitoring and evaluation frameworks that give teams visibility into latency, token usage, system health, and application performance.
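As a rough illustration of tracking latency and token usage per LLM call, the sketch below wraps each call in a context manager and checks a p95 latency target. It uses only the standard library; in a real deployment these values would more likely be exported as Prometheus metrics or OpenTelemetry spans. The `LLMMetrics` class, its field names, and the SLO threshold are hypothetical.

```python
import math
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class LLMMetrics:
    """In-memory counters for LLM calls (illustrative, not a real exporter)."""
    latencies_s: List[float] = field(default_factory=list)
    prompt_tokens: int = 0
    completion_tokens: int = 0
    errors: int = 0

    @contextmanager
    def track(self) -> Iterator[Dict[str, int]]:
        """Time one call; the caller fills in token counts on the yielded dict."""
        start = time.perf_counter()
        usage = {"prompt_tokens": 0, "completion_tokens": 0}
        try:
            yield usage
        except Exception:
            self.errors += 1
            raise
        finally:
            # Record latency and tokens whether the call succeeded or failed.
            self.latencies_s.append(time.perf_counter() - start)
            self.prompt_tokens += usage["prompt_tokens"]
            self.completion_tokens += usage["completion_tokens"]

    def percentile(self, q: float) -> float:
        """Nearest-rank percentile of recorded latencies (q between 0 and 1)."""
        if not self.latencies_s:
            return 0.0
        ordered = sorted(self.latencies_s)
        index = min(len(ordered) - 1, max(0, math.ceil(q * len(ordered)) - 1))
        return ordered[index]

    def meets_slo(self, p95_target_s: float) -> bool:
        """True when p95 latency is at or under the target."""
        return self.percentile(0.95) <= p95_target_s


if __name__ == "__main__":
    metrics = LLMMetrics()
    with metrics.track() as usage:
        # A real caller would invoke the model here and copy its usage data.
        usage["prompt_tokens"] = 120
        usage["completion_tokens"] = 30
    print(metrics.prompt_tokens, metrics.completion_tokens, metrics.meets_slo(2.0))
```

Recording latency in the `finally` branch means failed calls still count toward the latency distribution, which avoids an SLO that looks healthy only because errors were excluded.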
Developer Platform Automation
Created reusable infrastructure and deployment patterns that accelerated AI application onboarding across multiple customer engagements.
Challenges & Solutions
Challenge
Many organizations successfully prototype AI solutions but struggle to operationalize them due to infrastructure complexity, reliability concerns, observability gaps, and governance requirements.
Solution
Built standardized AI platform foundations combining Kubernetes, Infrastructure as Code, observability frameworks, agent orchestration, and automated deployment pipelines.
Outcome
Enabled enterprise teams to move from AI experimentation to production deployment faster while maintaining security, scalability, operational visibility, and engineering best practices.
Technology Stack
AI Frameworks: LangChain, LangGraph
LLMs: OpenAI, Anthropic, Mistral, Open-Source Models
Retrieval: Vector Databases, RAG Pipelines, Embeddings
Cloud: AWS, Kubernetes, EKS
Infrastructure: Terraform, Docker, Helm
Messaging: Kafka, SQS
Observability: Prometheus, Grafana, Loki, OpenTelemetry
Automation: GitHub Actions, CI/CD Pipelines
Domain: AI Infrastructure, Agentic AI, Enterprise AI, Forward Deployed Engineering
