📙 AI-Infrastructure Index
📚 18 Posts
🕒 Last Updated: Tue May 26 2026
This folder contains AI-Infrastructure-related posts.
| # | Blog Link | Date | Excerpt | Tags |
|---|---|---|---|---|
| 1 | AI-Infrastructure Index | Tue May 26 2026 | 📙 Index of AI-Infrastructure posts | |
| 2 | NVIDIA AI Infrastructure and Operations Fundamentals | Fri Feb 27 2026 | Comprehensive guide to NVIDIA AI infrastructure covering GPU architecture, accelerated computing, training vs inference workloads, data center networking, storage design, virtualization, and operational best practices. | NVIDIA AI Infrastructure GPU Computing CUDA Data Center AI Training AI Inference Networking Storage Virtualization MLOps Certification |
| 3 | AI Infra Computing : GPU, DPU, Virtualization, DGX Systems | Fri Feb 27 2026 | Comprehensive overview of modern AI infrastructure covering CPU, GPU, and DPU architectures, accelerated computing models, cluster scaling, high-speed networking (InfiniBand and RoCE), storage integration, and power and cooling considerations for AI data centers. | NVIDIA CPU Architecture GPU Architecture DPU BlueField Accelerated Computing AI Infrastructure AI Training AI Inference GPU Clusters Data Center InfiniBand RoCE AI Networking Power and Cooling Storage Architecture |
| 4 | AI Programming Model | Fri Feb 27 2026 | Overview of NVIDIA's AI programming model, including core libraries (CUDA, NCCL, cuDNN), training vs inference workloads, and compute scaling models (data parallelism and model parallelism) for AI infrastructure. | NVIDIA AI Infrastructure GPU Clusters Data Center AI Training AI Networking InfiniBand RoCE DPU BlueField Power and Cooling On-Prem vs Cloud Accelerated Computing |
| 5 | Pinned Memory (Page-Locked Memory) in CUDA and GPU Computing | Tue May 26 2026 | Learn how pinned memory (page-locked memory) improves CPU-to-GPU data transfer performance in CUDA, deep learning, and high-performance AI workloads using direct memory access (DMA). | AI CUDA GPU Computing NVIDIA Deep Learning AI Infrastructure High Performance Computing CUDA Memory Pinned Memory Page-Locked Memory DMA AI Training Machine Learning PyTorch TensorFlow |
| 6 | RAPIDS and GPU Accelerated Data Science: cuDF, cuML, CUDA, NCCL and Distributed AI Pipelines | Tue May 19 2026 | Comprehensive overview of the RAPIDS ecosystem covering GPU accelerated DataFrames, machine learning, graph analytics, CUDA execution, distributed computing with Dask and NCCL, TensorRT integration, and large-scale AI data processing pipelines on NVIDIA GPUs. | NVIDIA RAPIDS CUDA cuDF cuML cuGraph CuPy GPU Computing Accelerated Computing Data Science Machine Learning Distributed Computing Dask NCCL TensorRT AI Infrastructure GPU Clusters Data Engineering Vectorized Computing AI Pipelines |
| 7 | TensorRT and High-Performance AI Inference: CUDA, ONNX, TensorRT-LLM and GPU Optimization | Tue May 19 2026 | Comprehensive overview of NVIDIA TensorRT covering ONNX model optimization, CUDA kernel fusion, FP16 and INT8 inference, TensorRT-LLM, GPU memory optimization, Triton Inference Server integration, and production-scale AI inference pipelines on NVIDIA GPUs. | NVIDIA TensorRT TensorRT-LLM CUDA ONNX GPU Inference AI Inference LLM Inference Deep Learning CUDA Kernels FP16 INT8 Quantization Triton Inference Server AI Infrastructure GPU Optimization Accelerated Computing AI Serving Production AI Inference Pipelines |
| 8 | NCCL and Distributed GPU Communication: CUDA, AllReduce, Multi-GPU and AI Cluster Networking | Tue May 19 2026 | Comprehensive overview of NVIDIA NCCL covering GPU-to-GPU communication, AllReduce operations, distributed AI training, CUDA integration, tensor synchronization, multi-node scaling, InfiniBand networking, and high performance communication for large-scale AI and HPC workloads. | NVIDIA NCCL CUDA Distributed Training GPU Communication Multi-GPU AllReduce Tensor Parallelism Pipeline Parallelism AI Infrastructure HPC InfiniBand RoCE GPU Clusters Deep Learning Megatron-LM NeMo TensorRT-LLM Accelerated Computing Parallel Computing |
| 9 | ONNX (Open Neural Network Exchange): Portable AI Models, TensorRT and Cross-Framework Inference | Tue May 19 2026 | Comprehensive overview of ONNX covering portable neural network model formats, cross-framework interoperability, ONNX Runtime, TensorRT integration, GPU accelerated inference, model optimization, and production AI deployment across heterogeneous hardware platforms. | NVIDIA ONNX Open Neural Network Exchange ONNX Runtime TensorRT CUDA AI Inference Deep Learning Model Deployment GPU Inference PyTorch TensorFlow Machine Learning Cross Platform AI AI Infrastructure Accelerated Computing Portable Models LLM Inference Edge AI Production AI |
| 10 | LangChain and AI Agent Orchestration: RAG, LLM Workflows, Vector Databases and Tool Calling | Tue May 19 2026 | Comprehensive overview of LangChain covering AI agents, Retrieval-Augmented Generation (RAG), prompt orchestration, tool calling, memory management, vector databases, multi-step LLM workflows, and production GenAI application development. | LangChain Generative AI AI Agents LLM RAG Retrieval Augmented Generation Vector Databases Prompt Engineering AI Orchestration Tool Calling AI Workflows LangGraph OpenAI LLM Applications AI Infrastructure Semantic Search AI Copilot Workflow Automation Production AI Agentic AI |
| 11 | Megatron-LM and Distributed LLM Training: Tensor Parallelism, NCCL and Trillion-Scale AI Models | Tue May 19 2026 | Comprehensive overview of NVIDIA Megatron-LM covering distributed transformer training, tensor and pipeline parallelism, NCCL communication, CUDA optimization, mixed precision training, trillion-parameter scaling, and large-scale GPU accelerated language model infrastructure. | NVIDIA Megatron-LM CUDA NCCL Distributed Training Tensor Parallelism Pipeline Parallelism Context Parallelism Expert Parallelism LLM Training Transformer Models GPT AI Infrastructure Accelerated Computing Deep Learning Multi-GPU GPU Clusters TensorRT-LLM NeMo Trillion Parameter Models |
| 12 | NVIDIA NeMo and Enterprise AI Platforms: Distributed LLM Training, RAG and TensorRT-LLM | Tue May 19 2026 | Comprehensive overview of NVIDIA NeMo covering large language model training, distributed GPU scaling, Megatron-LM integration, Retrieval-Augmented Generation (RAG), NeMo Retriever, TensorRT-LLM optimization, and enterprise AI deployment pipelines for production-scale generative AI systems. | NVIDIA NeMo CUDA NCCL Megatron-LM TensorRT-LLM Distributed Training LLM Generative AI AI Infrastructure RAG NeMo Retriever AI Agents GPU Clusters Accelerated Computing Enterprise AI Transformer Models Triton Inference Server Deep Learning Production AI |
| 13 | NVIDIA Triton Inference Server: TensorRT-LLM, GPU Serving and Production AI Inference | Tue May 19 2026 | Comprehensive overview of NVIDIA Triton Inference Server covering scalable AI model serving, TensorRT and TensorRT-LLM integration, dynamic batching, multi-model inference, GPU scheduling, Kubernetes deployment, and high-performance production AI serving architectures. | NVIDIA Triton Triton Inference Server TensorRT TensorRT-LLM CUDA AI Inference LLM Serving GPU Inference Dynamic Batching AI Infrastructure Kubernetes Multi-GPU Accelerated Computing Production AI AI APIs Deep Learning GPU Scheduling Inference Optimization Model Serving |
| 14 | NVIDIA NGC Catalog: GPU Optimized Containers, AI Models and Enterprise AI Infrastructure | Tue May 19 2026 | Comprehensive overview of the NVIDIA NGC Catalog covering GPU optimized containers, CUDA and TensorRT environments, NeMo and Triton deployments, pretrained AI models, Kubernetes integration, NVIDIA NIM microservices, and enterprise-scale AI infrastructure for accelerated computing workloads. | NVIDIA NGC NVIDIA NGC Catalog CUDA TensorRT TensorRT-LLM Triton NeMo Kubernetes GPU Containers AI Infrastructure Accelerated Computing NVIDIA NIM GPU Clusters AI Deployment Deep Learning Distributed Computing AI Platform Engineering Production AI Docker |
| 15 | AI Infra Networking: GPU Clusters, InfiniBand, RoCE, and DPU Integration | Fri Feb 27 2026 | Fundamental concepts and technologies for networking in AI-centric data centers, including GPU interconnects (NVLink, NVSwitch), high-speed networking (InfiniBand, RoCE), and the role of DPUs (Data Processing Units) in accelerating AI workloads and managing network traffic. | NVIDIA AI Infrastructure GPU Clusters Data Center AI Training AI Networking InfiniBand RoCE DPU BlueField Power and Cooling On-Prem vs Cloud Accelerated Computing |
| 16 | AI Infra Storage: NVMe, Parallel File Systems, Object Storage, and GPUDirect Storage | Fri Feb 27 2026 | Comprehensive overview of storage architectures for AI infrastructure, covering NVMe, parallel file systems (Lustre, BeeGFS), object storage, and NVIDIA GPUDirect Storage for high-performance data access in AI workloads. | NVIDIA AI Infrastructure GPU Clusters Data Center AI Training AI Networking InfiniBand RoCE DPU BlueField Power and Cooling On-Prem vs Cloud Accelerated Computing |
| 17 | AI/ML Operations | Fri Feb 27 2026 | Comprehensive overview of monitoring and operations for AI infrastructure, covering GPU monitoring tools (DCGM, BCM), infrastructure monitoring (Prometheus, Grafana), cluster orchestration (Kubernetes, Slurm), power and cooling monitoring, high availability, failure scenarios, security monitoring, GPU utilization optimization, capacity planning, multi-GPU scaling strategies, lifecycle management, logging systems, and alerting best practices. | NVIDIA AI Operations GPU Monitoring Data Center Management Cluster Orchestration Kubernetes Job Scheduling GPU Virtualization vGPU MIG Observability MLOps |
| 18 | RIVA | | | |
