




AI Programming Model

AI infrastructure fundamentals covering GPU hardware selection, cluster scaling, power and cooling design, networking, high-speed interconnects, and DPU integration for modern data centers.

Hitesh Sahu
Written by Hitesh Sahu, a passionate developer and blogger.

Thu Feb 19 2026


Core Libraries & Frameworks

1. CUDA (Compute Unified Device Architecture)

A parallel computing platform and programming model that enables general-purpose computing on NVIDIA GPUs.

  • Thousands of parallel threads
  • Native C/C++/Python integration
  • General-purpose GPU computing

CUDA parallel model:

  • Break the problem into many small, identical tasks
  • Launch thousands of threads (workers) to run them simultaneously
  • Collect the results when every thread finishes
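The break / launch / collect pattern above can be sketched in plain Python using a process pool standing in for GPU threads. This is only an analogy for the CUDA execution model, not CUDA code; `square` and `parallel_square` are hypothetical names for illustration.

```python
from multiprocessing import Pool

def square(x):
    # Each worker runs the same small task on one element,
    # much like a CUDA thread applying one kernel to one data item.
    return x * x

def parallel_square(data, workers=4):
    # 1. Break the problem into small identical tasks (one per element).
    # 2. Launch workers to run them simultaneously.
    # 3. Collect results when everyone finishes.
    with Pool(workers) as pool:
        return pool.map(square, data)

if __name__ == "__main__":
    print(parallel_square([1, 2, 3, 4]))  # → [1, 4, 9, 16]
```

A real CUDA kernel launches thousands of lightweight GPU threads rather than a handful of OS processes, but the three-step shape of the work is the same.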

2. NCCL (NVIDIA Collective Communications Library)

A collective communication library for multi-GPU and multi-node workloads.

  • Used by PyTorch & TensorFlow
  • Optimizes:
    • All-reduce
    • Broadcast
    • Synchronization across GPUs
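To make "all-reduce" concrete, here is a toy pure-Python version of what the operation computes: every rank (GPU) ends up holding the elementwise sum of all ranks' gradient vectors. NCCL's `ncclAllReduce` provides this far more efficiently over NVLink and the network; the function below is a hypothetical sketch of the semantics only.

```python
def all_reduce_sum(rank_values):
    """Toy all-reduce: each 'rank' (GPU) contributes a gradient vector,
    and every rank receives the same elementwise sum of all of them."""
    total = [sum(vals) for vals in zip(*rank_values)]
    # Every rank gets its own copy of the reduced result.
    return [total[:] for _ in rank_values]

# Gradients computed independently on 3 GPUs:
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(all_reduce_sum(grads))  # every rank now holds [9.0, 12.0]
```

This is exactly the step data-parallel training frameworks like PyTorch perform (via NCCL) after each backward pass, so all model replicas apply the same gradient update.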

Training vs Inference

AI Workflow:

 Data Preparation 
  |--> Model Training 
     |--> Optimization 
         |--> Inference/Deployment

Model Training

Compute-intensive

  • Forward + backward pass
  • Multi-GPU scaling
  • High memory + compute demand
  • Uses NCCL, NVLink, RDMA

Model Inference

Latency-optimized

  • Forward pass only
  • Lower latency focus
  • Often containerized (Kubernetes)

  Training              | Inference
  ----------------------|--------------------------
  Model learning        | Model usage
  High compute + memory | Lower latency focus
  Batch workloads       | Real-time workloads
  Multi-GPU scaling     | Edge + cloud deployment

Compute Scaling Models

1. Data Parallelism

  • Same model on multiple GPUs
  • Split dataset across GPUs
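A minimal sketch of one data-parallel training step, assuming a toy linear model `y = w*x` with mean-squared-error loss (both hypothetical, for illustration): the dataset is sharded across replicas, each replica computes a gradient on its shard, and the gradients are averaged (the all-reduce step) so every replica applies the identical update.

```python
def partial_gradient(shard, w):
    # Per-replica gradient of mean squared error for the toy model y = w*x,
    # computed only on this replica's shard of the data.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(data, w, n_gpus=2, lr=0.1):
    # Split the dataset across replicas; every replica holds the SAME model.
    shards = [data[i::n_gpus] for i in range(n_gpus)]
    grads = [partial_gradient(s, w) for s in shards]  # parallel on real GPUs
    g = sum(grads) / len(grads)                       # all-reduce (average)
    return w - lr * g                                 # identical update everywhere

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # y = 2x
w = 0.0
for _ in range(50):
    w = data_parallel_step(data, w)
print(round(w, 2))  # → 2.0
```

Real frameworks (e.g. PyTorch DistributedDataParallel) automate the sharding and the NCCL-backed gradient averaging, but each step has this shape.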

2. Model Parallelism

  • Model split across GPUs
  • Used for very large models
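Model parallelism can be sketched the same way: here two hypothetical "layers" live on different devices, and the forward pass hands activations from one to the next. The layer functions are stand-ins for real neural-network layers.

```python
def layer1(x):
    # First half of the model — lives on "GPU 0".
    return [v * 2 for v in x]

def layer2(x):
    # Second half — lives on "GPU 1".
    return [v + 1 for v in x]

def model_parallel_forward(x):
    # Each device holds only its own layers, so a model too large
    # for one GPU's memory can still run end to end.
    h = layer1(x)      # executes on GPU 0
    return layer2(h)   # activations transferred GPU 0 -> GPU 1, then GPU 1 runs

print(model_parallel_forward([1.0, 2.0]))  # → [3.0, 5.0]
```

The cost of this scheme is the device-to-device activation transfer between stages, which is why fast interconnects such as NVLink matter for very large models.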
