Certificate in AI Engineering and MLOps
Application closes 6th Apr 2026
Key Learning Outcomes
Build and Scale AI Systems with MLOps
Build, scale and manage AI systems with advanced engineering and MLOps skills
- Design parallel ML algorithms using shared and distributed systems to maximise computational throughput
- Package AI apps in containers and orchestrate deployments at scale using modern platforms for consistency
- Build scalable data pipelines and I/O systems to process massive datasets without slowing compute
- Optimise AI workloads by identifying bottlenecks and improving performance in distributed systems
- Build CI/CD pipelines, tracking and monitoring systems to manage the end-to-end AI lifecycle in production
Earn a Certificate of Completion from IIT Bombay
Key Certificate Highlights
Why choose this Certificate?
- IIT Bombay faculty-led: Learn from IIT Bombay faculty, who connect cutting-edge research with practical frameworks and cases, through a mix of live, online, interactive sessions.
- Weekly live sessions: Live classes every week for learning, hands-on skills and query resolution.
- Hands-on learning: Practical training with industry-standard tools and technologies.
- Industry-focussed curriculum: Industry-relevant curriculum with real-world projects.
- Campus immersion at IIT Bombay: Meet the faculty and experience the IIT Bombay campus during the campus immersion.
- Dedicated learner support: Personalised assistance from a dedicated Programme Manager.
Skills you will learn
Parallel Algorithm Design
Distributed Training Deployment
Containerized Workflow Orchestration
Scalable Data Pipeline Engineering
System Performance Optimization
Hybrid AI Infrastructure Architecture
Production MLOps Implementation
Large Model Training
Hardware-Aware Programming
Resource Management
Production Monitoring
Advanced AI Trends
This certificate is ideal for
Professionals aiming to build, scale and deploy AI systems using HPC, MLOps and distributed computing.
- Data and AI professionals who want to scale AI and ML implementations using high-performance computing to train models, manage datasets and deploy AI solutions.
- Software development and engineering professionals moving into high-impact AI/ML roles who want to learn distributed training, GPU optimisation and parallel programming for modern AI architectures.
- Cloud, DevOps and IT professionals who aim to extend their infrastructure expertise into AI/ML workloads and support ML deployment through monitoring and CI/CD pipelines.
- Technology consultants and technical managers who evaluate and implement scalable AI platforms and drive the deployment of ML workloads across cloud and hybrid environments.
Experience a unique learning journey
- Weekly live sessions: Interactive classes for concept clarity, hands-on practice and Q&A with IIT Bombay faculty.
- Peer-to-peer networking: Learn with a cohort; discuss and share ideas in class and in discussion forums.
- Industry-focussed curriculum: Work on projects and apply concepts and tools to real use cases.
- Personalised assistance: Our dedicated programme managers will support you whenever you need it.
Comprehensive Curriculum
The curriculum is structured into four instructional modules, each followed by hands-on projects to ensure learners can apply the technical concepts learnt during the module to practical infrastructure problems.
Module 1: Fundamentals of AI, Machine Learning and High-Performance Computing (HPC) for AI Engineering
This module begins with the foundations of AI and machine learning, hardware-aware programming, and the fundamentals of parallel computing. Before moving to distributed systems, an AI Engineer must first understand the efficiency of individual compute nodes. You will learn how different hardware architectures, such as central processing units (CPUs) and graphics processing units (GPUs), impact the performance of AI training loops. By understanding memory hierarchies and cache optimisation, you will learn to identify and resolve bottlenecks in standard machine learning workloads. The module then covers shared-memory parallelisation, teaching you to utilise multi-core processors effectively for matrix operations and feature engineering. Further, the module introduces the Message Passing Interface (MPI) for distributed ML systems and addresses a key element of AI Engineering: enabling multiple machines to function as a single, cohesive training unit using advanced communication patterns and data parallelism.
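The data-parallel pattern described above can be sketched in miniature. The snippet below simulates four workers in a single process, each computing a gradient on its own data shard before the gradients are averaged and applied; in a real MPI deployment (for example with mpi4py), each rank would hold one shard and the averaging step would be an Allreduce. The model, learning rate and dataset are invented for illustration.

```python
def local_gradient(shard, w):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(shards, w, lr=0.01):
    # Each simulated worker computes a gradient on its own shard...
    grads = [local_gradient(s, w) for s in shards]
    # ...then the gradients are averaged (the Allreduce step in real MPI
    # code) and the shared weight is updated once.
    return w - lr * sum(grads) / len(grads)

# Toy dataset y = 3x, split round-robin across 4 simulated workers.
data = [(x, 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]

w = 0.0
for _ in range(200):
    w = data_parallel_step(shards, w)
print(round(w, 2))  # converges to the true slope, 3.0
```

Because every worker applies the same averaged gradient, all replicas stay in lockstep, which is exactly why data parallelism scales training across machines without changing the result.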
Topics Covered:
Module 2: Containerisation, Orchestration and Management of Distributed ML Systems
Once you understand single-node performance and the Message Passing Interface (MPI), this module shifts focus to containerisation and the orchestration of multi-node clusters. You will explore how containerisation ensures reproducibility across diverse environments and how orchestration tools manage these containers at scale. The module covers resource management and job scheduling, essential for navigating shared high-performance computing clusters. You will also learn how to leverage distributed deep learning frameworks to run ML workloads across a network of multiple nodes using MPI and data parallelism.
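As an illustration of the resource-management and job-scheduling problem mentioned above, here is a hypothetical first-fit placement of containerised jobs onto GPU nodes. Real schedulers such as Slurm or the Kubernetes scheduler use far richer policies (priorities, preemption, gang scheduling); the node names, job names and GPU counts here are invented for the sketch.

```python
def schedule(jobs, nodes):
    """Greedy first-fit placement: returns {job_name: node_name}, with None
    for jobs that cannot be placed and must wait in the queue."""
    free = dict(nodes)  # node name -> free GPU count (copy; input untouched)
    placement = {}
    for name, gpus_needed in jobs:
        # Pick the first node with enough free GPUs (first-fit heuristic).
        chosen = next((n for n, g in free.items() if g >= gpus_needed), None)
        if chosen is not None:
            free[chosen] -= gpus_needed
        placement[name] = chosen
    return placement

cluster = {"node-a": 4, "node-b": 2}
queue = [("train-llm", 4), ("etl-job", 1), ("eval-job", 2)]
result = schedule(queue, cluster)
print(result)
```

Here "train-llm" fills node-a, "etl-job" fits on node-b, and "eval-job" is left queued because no node has two free GPUs remaining, which is the everyday reality of a shared HPC cluster.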
Topics Covered:
Module 3: Training Large Models
This module is at the core of AI Engineering and explores the specific infrastructure requirements for large-scale model training, such as Large Language Models (LLMs). You will learn various model parallelism strategies, including tensor and pipeline parallelism, to handle models that exceed the memory of a single GPU. To support these high-speed parallelised models, it is critical to have data engineering systems operating at scale. Hence, a significant portion of this module is dedicated to building high-performance data pipelines and understanding how parallel file systems and high-performance I/O libraries prevent data starvation during training. In addition to model parallelism and data engineering, you will also develop proficiency in profiling distributed systems to detect load imbalances and communication overhead, ensuring that your AI Engineering solutions are truly optimised for performance. Finally, this module explores ML model serving and inference frameworks to take large-scale models from the training environment to end users.
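One flavour of the model parallelism mentioned above can be shown in a few lines: shard a layer's weight matrix across simulated devices, compute each output slice locally, then concatenate the results. Real tensor-parallel frameworks do this across GPUs with collective communication (an all-gather); this single-process sketch only demonstrates that the sharded arithmetic reproduces the full result.

```python
def matvec(W, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sharded_matvec(W, x, shards=2):
    # Split the weight matrix's output rows across simulated devices;
    # each device computes its slice, then the slices are concatenated
    # (the all-gather step in a real tensor-parallel implementation).
    step = len(W) // shards
    out = []
    for d in range(shards):
        out.extend(matvec(W[d * step:(d + 1) * step], x))
    return out

W = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, 1]
full = matvec(W, x)
sharded = sharded_matvec(W, x)
print(full == sharded)  # prints True: sharding does not change the result
```

Because each shard holds only a slice of the weights, a model too large for one device's memory can be spread across several, at the cost of the communication needed to reassemble each layer's output.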
Topics Covered:
Module 4: Enterprise MLOps and Cloud Infrastructure Strategy
This module focuses on the "Operations" of AI, moving from model training to production-grade deployment. You will learn to build continuous integration and delivery (CI/CD) pipelines tailored for AI, incorporating automated testing for both code and data, deployment and experiment tracking. Further, the module covers production monitoring and observability, teaching you how to track model performance and detect data drift in real time. Finally, you will explore cloud-HPC integration, learning how to provision hybrid infrastructures that combine on-premise clusters with third-party cloud capabilities to ensure your MLOps strategy is flexible. The module ends with an overview of advanced topics and emerging AI Engineering trends.
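A minimal sketch of the data-drift detection idea discussed above: compare the mean of a live feature window against training-time statistics and flag drift when the shift exceeds a threshold. The statistic and threshold here are assumptions for illustration only; production monitors typically use richer tests such as the population stability index or a Kolmogorov-Smirnov test.

```python
def drift_detected(reference, live, k=3.0):
    """Flag drift when the live window's mean is more than k standard
    errors away from the reference (training-time) mean."""
    n = len(reference)
    mean = sum(reference) / n
    std = (sum((x - mean) ** 2 for x in reference) / n) ** 0.5
    live_mean = sum(live) / len(live)
    # Standard error of the live window's mean under the reference stats.
    se = std / (len(live) ** 0.5)
    return abs(live_mean - mean) > k * se

train = [10.0, 11.0, 9.0, 10.5, 9.5] * 20    # reference feature values
stable = [10.2, 9.8, 10.1, 9.9] * 10          # live window, no shift
shifted = [14.0, 13.5, 14.5, 13.8] * 10       # live window after a shift
print(drift_detected(train, stable), drift_detected(train, shifted))
```

A monitor like this would run on each incoming batch of features; a triggered flag typically feeds an alert or a retraining step in the CI/CD pipeline rather than blocking serving outright.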
Topics Covered:
Languages and Tools covered
Build hands-on expertise with tools and frameworks used across AI systems
Learn from IIT Bombay faculty
Learn from leading IIT Bombay faculty who blend rigorous AI and strategy research with practical delivery.
Course Fees
Invest in your career
- Build, scale and manage AI systems with advanced engineering and MLOps skills
- Earn a Certificate of Completion from IIT Bombay
- Campus immersion at IIT Bombay
- Hands-on projects to ensure learners apply technical concepts to solve practical infrastructure problems
Registration Process
Registrations close once the required number of participants enroll. Apply early to secure your spot.
- Application: Interested candidates can apply by filling out a simple online application form.
- Interview process: Go through a mandatory screening call with the registration office.
- Offer of registration: Selected candidates will receive an offer letter and must pay the fee to confirm their registration.
Eligibility Criteria
- Educational Background: Bachelor’s or Master’s degree in Engineering from a recognised university with a minimum aggregate of 50% (or equivalent CGPA).
- Professional Experience: Minimum of 2 years of relevant professional work experience.
- Technical Skills: Prior exposure to Python or C/C++ programming, familiarity with the Linux command line and bash scripting, and experience with version control (Git).