Hadoop provides distributed processing of large data sets across a cluster of commodity servers, running the work on many machines at the same time. To process data, the client submits the data and the processing program to Hadoop. HDFS stores the data, MapReduce processes it, and YARN splits the job into tasks and allocates cluster resources to them, as the sketch below illustrates.
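As a rough illustration of how these pieces fit together, here is a minimal MapReduce word-count sketch in Java using the standard org.apache.hadoop API (class names and input/output paths are illustrative): the input and output files live in HDFS, the Mapper and Reducer classes contain the processing logic, and YARN schedules the resulting map and reduce tasks across the cluster.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: runs on each input split and emits (word, 1) for every word it sees
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: receives all counts for a given word and sums them
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged as a JAR and submitted with `hadoop jar`, this job reads its input from HDFS, YARN launches the map and reduce tasks on the worker nodes, and the summed word counts are written back to HDFS.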