In recent years, ad hoc parallel data processing has emerged as one of the killer applications for Infrastructure-as-a-Service (IaaS) clouds. Major cloud computing companies have started to integrate frameworks for parallel data processing into their product portfolios, making it easy for customers to access these services and deploy their programs. However, the processing frameworks currently in use were designed for static, homogeneous cluster setups and disregard the particular nature of a cloud. Consequently, the allocated compute resources may be inadequate for large parts of the submitted job and may unnecessarily increase processing time and cost. In this work, we discuss the opportunities and challenges for efficient parallel data processing in clouds and present our research project Nephele. Nephele is the first data processing framework to explicitly exploit the dynamic resource allocation offered by today's IaaS clouds for both task scheduling and execution. Particular tasks of a processing job can be assigned to different types of virtual machines, which are automatically instantiated and terminated during job execution. Based on this new framework, we perform extended evaluations of MapReduce-inspired processing jobs on an IaaS cloud system and compare the results with the popular data processing framework Hadoop.