ALLOCATION OF THE LARGE CLUSTER SETUPS IN MAPREDUCE

Daggupati Saidamma

Abstract


Running multiple instances of the MapReduce framework concurrently in a multicluster system or datacenter enables data, failure, and version isolation, which is attractive for many organizations. It may also provide some form of performance isolation, but achieving this under the time-varying workloads submitted to the MapReduce instances requires a mechanism for dynamically (re-)allocating resources to those instances. In this paper, we present such a mechanism, called Fawkes, that attempts to balance the allocations to MapReduce instances so that they experience similar service levels. Fawkes proposes a new abstraction for deploying MapReduce instances on physical resources, the MR-cluster, which represents a set of resources that can grow and shrink. An MR-cluster has a core on which MapReduce is installed with the usual data locality assumptions, and it relaxes those assumptions for nodes outside the core. Fawkes dynamically grows and shrinks the active MR-clusters based on a family of weighting policies, with weights derived from monitoring their operation.

Implementing MapReduce in the cloud requires creating clusters on which the Map and Reduce operations can be performed. The need of the hour is to optimize overall resource utilization without compromising the efficiency of the services offered. Selecting the right set of nodes to form a cluster plays a major role in improving the performance of the cloud. Because a large amount of data is transferred during the data analysis phase, network latency becomes the defining factor in the QoS of the cloud. In this paper we propose a novel Cluster Configuration algorithm that selects optimal nodes in a dynamic cloud environment to configure a cluster for running MapReduce jobs. The algorithm is cost-optimized, adheres to global resource utilization, and provides high performance to the clients. The proposed algorithm gives a performance benefit of 35% across all reconfiguration-based cases and 45% in the best cases.
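As a rough illustration of the weighting idea described above, the following minimal sketch splits a fixed pool of nodes among MR-clusters in proportion to per-cluster weights. It is not the Fawkes implementation: the function name `weighted_shares`, the use of queued-task counts as weights, and the largest-remainder rounding are all assumptions made for this example.

```python
from typing import Dict


def weighted_shares(weights: Dict[str, float], total_nodes: int) -> Dict[str, int]:
    """Split total_nodes among MR-clusters in proportion to their weights.

    Hypothetical helper: the weights stand in for whatever a monitoring
    component reports (here assumed to be queued-task counts).
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")

    # Ideal (fractional) share of the node pool for each MR-cluster.
    exact = {name: total_nodes * w / total_weight for name, w in weights.items()}

    # Round down first, then hand out the leftover nodes to the clusters
    # with the largest fractional remainders (largest-remainder method).
    shares = {name: int(x) for name, x in exact.items()}
    leftover = total_nodes - sum(shares.values())
    by_remainder = sorted(exact, key=lambda n: exact[n] - shares[n], reverse=True)
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares


if __name__ == "__main__":
    # Assumed monitoring snapshot: cluster "a" has three times the backlog of "b".
    print(weighted_shares({"a": 30, "b": 10}, total_nodes=8))  # {'a': 6, 'b': 2}
```

Re-running such a function whenever the monitored weights change would yield new target sizes, and each MR-cluster would then grow or shrink toward its target. How nodes outside the core are actually added or released is not covered by this sketch.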


Keywords


MapReduce; Cloud Computing; Hadoop; Distributed Computing




Copyright © 2012 - 2018, All rights reserved.| ijitr.com

Creative Commons License
International Journal of Innovative Technology and Research is licensed under a Creative Commons Attribution 3.0 Unported License. Based on a work at IJITR. Permissions beyond the scope of this license may be available at http://creativecommons.org/licenses/by/3.0/deed.en_GB.