ATTRIBUTE AND INFORMATION GAIN BASED FEATURE SELECTION TECHNIQUE FOR CLUSTER ENSEMBLE: HYBRID MAJORITY VOTING BASED VARIABLE IMPORTANCE MEASURE

D. Satya Srinivas, M. Anil Kumar

Abstract


Cluster analysis is one of the most prominent unsupervised learning techniques and is widely used to group data items according to their similarity. Off-line and online cluster analysis, in particular, are attractive areas of research. However, high-dimensional big data continually introduces new challenges to data mining: when all dimensions are considered, cluster analysis of high-dimensional data gives less accurate results and requires more processing time. Dimensionality reduction techniques have been introduced to overcome these issues. The key questions are then: which dimensions should be considered, what measures should be used to select them, and how should cluster quality be evaluated on the basis of those dimensions and measures? In this paper, we propose a novel hybrid dimensionality reduction technique for better cluster analysis. The proposed algorithm runs in polynomial time.
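The abstract does not spell out the algorithm. The following is therefore only a minimal sketch, under our own assumptions, of how two ideas named in the title (information gain and majority voting over a cluster ensemble) might be combined for categorical data. It is not the authors' method. Each base clustering in the ensemble is treated as a set of class labels. Every feature is scored by its information gain with respect to those labels, each ensemble member votes for its top-k features, and features receiving votes from more than half of the members are kept. The function names `information_gain` and `majority_vote_selection`, the parameter `k`, and the strict-majority threshold are hypothetical choices made for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def information_gain(feature, labels):
    """IG = H(labels) - sum over values v of p(v) * H(labels | feature = v)."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def majority_vote_selection(data, clusterings, k):
    """Hypothetical hybrid rule: keep features that appear in the top-k
    information-gain ranking of more than half of the base clusterings.

    data        -- list of rows, each a tuple of categorical values
    clusterings -- list of label lists, one per ensemble member
    k           -- number of features each member votes for (assumption)
    """
    n_features = len(data[0])
    columns = [[row[j] for row in data] for j in range(n_features)]
    votes = Counter()
    for labels in clusterings:
        # Rank features by IG against this member's cluster labels;
        # the sort is stable, so ties keep their original column order.
        ranked = sorted(
            range(n_features),
            key=lambda j: information_gain(columns[j], labels),
            reverse=True,
        )
        votes.update(ranked[:k])
    threshold = len(clusterings) / 2  # strict majority (assumption)
    return sorted(j for j, v in votes.items() if v > threshold)


# Toy example: three categorical features, three base clusterings.
data = [
    ("a", "x", "p"),
    ("a", "y", "q"),
    ("b", "x", "p"),
    ("b", "y", "q"),
]
clusterings = [
    [0, 0, 1, 1],  # aligned with feature 0
    [0, 0, 1, 1],  # aligned with feature 0
    [0, 1, 0, 1],  # aligned with features 1 and 2
]

print(majority_vote_selection(data, clusterings, k=1))  # → [0]
```

With m base clusterings, d features and n items, scoring costs O(m·d·n) and ranking costs O(m·d log d), so this sketch runs in polynomial time, in line with the complexity the abstract claims for the proposed algorithm. The attribute-clustering component named in the title is not modeled here.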


Keywords


Feature Selection; Cluster Ensemble; Information Gain; Attribute Clustering

Copyright © 2012-2020. All rights reserved. | ijitr.com

International Journal of Innovative Technology and Research is licensed under a Creative Commons Attribution 3.0 Unported License. Based on a work at IJITR. Permissions beyond the scope of this license may be available at http://creativecommons.org/licenses/by/3.0/deed.en_GB.