WORKING WITH HIGH DIMENSIONAL DATA ARRANGEMENT USING FEATURE SELECTION

Sivakoti Taraka Satya Phanindra, Dr. R. China Appala Naidu

Abstract


This paper proposes a new measure, the Q-statistic, for evaluating the performance of a feature selection (FS) algorithm. A feature subset selected by an FS algorithm on the basis of prediction accuracy alone is likely to be unstable under variations in the training set, particularly for high-dimensional data. The Q-statistic therefore accounts for the stability of the selected feature subset in addition to the prediction accuracy. It is a hybrid measure that combines the prediction accuracy of a classifier with the stability of the features selected for it. Instability is an intrinsic problem of forward selection in particular: a change in the choice of the first feature can lead to an entirely different feature subset, so the stability of the selected set of features can be very low even when the selection yields high accuracy. A further difficulty arises for FS algorithms based on mutual information (MI): estimating MI from numerical data involves density estimation of high-dimensional data, and although much research has been done on multivariate density estimation, high-dimensional density estimation with a small sample size remains a formidable task. The paper then proposes Booster, which is applied to the feature subset selection of a given FS algorithm and improves that algorithm's performance by raising its Q-statistic.
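The abstract describes two ideas: a score that rewards both accuracy and stability, and a wrapper that makes a given FS algorithm more stable. Both can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Stability is measured here as the average pairwise Jaccard similarity of the subsets selected on different resamples, which is a common stand-in. `q_statistic` is a hypothetical harmonic mean of accuracy and stability, because the exact Q-statistic formula is not given in this abstract. `booster` assumes that Booster takes the union of the subsets an FS algorithm selects on bootstrap resamples of the training rows. `top_k_mean_difference` is a toy stand-in for a real FS algorithm.

```python
import itertools
import random
from typing import Callable, List, Sequence, Set

Matrix = List[List[float]]
FSAlgorithm = Callable[[Matrix, List[int]], Set[int]]


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity of two feature subsets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def stability(subsets: Sequence[Set[int]]) -> float:
    """Average pairwise Jaccard similarity of subsets selected on different resamples.

    Assumption: a Jaccard-based stand-in, not necessarily the paper's stability measure.
    """
    pairs = list(itertools.combinations(subsets, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def q_statistic(accuracy: float, stab: float) -> float:
    """Hypothetical Q-statistic: harmonic mean of accuracy and stability.

    The score is high only when BOTH accuracy and stability are high.
    """
    if accuracy + stab == 0:
        return 0.0
    return 2 * accuracy * stab / (accuracy + stab)


def top_k_mean_difference(k: int) -> FSAlgorithm:
    """Hypothetical toy filter: keep the k features whose class means differ most (labels 0/1)."""
    def select(X: Matrix, y: List[int]) -> Set[int]:
        scores = []
        for j in range(len(X[0])):
            pos = [row[j] for row, label in zip(X, y) if label == 1]
            neg = [row[j] for row, label in zip(X, y) if label == 0]
            # A resample may contain only one class; such features score 0.
            score = abs(sum(pos) / len(pos) - sum(neg) / len(neg)) if pos and neg else 0.0
            scores.append((score, j))
        # Highest score first; ties broken by lower feature index.
        scores.sort(key=lambda t: (-t[0], t[1]))
        return {j for _, j in scores[:k]}
    return select


def booster(X: Matrix, y: List[int], fs: FSAlgorithm,
            n_resamples: int = 10, seed: int = 0) -> Set[int]:
    """Sketch of Booster (assumed): union of the subsets `fs` selects on bootstrap resamples."""
    rng = random.Random(seed)
    n = len(X)
    union: Set[int] = set()
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows (the sample space)
        union |= fs([X[i] for i in idx], [y[i] for i in idx])
    return union


if __name__ == "__main__":
    X = [[1.0, 0.3, 0.5], [1.1, 0.1, 0.4], [0.0, 0.2, 0.6], [0.1, 0.4, 0.5]]
    y = [1, 1, 0, 0]
    print(booster(X, y, top_k_mean_difference(k=1), n_resamples=5))  # prints {0}
```

On this toy data, feature 0 separates the two classes by a wide margin, so it wins on every resample and the union is `{0}`. With a noisier FS algorithm, the per-resample subsets would differ. Their `stability` would then drop, and any Q-style score that combines accuracy with stability would reflect that. The harmonic mean was chosen only because it penalizes a method that is accurate but unstable (or stable but inaccurate).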


Keywords


Booster; Feature Selection; Q-Statistic; FS Algorithm; High-Dimensional Data

Copyright © 2012 - 2020, All rights reserved.| ijitr.com

International Journal of Innovative Technology and Research is licensed under a Creative Commons Attribution 3.0 Unported License. Based on a work at IJITR. Permissions beyond the scope of this license may be available at http://creativecommons.org/licenses/by/3.0/deed.en_GB.