An Improved Feature Subset Selection with Correlation Measures using Clustering Technique
International Journal of Computer Science (IJCS), published by SK Research Group of Companies (SKRGC).
Clustering techniques are used to partition transaction data. Vector-based similarity models are suitable for low-dimensional data, whereas high-dimensional data are clustered using subspace clustering methods. Feature selection involves identifying a subset of the most useful features that produces results comparable to those of the original, full set of features. In this paper, clustering is used to improve feature subset selection with correlation measures. A feature selection algorithm is constructed with two factors in mind: efficiency and effectiveness. Efficiency concerns the time required to find a subset of features; effectiveness concerns the quality of that subset. Based on these criteria, a fast clustering-based feature selection algorithm, FAST, is proposed and experimentally evaluated. FAST is used to cluster high-dimensional data, and the feature selection process is improved with correlation measures. A redundant-feature filtering mechanism removes similar features, and a custom threshold is used to improve clustering accuracy.
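The pipeline described in the abstract (drop irrelevant features, group correlated features into clusters, filter redundant ones with a threshold) can be illustrated with a minimal Python sketch. This is not the authors' FAST implementation. It assumes discrete (already discretized) features, uses symmetric uncertainty SU(X, Y) = 2·I(X;Y) / (H(X) + H(Y)) as the correlation measure, and uses a simple greedy threshold grouping in place of whatever clustering procedure the paper actually uses. The names `select_features`, `relevance_threshold` and `redundancy_threshold` are hypothetical; the two thresholds stand in for the "custom threshold" mentioned above, and their default values are arbitrary.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a correlation measure in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant: treat them as uncorrelated
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def select_features(features, target, relevance_threshold=0.1, redundancy_threshold=0.8):
    """Hypothetical sketch (not the paper's algorithm): correlation-based feature clustering.

    features: dict mapping feature name -> list of discrete values
    target:   list of class labels, same length as each feature column
    Returns (selected feature names, clusters as lists of feature names).
    """
    # 1. Relevance filter: drop features weakly correlated with the class.
    relevance = {name: symmetric_uncertainty(col, target) for name, col in features.items()}
    relevant = [name for name in features if relevance[name] >= relevance_threshold]

    # 2. Greedy clustering: visit features from most to least relevant. A feature
    #    joins the first cluster whose representative it is strongly correlated
    #    with (i.e. it is redundant); otherwise it starts a new cluster.
    relevant.sort(key=lambda name: relevance[name], reverse=True)  # stable sort
    clusters = []
    for name in relevant:
        for cluster in clusters:
            representative = cluster[0]
            if symmetric_uncertainty(features[name], features[representative]) >= redundancy_threshold:
                cluster.append(name)  # redundant: filtered out of the final subset
                break
        else:
            clusters.append([name])

    # 3. Keep one feature per cluster: its most class-relevant member.
    selected = [cluster[0] for cluster in clusters]
    return selected, clusters


# Toy data (illustrative only).
target = [0, 0, 1, 1, 0, 1, 0, 1]
features = {
    "a": [0, 0, 1, 1, 0, 1, 0, 1],       # identical to the class
    "a_copy": [1, 1, 0, 0, 1, 0, 1, 0],  # inverted copy of "a": redundant
    "b": [0, 0, 1, 1, 0, 1, 1, 0],       # partially correlated with the class
    "noise": [0, 1, 0, 1, 0, 1, 1, 0],   # independent of the class: SU = 0
}

selected, clusters = select_features(features, target)
print(selected)   # ['a', 'b']
print(clusters)   # [['a', 'a_copy'], ['b']]
```

Here `noise` is dropped as irrelevant, and `a_copy` lands in the same cluster as `a` and is filtered as redundant. Visiting features in order of decreasing relevance means each cluster is represented by its most class-correlated member (the effectiveness criterion), while the greedy pass needs at most one SU computation per feature-cluster pair (the efficiency criterion).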
Index Terms: Feature clustering, feature subset selection, redundant filtering, filter method.