IJCS Journal | International Journal of Computer Science

Cluster Based Feature Subsection Algorithm for High-Dimensional Data

International Journal of Computer Science (IJCS) Published by SK Research Group of Companies (SKRGC)

Abstract

Feature subset selection is an effective way to remove irrelevant data, reduce dimensionality, improve the comprehensibility of results, and increase learning accuracy. It involves identifying a subset of the most useful features that produces results comparable to those obtained with the original, complete set of features. A feature selection algorithm can be evaluated in terms of both efficiency and effectiveness: efficiency concerns the time required to find a subset of features, while effectiveness relates to the quality of that subset. Based on these criteria, a cluster-based feature selection algorithm (CFSA) is proposed and experimentally evaluated in this paper. CFSA works in two phases. In the first phase, features are divided into clusters using graph-theoretic clustering methods. In the second phase, the most representative feature, that is, the one most strongly related to the target classes, is selected from each cluster to form a subset of features. Because features in different clusters are relatively independent, the cluster-based strategy of CFSA has a high probability of producing a subset of useful and independent features. To ensure the efficiency of CFSA, we adopt the efficient minimum spanning tree (MST) clustering method. The efficiency and effectiveness of CFSA are evaluated through an experimental study.
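The two phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name a similarity measure or a rule for partitioning the tree, so the sketch assumes absolute Pearson correlation as the feature-to-feature and feature-to-target measure, Kruskal's algorithm for the MST, and a hypothetical `cut` threshold above which MST edges are removed. The names `pearson`, `mst_edges`, and `cfsa_sketch` are illustrative only.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def mst_edges(k, edges):
    """Kruskal's algorithm: return the MST edges of a graph on k nodes."""
    parent = list(range(k))
    tree = []
    for w, i, j in sorted(edges):
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[ri] = rj
            tree.append((w, i, j))
    return tree


def cfsa_sketch(columns, target, cut=0.3):
    """Return indices of one representative feature per cluster.

    columns: list of feature columns, each a list of numbers.
    target:  list of numeric class labels.
    cut:     assumed threshold; MST edges with dissimilarity above it are removed.
    """
    k = len(columns)
    # Phase 1: complete graph over features, weighted by dissimilarity (assumed measure).
    edges = [
        (1.0 - abs(pearson(columns[i], columns[j])), i, j)
        for i, j in combinations(range(k), 2)
    ]
    tree = mst_edges(k, edges)
    # Removing long MST edges leaves a forest; each tree is one feature cluster.
    parent = list(range(k))
    for w, i, j in tree:
        if w <= cut:
            parent[find(parent, i)] = find(parent, j)
    clusters = {}
    for i in range(k):
        clusters.setdefault(find(parent, i), []).append(i)
    # Phase 2: keep the feature most strongly related to the target in each cluster.
    relevance = [abs(pearson(c, target)) for c in columns]
    return sorted(max(members, key=lambda i: relevance[i]) for members in clusters.values())


if __name__ == "__main__":
    # Toy data: features 0 and 1 are near-duplicates, as are features 2 and 3.
    features = [
        [1, 2, 3, 4, 5, 6],
        [1, 2, 4, 3, 5, 6],
        [1, -1, 1, -1, 1, -1],
        [1, -1, 1, -2, 0, -2],
    ]
    labels = [0, 0, 0, 1, 1, 1]
    print(cfsa_sketch(features, labels, cut=0.3))
```

In this sketch the choice of `cut` controls how many clusters, and therefore how many features, are returned: a lower threshold removes more MST edges and yields more, smaller clusters.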

Keywords

Feature selection, high-dimensional data, clustering methods, minimum spanning tree.
