CLUSTERING HIGH-DIMENSIONAL DATA USING HUBNESS-BASED APPROACHES
International Journal of Computer Science (IJCS) Published by SK Research Group of Companies (SKRGC)
Clustering is an unsupervised process of grouping elements together, so that elements assigned to the same cluster are more similar to each other than to the remaining data points. High-dimensional data arise in many fields. Clustering such data is difficult because of sparsity and because distances between data points become increasingly hard to tell apart. Here we take a novel perspective on the problem of clustering high-dimensional data. Instead of trying to avoid the curse of dimensionality by examining a lower-dimensional feature subspace, we embrace dimensionality by taking advantage of inherently high-dimensional phenomena. Specifically, we use hubness, the tendency of high-dimensional data to contain points (hubs) that frequently occur in the k-nearest-neighbor lists of other points. We validate our hypothesis by demonstrating that hubness is a good measure of point centrality within a high-dimensional data cluster, and by proposing several hubness-based clustering algorithms, showing that major hubs can be used effectively as cluster prototypes or as guides during the search for centroid-based cluster configurations. The proposed method, called “Neighbor clustering”, takes as input measures of similarity between pairs of data points. Experimental results demonstrate good performance of our algorithms in multiple settings.
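As a rough illustration of the central quantity, the sketch below computes a hubness score for every point, taken here as the number of k-nearest-neighbor lists of other points in which that point appears, and then picks the highest-scoring member of each cluster as a candidate prototype. It is a minimal sketch under stated assumptions, not the algorithms proposed in the paper: the function names `hubness_scores` and `pick_hub_prototypes`, the use of Euclidean distance, and the default `k=5` are all illustrative choices.

```python
import numpy as np


def hubness_scores(X, k=5):
    """Count, for each row of X, how many other points list it among their k nearest neighbors.

    Assumption: Euclidean distance; X is an (n, d) array with n > k.
    """
    # Brute-force pairwise distances; fine for a small illustrative data set.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbor

    # Indices of the k nearest neighbors of every point.
    knn = np.argsort(dist, axis=1)[:, :k]

    # Each appearance in some neighbor list adds one to that point's score.
    return np.bincount(knn.ravel(), minlength=X.shape[0])


def pick_hub_prototypes(X, labels, k=5):
    """Return {cluster label: index of its highest-hubness member}.

    Hypothetical helper: scores are computed over the whole data set,
    then the top-scoring point inside each cluster is selected.
    """
    scores = hubness_scores(X, k)
    prototypes = {}
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        prototypes[int(c)] = int(members[np.argmax(scores[members])])
    return prototypes
```

Given a data matrix `X` and an initial label vector `labels`, `pick_hub_prototypes(X, labels)` returns one point index per cluster. Such indices could play the role that centroids play in a centroid-based scheme, which is the sense in which the abstract describes hubs as cluster prototypes.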
Clustering, curse of dimensionality, nearest neighbors, hubs