IJCS Journal | International Journal of Computer Science

FRAMEWORK FOR LOW- HIGH INTRA CLUSTERING MEASURING COMMON WEIGHTED SIMILARITIES

International Journal of Computer Science (IJCS) Published by SK Research Group of Companies (SKRGC)

Abstract

Distance functions such as Euclidean and Manhattan distance are the traditional way to measure similarity between numeric values, while text similarity techniques such as cosine similarity and Dice similarity are used to measure similarity between text values. In general, however, an object consists of a set of attributes of different data types. Clustering is a technique for creating groups of similar objects, and a number of techniques are available to measure the similarity between objects. Measuring the similarity between two objects therefore requires measuring similarity across different data types, which in turn requires a combination of similarity measurement techniques. In addition, some attributes may be more relevant than others to object similarity for clustering purposes, so similarity weights can be assigned to each pair of attributes between the objects to measure object similarity effectively. In this paper, a framework is proposed to measure the weighted similarity between objects that consist of attributes of different data types. The proposed framework is implemented using open source technologies, and the results are explained with the help of illustrative examples.
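
The idea of combining per-attribute similarities under weights can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names, the range-based normalisation of numeric differences, the whitespace tokenisation, and the example attributes and weights are all assumptions made for illustration.

```python
import math
from collections import Counter


def numeric_similarity(a, b, value_range):
    """Similarity in [0, 1] from the absolute difference of two numbers.

    value_range is an assumed normaliser (e.g. max - min of the attribute).
    """
    if value_range <= 0:
        return 1.0 if a == b else 0.0
    return max(0.0, 1.0 - abs(a - b) / value_range)


def cosine_similarity(text_a, text_b):
    """Cosine similarity of two texts using raw term counts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def weighted_similarity(obj_a, obj_b, schema):
    """Weighted average of per-attribute similarities.

    schema maps attribute name -> (similarity function, weight).
    """
    total_weight = sum(weight for _, weight in schema.values())
    if total_weight == 0:
        raise ValueError("at least one attribute needs a non-zero weight")
    score = sum(
        weight * sim(obj_a[attr], obj_b[attr])
        for attr, (sim, weight) in schema.items()
    )
    return score / total_weight


# Hypothetical schema: one numeric and one text attribute.
# Price is assumed to span a range of 10 and to matter three times as much.
schema = {
    "price": (lambda a, b: numeric_similarity(a, b, 10.0), 3.0),
    "description": (cosine_similarity, 1.0),
}

item_a = {"price": 5, "description": "red shirt"}
item_b = {"price": 5, "description": "blue jeans"}
print(weighted_similarity(item_a, item_b, schema))
```

Dividing by the total weight keeps the combined score in [0, 1] whenever each per-attribute similarity is in [0, 1], so the result could be passed to a clustering step that expects a bounded similarity.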
