Extensible Markup Language Approximate Query Answering Using Data Mining, Intentional Based on Tree-Based Association Rules
International Journal of Computer Science (IJCS) Published by SK Research Group of Companies (SKRGC)
Abstract
With the increasing popularity of XML for data representation, there is considerable interest in searching XML data. Because XML documents are structurally heterogeneous and their textual content is diverse, it is daunting for users to formulate exact queries and find accurate answers. Approximate matching has therefore been introduced to ease the difficulty of answering users' queries: the structure and content of a given query are first relaxed, and answers that match the relaxed queries are then sought. Ranking and returning the most relevant results of a query has become the most popular paradigm in XML query processing. However, existing proposals do not adequately take structure into account, and they therefore lack the ability to combine structure with content elegantly when answering relaxed queries. To address this problem, we first propose a framework of query relaxations for supporting approximate queries over XML data. The answers underlying this framework are not compelled to strictly satisfy the given query formulation; instead, they can be based on properties inferable from the original query. We then develop a novel top-k retrieval approach that generates the most promising answers in an order correlated with the ranking measure.
Keywords
Extensible Markup Language (XML), approximate query answering, data mining, intentional information, Tree-Based Association Rules.