IJCS Journal | International Journal of Computer Science

RECENT SURVEY OF BIG DATA ANALYTICS FOR MAPREDUCE FREQUENT ITEM MINING

Sri Vasavi College, Erode Self-Finance Wing, 3rd February 2017. National Conference on Computer and Communication, NCCC’17. International Journal of Computer Science (IJCS) Published by SK Research Group of Companies (SKRGC)

Abstract

Frequent Itemset Mining (FIM) is one of the best-known techniques for extracting knowledge from data. The combinatorial explosion inherent in FIM methods becomes even more problematic when they are applied to Big Data. Fortunately, recent advances in parallel programming already provide good tools to tackle this problem. However, these tools come with their own technical challenges, e.g. balanced data distribution and inter-communication costs. In this paper, we analyze the applicability of FIM techniques on the MapReduce platform and propose a Confabulation-Based Parallel FIM approach, called CBP-FIM-DP, built on the MapReduce programming model. The FIM algorithms discussed here extract and analyze patterns from historical datasets to support decision making. The purpose of Big Data mining is to go beyond the usual request-response processing, market basket analysis, or uncovering of hidden relationships, and to implement data mining algorithms that run in parallel at very large scale. Compared with results derived from mining conventional datasets, unveiling the huge volume of interconnected, heterogeneous big data has the potential to maximize our knowledge of the target domain. Our experiments demonstrate the scalability of our methods.
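To make the setting concrete, the following is a minimal sketch of how support counting for frequent itemsets could be phrased as a map step and a reduce step. It is not the CBP-FIM-DP method, whose details are not given in this abstract, and it runs in a single Python process rather than on Hadoop. The function names, the `max_size` cap, and the toy transactions are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def map_phase(transactions, max_size=2):
    """Emit (itemset, 1) for every itemset of up to max_size items per transaction."""
    for transaction in transactions:
        items = sorted(set(transaction))  # canonical order so equal itemsets match
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                yield itemset, 1


def reduce_phase(pairs, min_support):
    """Sum the counts per itemset and keep those that reach min_support."""
    counts = Counter()
    for itemset, count in pairs:
        counts[itemset] += count
    return {itemset: c for itemset, c in counts.items() if c >= min_support}


def frequent_itemsets(partitions, min_support, max_size=2):
    """Run the map step on each partition, then a single reduce over all pairs."""
    pairs = (pair for part in partitions for pair in map_phase(part, max_size))
    return reduce_phase(pairs, min_support)


# Hypothetical toy data: two partitions standing in for two input splits.
partitions = [
    [["bread", "milk"], ["bread", "butter", "milk"]],
    [["butter", "milk"], ["bread", "milk"]],
]
result = frequent_itemsets(partitions, min_support=3)
print(result)
```

Enumerating every itemset up to `max_size` in the map step is what grows combinatorially with transaction length, which is the explosion the abstract refers to. How evenly the transactions are split across partitions is where the data-distribution concern arises.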

Keywords

Hadoop, Frequent Item Mining, MapReduce, Parallel Algorithm, CBP-FIM
