IJCS Journal | International Journal of Computer Science

A Survey – Methods of Missing Data Imputation

Sri Vasavi College, Erode Self-Finance Wing, 3rd February 2017. National Conference on Computer and Communication, NCCC’17. International Journal of Computer Science (IJCS), published by SK Research Group of Companies (SKRGC).

Abstract

Attributes in data sets often have missing values. Several schemes have been studied to overcome the drawbacks that missing values produce in data mining tasks; one of the best known is a preprocessing approach known as imputation. This paper reviews methods for handling missing data in a research study.

