A Framework to Deal with Missing Data in Data Sets
Abstract
Most information systems contain some missing values because data are unavailable. Missing values reduce the quality of the classification rules generated by a data mining system and also affect the number of rules it produces. They can influence the coverage percentage and the number of reducts generated, and they make it difficult to extract useful information from the data set. Solving the problem of missing data is therefore a high priority in the field of data mining and knowledge discovery. Replacing missing values with a specific value should not affect the quality of the data. Four different models for dealing with missing data were studied. A framework is established that removes inconsistencies before and after filling the missing attribute values with the new expected value generated by one of the four models. Comparative results are discussed and recommendations are given.
DOI: https://doi.org/10.3844/jcssp.2006.740.745
Copyright: © 2006 Luai A. Shalabi, Mohannad Najjar and Ahmad A. Kayed. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- Data mining
- missing data
- rules
- reducts
- coverage