Univariate and multivariate imputation methods evaluation for reconstructing climate time series data: A case study of Mosul station-Iraq
DOI:
https://doi.org/10.54386/jam.v26i3.2657
Keywords:
Missing Data, Data Reconstruction, Climate Change, MNAR, MAR, MCAR
Abstract
Complete climate time series are indispensable for monitoring the impacts of climate change. However, observational datasets often contain gaps, which must be imputed before further analysis. This study evaluated six univariate and multivariate imputation methods for infilling missing values. The methods were applied to subsets of the precipitation, temperature, and relative humidity time series from Mosul station for 1980–2013. Artificial gaps of 5%, 10%, 20%, and 30% of observations were introduced under three missingness scenarios: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). Performance was evaluated with metrics including RMSE and Kling-Gupta Efficiency. Seasonal decomposition was the most effective univariate imputation method across all variables. Among the multivariate methods, kNN performed best for missing precipitation data under MCAR, norm.predict performed best for missing temperature data under all missingness scenarios, and missForest was the most suitable method for missing relative humidity data. The study's methodology offers guidance for selecting appropriate imputation methods at other climate stations, thereby improving the accuracy of climate change impact analyses.
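The evaluation design summarized in the abstract (remove known values, impute them, score the imputations against the held-back truth) can be illustrated with a minimal Python sketch. This is not the authors' code, and the abstract gives no implementation details. Everything here is an assumption made for illustration: the synthetic monthly temperature series, the function names, and the seasonal-mean fill, which is a simple stand-in and not the seasonal decomposition method evaluated in the study. Only the MCAR scenario is sketched; MAR and MNAR would need masks that depend on other variables or on the values themselves. The Kling-Gupta Efficiency uses the commonly cited 2009 form, 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2).

```python
import numpy as np


def synthetic_monthly_temperature(n_years, rng):
    """Hypothetical monthly series: seasonal cycle plus noise (not Mosul data)."""
    t = np.arange(n_years * 12)
    return 20.0 + 12.0 * np.sin(2.0 * np.pi * t / 12.0) + rng.normal(0.0, 1.5, t.size)


def make_mcar_mask(n, fraction, rng):
    """Mark `fraction` of n positions as missing, chosen uniformly at random (MCAR)."""
    n_missing = int(round(n * fraction))
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=n_missing, replace=False)] = True
    return mask


def impute_seasonal_mean(series, mask, period=12):
    """Fill each gap with the mean of observed values at the same seasonal phase.

    Stand-in technique for illustration only; falls back to the overall
    observed mean if a phase has no observed values.
    """
    series = np.asarray(series, dtype=float)
    filled = series.copy()
    phase = np.arange(series.size) % period
    fallback = series[~mask].mean()
    for p in range(period):
        observed = (phase == p) & ~mask
        gaps = (phase == p) & mask
        filled[gaps] = series[observed].mean() if observed.any() else fallback
    return filled


def rmse(obs, sim):
    """Root mean square error between observed and imputed values."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))


def kge(obs, sim):
    """Kling-Gupta Efficiency: 1 is a perfect match, lower is worse."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]        # linear correlation
    alpha = np.std(sim) / np.std(obs)      # variability ratio
    beta = np.mean(sim) / np.mean(obs)     # bias ratio
    return float(1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    truth = synthetic_monthly_temperature(34, rng)  # 34 years, as in 1980-2013
    for fraction in (0.05, 0.10, 0.20, 0.30):
        mask = make_mcar_mask(truth.size, fraction, rng)
        filled = impute_seasonal_mean(truth, mask)
        # Score only the artificially removed positions.
        print(f"{fraction:.0%} missing: "
              f"RMSE={rmse(truth[mask], filled[mask]):.2f}, "
              f"KGE={kge(truth[mask], filled[mask]):.2f}")
```

Scoring only the masked positions keeps the metrics from being inflated by values that were never removed. Swapping in another imputation function with the same `(series, mask)` signature would let several methods be compared on identical gaps.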
License
Copyright (c) 2024 KHALID QARAGHULI, M. F. MURSHED, M. AZLIN M. SAID, ALI MOKHTAR, IMAN ROUSTA
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.