Univariate and multivariate imputation methods evaluation for reconstructing climate time series data: A case study of Mosul station-Iraq

Authors

  • KHALID QARAGHULI 1- School of Civil Engineering, Engineering Campus, Universiti Sains Malaysia, Malaysia. 2- Al-Mussaib Technical Institute, Al-Furat Al-Awsat Technical University, Babil, Iraq.
  • M. F. MURSHED School of Civil Engineering, Engineering Campus, Universiti Sains Malaysia, Malaysia
  • M. AZLIN M. SAID School of Civil Engineering, Engineering Campus, Universiti Sains Malaysia, Malaysia
  • ALI MOKHTAR Department of Agricultural Engineering, Faculty of Agriculture, Cairo University, Egypt.
  • IMAN ROUSTA Department of Geography, Yazd University, Yazd, Iran

DOI:

https://doi.org/10.54386/jam.v26i3.2657

Keywords:

Missing Data, Data Reconstruction, Climate Change, MNAR, MAR, MCAR

Abstract

Comprehensive climate time series data are indispensable for monitoring the impacts of climate change. However, observational datasets often contain gaps, which must be imputed to preserve dataset integrity for further analysis. This study evaluated six univariate and multivariate imputation methods for infilling missing values. The methods were applied to complete (gap-free) subsets of the precipitation, temperature, and relative humidity time series from Mosul station spanning 1980–2013. Artificial gaps of 5%, 10%, 20%, and 30% missing observations were introduced under three scenarios: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). Performance was evaluated with metrics including the root mean square error (RMSE) and the Kling-Gupta Efficiency (KGE). Results revealed that seasonal decomposition was the most effective univariate imputation method across all variables. Among the multivariate methods, kNN performed best for infilling missing precipitation data under MCAR, norm.predict performed best for missing temperature data under all missing-data scenarios, and missForest was the most suitable method for infilling missing relative humidity data. This methodology offers guidance for selecting appropriate imputation methods at other climate stations, thereby improving the accuracy of analyses of climate change effects.
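The evaluation design summarized above (artificial gaps at 5–30% under MCAR, MAR, and MNAR, scored with RMSE and KGE) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the study used R-based methods, and the series below is synthetic, not the Mosul record. The imputation step is a simplified stand-in for seasonal decomposition (subtract a per-calendar-month mean, linearly interpolate the anomalies, add the mean back), loosely in the spirit of imputeTS's `na_seadec`. The MAR and MNAR mechanisms (gaps weighted by a hypothetical humidity covariate, and by the temperature value itself) and the use of the original 2009 KGE form are likewise assumptions, and all function names are hypothetical.

```python
import bisect
import math
import random
import statistics


def mcar_mask(n, frac, rng):
    """MCAR: every time step has the same chance of being removed."""
    return set(rng.sample(range(n), round(n * frac)))


def weighted_mask(driver, frac, rng):
    """Remove round(len*frac) steps, favouring steps where `driver` is high.

    Weighted sampling without replacement via Efraimidis-Spirakis keys.
    Illustrative assumption: MAR if `driver` is another fully observed
    variable, MNAR if it is the variable being removed.
    """
    lo, hi = min(driver), max(driver)
    span = (hi - lo) or 1.0
    weights = [(d - lo) / span + 0.05 for d in driver]  # keep every weight > 0
    keys = [rng.random() ** (1.0 / w) for w in weights]
    ranked = sorted(range(len(driver)), key=keys.__getitem__, reverse=True)
    return set(ranked[: round(len(driver) * frac)])


def linear_interpolate(xs):
    """Fill None entries linearly; edge gaps take the nearest observed value."""
    known = [i for i, v in enumerate(xs) if v is not None]
    out = list(xs)
    for i, v in enumerate(xs):
        if v is not None:
            continue
        pos = bisect.bisect_left(known, i)  # first observed index after i
        if pos == 0:
            out[i] = xs[known[0]]
        elif pos == len(known):
            out[i] = xs[known[-1]]
        else:
            left, right = known[pos - 1], known[pos]
            w = (i - left) / (right - left)
            out[i] = xs[left] + w * (xs[right] - xs[left])
    return out


def seasonal_impute(series, period=12):
    """Simplified seasonal-decomposition imputation (hypothetical stand-in)."""
    seasonal = []
    for p in range(period):
        obs = [v for v in series[p::period] if v is not None]
        seasonal.append(statistics.fmean(obs) if obs else 0.0)
    anomalies = [None if v is None else v - seasonal[i % period]
                 for i, v in enumerate(series)]
    return [a + seasonal[i % period]
            for i, a in enumerate(linear_interpolate(anomalies))]


def rmse(obs, sim):
    return math.sqrt(statistics.fmean((o - s) ** 2 for o, s in zip(obs, sim)))


def kge(obs, sim):
    """KGE = 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2), original 2009 form:
    r = Pearson correlation, alpha = sd_sim/sd_obs, beta = mean_sim/mean_obs."""
    mo, ms = statistics.fmean(obs), statistics.fmean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    r = cov / math.sqrt(sum((o - mo) ** 2 for o in obs) * sum((s - ms) ** 2 for s in sim))
    alpha = statistics.pstdev(sim) / statistics.pstdev(obs)
    beta = ms / mo
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


if __name__ == "__main__":
    rng = random.Random(42)
    n = 34 * 12  # monthly steps for 1980-2013 (synthetic data, not the Mosul record)
    temp = [20 + 12 * math.sin(2 * math.pi * (i % 12) / 12) + rng.gauss(0, 1.5)
            for i in range(n)]
    rh = [60 - 2 * (t - 20) + rng.gauss(0, 5) for t in temp]  # hypothetical covariate
    for frac in (0.05, 0.10, 0.20, 0.30):
        masks = {"MCAR": mcar_mask(n, frac, rng),
                 "MAR": weighted_mask(rh, frac, rng),     # driven by humidity
                 "MNAR": weighted_mask(temp, frac, rng)}  # driven by temperature itself
        for name, gaps in masks.items():
            gappy = [None if i in gaps else v for i, v in enumerate(temp)]
            filled = seasonal_impute(gappy)
            idx = sorted(gaps)
            obs, sim = [temp[i] for i in idx], [filled[i] for i in idx]
            print(f"{name:<4} {frac:>4.0%}  RMSE={rmse(obs, sim):.2f}  KGE={kge(obs, sim):.2f}")
```

Only the removed positions are scored, so each metric reflects how well the artificial gaps were reconstructed. Swapping `seasonal_impute` for another infilling function (for example a kNN or regression-based imputer) would reuse the same masks and metrics unchanged.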

Published

01-09-2024

How to Cite

QARAGHULI, K., MURSHED, M. F., M. SAID, M. A., MOKHTAR, A., & ROUSTA, I. (2024). Univariate and multivariate imputation methods evaluation for reconstructing climate time series data: A case study of Mosul station-Iraq. Journal of Agrometeorology, 26(3), 318–323. https://doi.org/10.54386/jam.v26i3.2657