Comparative Evaluation of Zero-Inflated and Hurdle Models for Balanced and Unbalanced Data: Performance Assessment and Model Fit Analysis

Intesar N. El-Saeiti; Gadir Alomair


Keywords: Zero-inflated Poisson (ZIP), Hurdle Poisson (HurP), Zero-inflated Negative Binomial (ZINB), Hurdle Negative Binomial (HurNB), Balanced data, Unbalanced data

Abstract

Excess zeros in count data pose challenges for statistical modeling, particularly in insurance applications. Zero-inflated (ZI) and hurdle models are commonly used to address this issue by modeling zero counts and positive counts jointly. Although the two model classes share this objective, they differ in how they treat zeros. Zero-inflated models treat zeros as arising from two sources, a separate zero-generating process and the count distribution itself, whereas hurdle models attribute all zeros to a single process and model the non-zero observations separately with a zero-truncated count distribution. However, limited research exists on the comparative performance of these models, particularly in the presence of missing data. In this study, we assess the performance of four models under balanced and unbalanced data conditions: zero-inflated Poisson (ZIP), hurdle Poisson (HurP), zero-inflated negative binomial (ZINB), and hurdle negative binomial (HurNB). Using an automobile insurance claims dataset, we apply the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) as model selection criteria. Our findings indicate that the ZIP model provides the best fit for the claim frequency dataset in both the balanced and unbalanced data scenarios.
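
The difference between the two model classes, and the two selection criteria named in the abstract, can be made concrete with a minimal sketch. The code below assumes the simplest intercept-only case with no covariates. The names `lam`, `pi`, `p0`, `zip_pmf`, `hurdle_poisson_pmf`, `aic`, and `bic` are illustrative choices, not the authors' notation or code. It uses the standard textbook forms of the ZIP and hurdle Poisson probability mass functions, together with AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L.

```python
import math


def poisson_pmf(k, lam):
    """Ordinary Poisson probability of observing count k with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: zeros come from two sources.

    pi is the (assumed) probability of a structural zero; with
    probability 1 - pi the count follows an ordinary Poisson, which
    can itself produce a zero.
    """
    if k == 0:
        return pi + (1.0 - pi) * poisson_pmf(0, lam)
    return (1.0 - pi) * poisson_pmf(k, lam)


def hurdle_poisson_pmf(k, lam, p0):
    """Hurdle Poisson: all zeros come from a single binary process.

    p0 is the (assumed) probability of a zero; positive counts follow
    a zero-truncated Poisson, rescaled to carry mass 1 - p0.
    """
    if k == 0:
        return p0
    truncation = 1.0 - poisson_pmf(0, lam)  # P(Y > 0) under plain Poisson
    return (1.0 - p0) * poisson_pmf(k, lam) / truncation


def aic(log_likelihood, n_params):
    """Akaike information criterion; smaller values indicate better fit."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; penalizes parameters by ln(n_obs)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood
```

In this sketch, a ZIP zero probability is always at least `pi`, because the Poisson component adds its own zeros, while the hurdle zero probability is exactly `p0`. A fitted regression version would replace `lam`, `pi`, and `p0` with functions of covariates. That step is omitted here.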
