The Design and Construction of a Big Data System for Water Resource Warning and Forecasting in Vietnam
Abstract
This paper discusses the importance of a Big Data system in water resource forecasting, which helps mitigate the damage caused by natural disasters. As demand grows for observational data and water resource forecasts, forecasting technology must improve to deliver accurate and timely information. The paper gives an overview of Big Data technology and its defining characteristics: large data volumes, fast processing speeds, data diversity, and reliability. The architecture of the Big Data system at the Center for Water Resource Forecasting and Warning comprises several components, including data sources, storage systems, batch processing, and real-time processing. Hadoop and its distributed file system, HDFS, are used to store and manage distributed data, supporting data safety and recovery.
The paper also proposes storage infrastructure and data analysis tools, such as Hadoop and Apache Spark, together with security measures such as data encryption and access control. Finally, the authors emphasize that applying Big Data to water resource forecasting in Vietnam requires significant investment in infrastructure, technology, and specialized personnel.
Copyright (c) 2024 Nguyen Van Loi, Le Anh Tuan, Tran Duc Thinh, Dang Tran Trung
This work is licensed under a Creative Commons Attribution 4.0 International License.