Review of Resources Used in Chatbot Models
Abstract
This paper provides a concise overview of key elements in chatbot development, focusing on the algorithmic approaches, feature extraction techniques, and datasets employed in existing models. The landscape of chatbot design is explored through diverse machine learning and neural network-based approaches. Feature extraction techniques, crucial for capturing relevant information from input data, are examined for their role in enhancing chatbot performance. The paper also addresses the pivotal role of datasets, elucidating their significance in training and evaluating chatbot models. A comprehensive analysis of existing datasets is presented, along with their impact on the robustness and adaptability of chatbots across domains. A detailed understanding of these elements is fundamental to advancing the capabilities of chatbots and ensuring their seamless integration into diverse applications and industries.
Copyright (c) 2024 Akintoba Emmanuel Akinwonmi, Joshua Oluwadamilola Faluyi
This work is licensed under a Creative Commons Attribution 4.0 International License.