Architectural Patterns for Integrating Large Language Models (LLMs) into Node.js Server Applications

Oleksandr Tserkovnyi

Principal Engineer, TrialBase Inc., Punta Cana, Dominican Republic

Keywords: large language models, Node.js, architectural patterns, microservices, serverless computing

Abstract

This article examines and systematizes the evolution of architectural patterns for integrating large language models (LLMs) into server applications built on the Node.js platform, against the backdrop of the rapid diffusion of generative technologies in industrial software development and the expanding market for Retrieval-Augmented Generation (RAG) solutions. The topic is relevant because, by 2025, LLMs have become an indispensable component of digital products, while server architectures must embed language generation into existing infrastructures under constraints of token budgets, per-call costs, and network latency. The objective is to identify and analytically describe stable architectural patterns that enable efficient, predictable LLM integration in Node.js backends. Methodologically, the work combines systemic architectural analysis, modeling of interactions with LLM APIs, and content analysis of industrial practices, which allows an engineering-economic efficiency model to be constructed for each configuration. The article's novelty lies in formulating the concept of a balanced LLM-integration architecture in which throughput, token price, and service-layer observability are treated as interdependent architectural variables. An evolutionary pathway is proposed for transitioning from monolithic model calls to microservice and serverless patterns, informed by market growth dynamics and the scaling of compute resources. The article is intended for researchers and engineers engaged in the architectural design of server applications, cloud-service developers, and AI-engineering specialists who aim for resilient, cost-balanced deployment of LLM technologies in production environments.
