OPTIMIZING LLM PERFORMANCE THROUGH CI/CD PIPELINES IN CLOUD-BASED ENVIRONMENTS
Abstract
Deploying large language models (LLMs) in cloud environments presents significant challenges, particularly their high computational demands, latency, memory consumption, and the lack of automated, reproducible workflows. As LLMs continue to scale and become integral to enterprise and research systems, the need for efficient, low-cost, reproducible deployment strategies has become critical. Traditional manual deployment methods often cause performance instability and hinder operational scalability. To address these issues, this study explores the integration of CI/CD (Continuous Integration/Continuous Deployment) pipelines within Python-based cloud environments as a lightweight approach to automating model benchmarking and inference tracking. Using the Open LLM Performance Benchmark dataset, which includes metrics such as model size, benchmark scores (e.g., ARC, MMLU, HellaSwag, TruthfulQA), latency, and memory usage, we evaluate a diverse set of public models, including DistilGPT-2, TinyLlama, GPT-Neo-125M, Falcon-rw-1b, and others. All experiments are conducted in Google Colab to simulate low-infrastructure environments. The proposed CI/CD workflow incorporates automated prompt generation, inference execution, latency and memory profiling, and structured logging. In addition, version control is simulated using DVC-style file hashes, and experiments are tracked with MLflow. Key findings highlight a clear trade-off between model size, performance, and cost: smaller models such as Tiny-GPT2 deliver superior latency but lower benchmark scores, whereas larger models such as Falcon-rw-1b yield higher accuracy at the expense of increased memory use and inference time. The CI/CD pipeline improved reproducibility, execution traceability, and scalability. These results underscore the potential of lightweight CI/CD frameworks to streamline LLM deployment for teams operating under resource constraints.
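The latency and memory profiling step of such a pipeline can be sketched in standard-library Python as follows. This is a minimal illustration, not the study's actual code: `run_inference` is a hypothetical stand-in for a real model call (e.g., a Hugging Face `generate` invocation), and `tracemalloc` measures Python-level allocations only, not GPU memory.

```python
import time
import tracemalloc

def run_inference(prompt: str) -> str:
    # Hypothetical stand-in for an actual model inference call.
    return prompt.upper()

def profile_inference(prompt: str) -> dict:
    """Run one inference and record latency (seconds) and peak traced memory (bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    output = run_inference(prompt)
    latency = time.perf_counter() - start
    _, peak_mem = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {"output": output, "latency_s": latency, "peak_mem_bytes": peak_mem}

record = profile_inference("hello world")
```

In a CI/CD run, a record like this would be emitted per model and per prompt, then appended to a structured log for comparison across commits.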
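The "DVC-style file hashes" mentioned above can be simulated with a content hash per artifact, as sketched below. The file name and metadata fields are illustrative assumptions, not taken from the study; the point is only that hashing a run's output file makes the experiment record verifiable and reproducible.

```python
import hashlib
import json

def file_hash(path: str) -> str:
    """Return an MD5 content hash, similar to the hashes DVC stores for tracked files."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Illustrative example: write a results log, then record its hash in a manifest.
with open("results.json", "w") as f:
    json.dump({"model": "distilgpt2", "latency_s": 0.42}, f)

manifest = {"file": "results.json", "md5": file_hash("results.json")}
```

A tool like MLflow would then log this manifest alongside the run's metrics, so any later rerun can be checked against the original artifact hash.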