METRIC-DRIVEN PROMPT LIFECYCLE GOVERNANCE FOR ENTERPRISE AI AGENTS: AN ARCHITECTURE FOR MEASURABLE OPTIMIZATION, COMPRESSION, AND COST CONTROL
Abstract
As enterprises deploy large language model (LLM)-based agentic systems at scale, prompt engineering has shifted from a design-time activity to a runtime liability. Despite advances in prompt optimization techniques, most organizations lack a metric-driven governance framework for managing prompts across their lifecycle, from development and versioning through production deployment, monitoring, and retirement. Uncontrolled prompt evolution leads to hidden cost accumulation, silent performance degradation, compliance drift, and audit blind spots. This paper proposes a Metric-Driven Prompt Lifecycle Governance (MPLG) architecture that combines quantitative quality gates, cost telemetry, automated regression detection, and version-controlled promotion workflows. Unlike prior work focused on single-stage optimization, MPLG treats prompts as continuously governed artifacts. The architecture comprises four layers: (1) a multi-metric evaluation engine that combines LLM-as-judge and deterministic scoring, (2) a cost-aware compression stage with instruction-preservation contracts, (3) a lifecycle version control system with lineage tracking, and (4) a CI/CD-integrated promotion gate with automated rollback triggers. MPLG is model-agnostic and production-oriented, and it is designed to answer three operational questions: What did this prompt cost last week? Has its quality degraded? Why was it promoted? The framework is accompanied by an implementation architecture, data schema definitions, and integration patterns for existing LLM observability tooling. Projections indicate a 30–45% token reduction and measurable gains in governance compliance.
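To make the promotion gate in layer (4) concrete, the following is a minimal sketch of how a candidate prompt version might be compared against its production baseline on quality and token cost. It is an illustration under stated assumptions, not the paper's implementation: the names PromptVersion, GatePolicy, token_reduction, and promotion_decision, the field layout, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PromptVersion:
    """One governed prompt artifact (hypothetical schema, not the paper's)."""
    version_id: str
    parent_id: Optional[str]   # lineage: the version this one was derived from
    quality_score: float       # aggregate evaluation score, assumed in [0, 1]
    tokens_per_call: int       # mean prompt tokens observed per call


@dataclass(frozen=True)
class GatePolicy:
    """Illustrative thresholds; real values would be set per deployment."""
    max_quality_drop: float = 0.02     # tolerated absolute drop versus baseline
    min_token_reduction: float = 0.0   # required fractional token saving


def token_reduction(baseline: PromptVersion, candidate: PromptVersion) -> float:
    """Fraction of prompt tokens saved by the candidate relative to the baseline."""
    if baseline.tokens_per_call <= 0:
        raise ValueError("baseline must have a positive token count")
    return 1.0 - candidate.tokens_per_call / baseline.tokens_per_call


def promotion_decision(
    baseline: PromptVersion,
    candidate: PromptVersion,
    policy: GatePolicy,
) -> Tuple[str, str]:
    """Return (decision, reason) so the reason can be stored for later audit."""
    quality_delta = candidate.quality_score - baseline.quality_score
    if quality_delta < -policy.max_quality_drop:
        return "reject", f"quality regressed by {-quality_delta:.3f}"

    saving = token_reduction(baseline, candidate)
    if saving < policy.min_token_reduction:
        return "reject", f"token reduction {saving:.1%} is below the required minimum"

    return "promote", f"quality delta {quality_delta:+.3f}, token reduction {saving:.1%}"


if __name__ == "__main__":
    v1 = PromptVersion("v1", None, quality_score=0.90, tokens_per_call=1000)
    v2 = PromptVersion("v2", "v1", quality_score=0.89, tokens_per_call=650)
    print(promotion_decision(v1, v2, GatePolicy()))
```

Returning the reason alongside the decision is a deliberate choice in this sketch: persisting it with the version record is one way to answer the question "Why was it promoted?" after the fact.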