What No One Tells You about Securing AI Apps: Demystifying AI Guardrails

Explore how foundational DevSecOps and LLM guardrails offer a clear, actionable path to protecting AI applications from emerging risks.

Securing AI Applications

Summary

Developing responsible and trustworthy AI applications relies on essential safety mechanisms like LLM guardrails. These guardrails, built upon a robust DevSecOps foundation, help detect, mitigate, and prevent undesirable LLM behaviors.

image_pdfimage_print

This post was co-authored by Gauri Kholkar, Applied AI/ML Scientist, Office of the CTO, and Dr. Ratinder Paul Singh Ahuja, CTO for Security and GenAI. Dr. Ahuja is a renowned name in the field of security, AI, and networking.

Large language models (LLMs) are transforming industries, unlocking unprecedented capabilities. It’s an exciting time, but harnessing this power responsibly means navigating a complex web of potential risks—from harmful content to data leaks. Just as data needs robust storage and security, AI models need strong guardrails.

But it seems like new guardrail models pop up every other month, making it tough to know which fits your needs or if you even need one yet. Maybe you’ve heard about protecting against SQL injection, but now there’s talk of “prompt injection” and other novel attacks on LLM applications that constantly emerge. Are you worried about sending your proprietary data to third-party LLM APIs? Feeling overwhelmed by AI security?

In this series, we’ll break down the critical layers of protection needed for enterprise AI, covering both the familiar ground of application security best practices (which absolutely still apply) and the unique challenges specific to AI. We’ll also reflect how we approach these challenges at Pure Storage:

  • Part 1: What No One Tells You About Securing AI Apps: Demystifying AI Guardrails (You are here!): Understanding how combining DevSecOps with specialized safety models is key to making LLM apps strong, secure, and compliant. 
  • Part 2: Securing the Data Fueling Your LLMs: Strategies for protecting the sensitive information that is used to train models or interact with AI applications, outlining key aspects of the data security approach of Pure Storage.
  • Part 3: Building a Secure Infrastructure Foundation: Ensuring the underlying systems supporting your AI workloads are robust and resilient, creating a fortified, end-to-end security shield that protects every layer, from infrastructure to application, throughout deployment and ongoing monitoring.

Understanding the AI Security Landscape

Today, we dive into the rapidly evolving world of LLM guardrails—the essential safety mechanisms designed to detect, mitigate, and prevent undesirable LLM behaviors. But first, let’s clarify the components involved and where security responsibilities lie. A typical AI application integrates several parts, often including external services.

Access Corporate Knowledge Base
Figure 1: Reference AI Application.

Let’s break down this flow and relate it to our security discussion:

  1. AI Application: This encompasses everything before the final call to the core AI model. In Figure 1, this includes:
    • User Interface: Where the user initially enters their query.
    • Orchestration and Routing: This is part of your application’s business logic. It decides how to handle the user query—does it need information from internal knowledge bases, external web searches, or both? This logic also handles interactions with LLM APIs (via adapters/clients) and any necessary calls to external tools or functions.
    • Context Construction: Another key piece of business logic. This component gathers the necessary information (from the knowledge base, web search APIs, tool outputs, etc.) and formats it along with the original user query to create the final prompt (Query + Context) that will be sent to the LLM. This is a critical area for security, as it handles potentially sensitive corporate data and external information.
  2. AI Infrastructure: This refers to the underlying systems that run the core AI model and manage its operation.
    • If using a third-party LLM API: As shown in the diagram, the core intelligence often comes from an external provider (OpenAI, Google, Anthropic, etc.). In this case, your infrastructure responsibility is primarily focused on securely interacting with that API (authentication, network security, managing API keys). The provider manages the actual model serving infrastructure. Your concern about sending proprietary data relates directly to this step—the context you construct might contain sensitive information passed to this external API.
    • If self-hosting an LLM: You are responsible for the entire infrastructure stack needed to serve the model (compute resources like GPUs, networking, storage for model weights). This also includes the infrastructure for model training if you are fine-tuning or building custom models.
    • General AI infra: Regardless of the hosting model, this layer includes the compute, network, and storage infrastructure where the AI application components (like orchestration, context construction) and potentially the self-hosted model itself are deployed. This could involve cloud services (e.g., AWS Lambda, ECS/Fargate, EC2, S3, Azure Functions), on-premises servers, or a hybrid setup. It also encompasses essential operational components like logging, monitoring, and potentially artifact repositories for model versions. Securing this entire infrastructure stack is crucial.

From DevOps to DevSecOps:

Securing AI Apps
Figure 2: Security isn’t an add-on; it must be end-to-end, from secure infrastructure to secure applications, and from secure deployment to secure operations.

When discussing AI security, the conversation often jumps to cutting-edge defenses: prompt injection defenses, hallucination filters, guardrails against unexpected sentience! While these AI-specific concerns are valid and important (as we’ll discuss), it’s crucial not to overlook the fundamentals. 

DevSecOps, the practice of integrating security into every stage of the software development lifecycle, is paramount. We often forget that an AI application is, fundamentally, still an application. Before worrying exclusively about novel AI threats, we must ensure we’re applying the basics of application and infrastructure security correctly. Securing the overall AI system starts with securing your AI Application components and the AI Infrastructure using robust DevSecOps practices. This includes standard secure coding, vulnerability scanning, infrastructure hardening, access controls, and threat modeling. If you can secure a traditional n-tier application and its data, you have a strong foundation. 

This commitment to security is foundational at Pure Storage, reflected in our rigorous DevSecOps methodology—the “6-Point Plan” detailed in our product security journey—overseen by our security leadership, ensuring that security is built in, not bolted on, for all our solutions, including those powering demanding AI workloads. This includes leveraging innovative tools and techniques, such as using LLMs to automate and scale security practices like STRIDE threat modeling, making robust security analysis accessible even for rapid development cycles. 

Explore how tools like the Threat Model Mentor GPT empower even non-experts to perform robust threat modeling, accelerating secure development. 

You must apply these principles rigorously, especially when handling sensitive data during context construction or managing the AI infrastructure, before layering on AI-specific considerations. Critically, since much of the application code itself might be AI-generated, adhering strictly to a secure software development lifecycle for review, testing, and validation and incorporating static and dynamic code scanning is more important than ever. 

What’s Unique about AI Security?

Beyond standard practices, the unique aspects requiring focus are:

Risky inputs (user and context): The data flowing into the LLM requires careful scrutiny.

  1. User input: The direct query or input from the user can be intentionally malicious (e.g., prompt injection, attempts to reveal sensitive info), factually false, or contain toxic/harmful language. As shown in Figure 3, Scenario 1 below, the user directly inputs a malicious instruction, contaminating the final prompt even if other parts are benign.
  2. Constructed context: The context assembled by your application (from internal knowledge bases, external web searches, API calls, etc.) can also be malicious (if external sources are compromised or manipulated), contain false or outdated information, or include toxic content retrieved from the web. As illustrated in Figure 3, Scenario 2 below, the application retrieves compromised or bad data for context, tainting the final prompt even if the user query was harmless.
Malicious User Query
Figure 3: Malicious inputs in AI prompts.

AI security, therefore, involves securing your application’s business logic and the underlying AI infrastructure, plus managing the risks associated with the AI-specific components like the prompt/context interaction. LLM guardrails (discussed next) are a specific tool often implemented within the AI application layer to help manage risks at the boundary before interacting with the core LLM.

Why Guardrails Are Non-negotiable for Enterprise AI

Building secure GenAI applications requires understanding the attack vectors. Here are some of the most significant risks, many of which are categorized and detailed in resources like the OWASP Top 10 for LLM Applications:

Figure 4: Risks to generative AI.

5 Primary Risks to AI Security

  1. Prompt injection: This is a class of attacks against applications built on top of large language models (LLMs) that work by concatenating untrusted user input with a trusted prompt constructed by the application’s developer. Essentially, it’s like tricking the AI. Attackers manipulate the input prompts given to the LLM to make it behave in unintended ways, potentially including tricking it into revealing its confidential system prompt or executing malicious commands via the application’s capabilities.
    • Direct injection: A malicious user directly inputs instructions intended to override the original prompt, potentially causing the AI to reveal its initial instructions or execute harmful commands.
    • Indirect injection: Adversarial instructions are hidden within external data sources like websites or documents that the AI processes. When the AI interacts with this tainted content, it can inadvertently execute hidden commands. This risk is significantly amplified when the AI has access to tools or APIs that can interact with sensitive data or perform actions, such as tricking an AI email assistant into forwarding private emails or manipulating connected systems.
  2. Jailbreaking: This is the class of attacks that attempt to subvert safety filters built into the LLMs themselves. It involves crafting inputs specifically designed to bypass these safety mechanisms. The goal is often to coerce the model into generating harmful, unethical, or restricted content it’s designed to refuse. This can range from generating instructions for dangerous activities to creating embarrassing outputs that damage brand reputation.
  3. Misinformation: LLMs can sometimes generate incorrect or nonsensical information or hallucinations, unsafe code, or unsupported claims.
    • Factual inaccuracies: Models might confidently state incorrect facts, potentially leading users astray.
    • Unsupported claims: AI models may generate baseless assertions or “facts” with high confidence. This becomes particularly dangerous when applied in critical fields like law, finance, or healthcare, where decisions based on inaccurate AI-generated information can have serious real-world consequences.
    • Unsafe code: AI might suggest insecure code or even reference non-existent software libraries. Attackers can exploit this by creating malicious packages with these commonly hallucinated names, tricking developers into installing them.
  4. Sensitive information disclosure: Without proper safeguards, LLMs can inadvertently reveal Personally Identifiable Information (PII) or other confidential data. This exposure might happen if the model repeats sensitive details provided during user interactions, accesses restricted information through poorly secured retrieval-augmented generation (RAG) systems or external tools, or, in some cases, recalls sensitive data it was inadvertently trained on. The leaked information could include customer PII, internal financial data, proprietary source code, strategic plans, or health records. Such breaches often lead to severe consequences like privacy violations, regulatory penalties (e.g., under GDPR or CCPA), loss of customer trust, and competitive disadvantage.
  5. Supply chain and data integrity risks: GenAI applications often rely on pre-trained models, third-party data sets, and external plugins. If any component in this supply chain is compromised (e.g., a vulnerable model or a malicious plugin), it can introduce significant security risks. Furthermore, attackers may intentionally corrupt the data used for training, fine-tuning, or RAG systems (“data poisoning”). This poisoning can introduce hidden vulnerabilities, biases, or backdoors into the model, causing it to behave maliciously or unreliably under specific conditions.

Understanding these risks is the first step toward building defenses, which often involves implementing robust guardrails.

Given these risks, especially those related to malicious or harmful inputs and outputs, where should guardrails be placed? Referring to the application flow diagram in Figure 5 below, there are two critical points for intervention:

  1. Input Guardrails: Placed after the initial User Query is received but before it (and any constructed context) is sent to the LLM API. This helps detect and block malicious prompts, toxic language, or attempts to inject harmful instructions early.
  2. Output Guardrails: Placed after receiving the response from the LLM API but before presenting the final User Output. This helps filter out any harmful, biased, toxic, or inappropriate content generated by the LLM, preventing it from reaching the user.
Securing AI Apps
Figure 5: LLM guardrails in an AI application.

Implementing guardrails at both these stages provides layered security. Now, let’s look at the specific guardrail solutions available.

The LLM Guardrail Landscape: Key Players and Capabilities

Several solutions have emerged to address the need for LLM safety, each with different strengths and approaches. Here’s a look at some prominent players and the types of harmful content they aim to filter:

Model NameViolence & Hate, Offensive SpeechAdult ContentWeapons,  Illegal Items & Criminal PlanningSelf-harm & SuicideIntellectual PropertyMis- information & HallucinationPrivacy & PIIJailbreak Prevention & Prompt Injection
Llama Guard3 (Meta)
NeMo Guardrails (Nvidia)
Bedrock Guardrails (Amazon)
Azure AI Content Safety (Microsoft)
Granite Guardian 3.2 5B (IBM)
Guardrails AI
WildGuard (Ai2)
Prompt Guard (Meta)
InjecGuard
Note: Data based on publicly available information at time of publication and vendor documentation. Capabilities may evolve.

Architectural Approaches: How Guardrails Work

The underlying architecture significantly influences a guardrail model’s flexibility, performance, and integration capabilities.

Model NameArchitectureKey ComponentsIntegration MethodCustomization OptionsScalability
Llama Guard3Llama-3.1-8B TransformerSafety classifier for input/output moderationHugging Face endpoint, open source modelFine-tunableDesigned to support Llama 3.1 capabilities
NeMo Guardrails Framework (event-driven rails with text embeddings + Colang)Event-driven architecture, text embeddings, rules-based filteringIntegrates with multiple LLMsUser-defined rules (Colang)High, supports multiple models
Bedrock Guardrails Rule-based & ML-assistedPre-built filters for harmful content & hallucination preventionAWS APIConfigurable filtering thresholdsHigh, built for enterprise
Azure AI Content Safety Rule-based & ML-assistedPrompt Shields, Groundedness Detection, risk assessmentsAzure AI APILimited tuning via Azure AI StudioHigh, cloud-scale AI
Granite Guardian 3.2 5B Iterative pruning & healing on 5 B transformerPruned/healed risk model, IBM AI Risk Atlas taxonomyHugging Face endpoint, IBM toolkitfine-tunableEnterprise-grade, requires moderate cost, latency
Guardrails AILibrary using external APIs/LLMs for validationValidators (e.g., Regex, Toxicity) with custom promptsAPI & open sourceHighly customizable validatorsScalable, but computationally expensive
WildGuardFine-tuned Mistral-7BUnified classification head, trained on WildGuardTrainHugging Face endpoint, open source modelFine-tunableRequires moderate cost, latency
Prompt Guard mDeBERTa-v3-base multilingual classifierHead for benign/injection/jailbreak, trained on open source, red-teamed & synthetic dataHugging Face endpoint, open source modelFine-tunableSmall footprint, CPU-deployable
InjecGuardDeBERTAV3-base with MOF over-defense mitigationMitigating Over-defense for Free (MOF) training strategyHugging Face endpoint, open source modelFine-tunableSmall footprint, CPU-deployable
  • Transformer-based (e.g., LlamaGuard3, WildGuard): Leverage fine-tuned language models specifically trained to classify content safety. Often accurate but can be resource-intensive.
  • Framework-based (e.g., NeMo Guardrails, Guardrails AI): Provide flexible toolkits using techniques like text embeddings, rule engines (like Nvidia’s Colang), or even using another LLM to validate outputs. Highly customizable but may require more setup effort.
  • Automated reasoning/rule-based (e.g., Amazon Bedrock, Azure AI): Rely on predefined rules, machine learning models, and risk assessment frameworks, often tightly integrated into cloud platforms for ease of use and scalability in enterprise environments. Customization might be more limited compared to frameworks.

Choosing the Right Guardrail

Selecting the appropriate guardrail depends on your specific needs, existing infrastructure, technical expertise, and risk tolerance.

Model NameArchitectureEvaluation Data HighlightsProsCons
Llama Guard3Llama-3.1-8B TransformerProprietary data sets, XSTest data setHigh accuracy, broad coverage, open source modelRequires moderate cost, latency, no jailbreak/prompt injection detection
NeMo GuardrailsFramework (event-driven rails with text embeddings + Colang)Anthropic Red Teaming data set, Helpful data set, MS-Marco, NLU BankingFlexible, integrates with various LLMs, open source, highly programmableRequires rule creation expertise, performance varies, runtime overhead from rule‐engine
Bedrock GuardrailsRule-based & ML-assistedProprietary data setsWorks across multiple foundation models, configurable, integrates with AWSLimited public evaluation details, potential region lock
Azure AI Content SafetyRule-based & ML-assistedProprietary data setsConfigurable, integrates with Azure AIMight not be perfectly accurate in detecting inappropriate content, custom categories might need more tuning, limited public evaluation details
Granite Guardian 3.2 5BIterative pruning & healing on 5 B transformerAegis AI Content Safety, ToxicChat, HarmBench, True, DICESHigh reliability, rigorous safety testingRequires moderate cost, latency, only trained and tested on English
Guardrails     AILibrary using external APIs/LLMs for validationIs different for each validatorSupports broad categories, API and open source SDKRelies on external validators
WildGuardFine-tuned Mistral-7B WildGuardTrain data setOpen source, detects nuanced refusal/compliance in completionsRequires moderate cost, latency
PromptGuard  mDeBERTa-v3-base multilingual classifierCybersec eval data setsLightweight & CPU-deployable, good multilingual injection/jailbreak detectionHigh false positives, model’s classifications for jailbreak attempts and prompt injection often overlap
InjecGuardDeBERTAV3 with MOF over-defense mitigationBIPIA data set, PINT data setLightweight & CPU-deployable,    reduced over defenseOnly focuses on prompt injection

Challenges and the Road Ahead

The field of LLM safety is dynamic. Current challenges include:

  • Sophisticated evasion: Adversarial attacks are constantly evolving to bypass existing filters, as demonstrated by research such as SeqAR: Jailbreak LLMs with Sequential Auto-Generated Characters.
  • Performance overhead: Adding safety checks can introduce latency and computational cost.
  • Balancing safety and utility: Overly strict guardrails can stifle creativity and usefulness, while overly permissive ones increase risk.
  • Contextual nuance: Determining harmfulness often depends heavily on context, which is challenging for automated systems.

Continuous research, development, and community collaboration are essential to stay ahead of emerging threats and refine these critical safety technologies.

Next Steps: Securing Your AI Ecosystem

Understanding the landscape of LLM guardrails is the crucial first step in building responsible and trustworthy AI applications. These tools provide essential checks against harmful outputs, but they’re only one piece of the puzzle.

Stay tuned for Part 2 of this series, where we’ll delve into the critical strategies for securing data flowing through LLMs and the applications built upon them. Following that, Part 3 will focus on establishing a secure and resilient infrastructure—the bedrock upon which reliable AI systems are built.

Building safe, enterprise-grade AI requires a holistic approach, encompassing the model, the data, and the infrastructure. Join us as we continue to explore how to navigate this exciting new frontier securely and effectively.

Pure AI