What No One Tells You about Securing AI Applications

Summary

Developing responsible and trustworthy AI applications relies on essential safety mechanisms like LLM guardrails. These guardrails, built upon a robust DevSecOps foundation, help detect, mitigate, and prevent undesirable LLM behaviors.

This post was co-authored by Gauri Kholkar, Applied AI/ML Scientist, Office of the CTO, and Dr. Ratinder Paul Singh Ahuja, CTO for Security and GenAI. Dr. Ahuja is a renowned name in the field of security, AI, and networking.

Large language models (LLMs) are transforming industries, unlocking unprecedented capabilities. It’s an exciting time, but harnessing this power responsibly means navigating a complex web of potential risks—from harmful content to data leaks. Just as data needs robust storage and security, AI models need strong guardrails.

But it seems like new guardrail models pop up every other month, making it tough to know which fits your needs or if you even need one yet. Maybe you’ve heard about protecting against SQL injection, but now there’s talk of “prompt injection” and other novel attacks on LLM applications that constantly emerge. Are you worried about sending your proprietary data to third-party LLM APIs? Feeling overwhelmed by AI security?

In this series, we’ll break down the critical layers of protection needed for enterprise AI, covering both the familiar ground of application security best practices (which absolutely still apply) and the unique challenges specific to AI. We’ll also reflect how we approach these challenges at Pure Storage:

Part 1: What No One Tells You About Securing AI Apps: Demystifying AI Guardrails (You are here!): Understanding how combining DevSecOps with specialized safety models is key to making LLM apps strong, secure, and compliant.
Part 2: Securing the Data Fueling Your LLMs: Strategies for protecting the sensitive information that is used to train models or interact with AI applications, outlining key aspects of the data security approach of Pure Storage.
Part 3: Building a Secure Infrastructure Foundation: Ensuring the underlying systems supporting your AI workloads are robust and resilient, creating a fortified, end-to-end security shield that protects every layer, from infrastructure to application, throughout deployment and ongoing monitoring.

Understanding the AI Security Landscape

Today, we dive into the rapidly evolving world of LLM guardrails—the essential safety mechanisms designed to detect, mitigate, and prevent undesirable LLM behaviors. But first, let’s clarify the components involved and where security responsibilities lie. A typical AI application integrates several parts, often including external services.

Figure 1: Reference AI Application.

Let’s break down this flow and relate it to our security discussion:

AI Application: This encompasses everything before the final call to the core AI model. In Figure 1, this includes:
- User Interface: Where the user initially enters their query.
- Orchestration and Routing: This is part of your application’s business logic. It decides how to handle the user query—does it need information from internal knowledge bases, external web searches, or both? This logic also handles interactions with LLM APIs (via adapters/clients) and any necessary calls to external tools or functions.
- Context Construction: Another key piece of business logic. This component gathers the necessary information (from the knowledge base, web search APIs, tool outputs, etc.) and formats it along with the original user query to create the final prompt (Query + Context) that will be sent to the LLM. This is a critical area for security, as it handles potentially sensitive corporate data and external information.
AI Infrastructure: This refers to the underlying systems that run the core AI model and manage its operation.
- If using a third-party LLM API: As shown in the diagram, the core intelligence often comes from an external provider (OpenAI, Google, Anthropic, etc.). In this case, your infrastructure responsibility is primarily focused on securely interacting with that API (authentication, network security, managing API keys). The provider manages the actual model serving infrastructure. Your concern about sending proprietary data relates directly to this step—the context you construct might contain sensitive information passed to this external API.
- If self-hosting an LLM: You are responsible for the entire infrastructure stack needed to serve the model (compute resources like GPUs, networking, storage for model weights). This also includes the infrastructure for model training if you are fine-tuning or building custom models.
- General AI infra: Regardless of the hosting model, this layer includes the compute, network, and storage infrastructure where the AI application components (like orchestration, context construction) and potentially the self-hosted model itself are deployed. This could involve cloud services (e.g., AWS Lambda, ECS/Fargate, EC2, S3, Azure Functions), on-premises servers, or a hybrid setup. It also encompasses essential operational components like logging, monitoring, and potentially artifact repositories for model versions. Securing this entire infrastructure stack is crucial.

From DevOps to DevSecOps:

Figure 2: Security isn’t an add-on; it must be end-to-end, from secure infrastructure to secure applications, and from secure deployment to secure operations.

When discussing AI security, the conversation often jumps to cutting-edge defenses: prompt injection defenses, hallucination filters, guardrails against unexpected sentience! While these AI-specific concerns are valid and important (as we’ll discuss), it’s crucial not to overlook the fundamentals.

DevSecOps, the practice of integrating security into every stage of the software development lifecycle, is paramount. We often forget that an AI application is, fundamentally, still an application. Before worrying exclusively about novel AI threats, we must ensure we’re applying the basics of application and infrastructure security correctly. Securing the overall AI system starts with securing your AI Application components and the AI Infrastructure using robust DevSecOps practices. This includes standard secure coding, vulnerability scanning, infrastructure hardening, access controls, and threat modeling. If you can secure a traditional n-tier application and its data, you have a strong foundation.

This commitment to security is foundational at Pure Storage, reflected in our rigorous DevSecOps methodology—the “6-Point Plan” detailed in our product security journey—overseen by our security leadership, ensuring that security is built in, not bolted on, for all our solutions, including those powering demanding AI workloads. This includes leveraging innovative tools and techniques, such as using LLMs to automate and scale security practices like STRIDE threat modeling, making robust security analysis accessible even for rapid development cycles.

Explore how tools like the Threat Model Mentor GPT empower even non-experts to perform robust threat modeling, accelerating secure development.

You must apply these principles rigorously, especially when handling sensitive data during context construction or managing the AI infrastructure, before layering on AI-specific considerations. Critically, since much of the application code itself might be AI-generated, adhering strictly to a secure software development lifecycle for review, testing, and validation and incorporating static and dynamic code scanning is more important than ever.

What’s Unique about AI Security?

Beyond standard practices, the unique aspects requiring focus are:

Risky inputs (user and context): The data flowing into the LLM requires careful scrutiny.

User input: The direct query or input from the user can be intentionally malicious (e.g., prompt injection, attempts to reveal sensitive info), factually false, or contain toxic/harmful language. As shown in Figure 3, Scenario 1 below, the user directly inputs a malicious instruction, contaminating the final prompt even if other parts are benign.
Constructed context: The context assembled by your application (from internal knowledge bases, external web searches, API calls, etc.) can also be malicious (if external sources are compromised or manipulated), contain false or outdated information, or include toxic content retrieved from the web. As illustrated in Figure 3, Scenario 2 below, the application retrieves compromised or bad data for context, tainting the final prompt even if the user query was harmless.

Figure 3: Malicious inputs in AI prompts.

AI security, therefore, involves securing your application’s business logic and the underlying AI infrastructure, plus managing the risks associated with the AI-specific components like the prompt/context interaction. LLM guardrails (discussed next) are a specific tool often implemented within the AI application layer to help manage risks at the boundary before interacting with the core LLM.

Why Guardrails Are Non-negotiable for Enterprise AI

Building secure GenAI applications requires understanding the attack vectors. Here are some of the most significant risks, many of which are categorized and detailed in resources like the OWASP Top 10 for LLM Applications:

Figure 4: Risks to generative AI.

5 Primary Risks to AI Security

Prompt injection: This is a class of attacks against applications built on top of large language models (LLMs) that work by concatenating untrusted user input with a trusted prompt constructed by the application’s developer. Essentially, it’s like tricking the AI. Attackers manipulate the input prompts given to the LLM to make it behave in unintended ways, potentially including tricking it into revealing its confidential system prompt or executing malicious commands via the application’s capabilities.
- Direct injection: A malicious user directly inputs instructions intended to override the original prompt, potentially causing the AI to reveal its initial instructions or execute harmful commands.
- Indirect injection: Adversarial instructions are hidden within external data sources like websites or documents that the AI processes. When the AI interacts with this tainted content, it can inadvertently execute hidden commands. This risk is significantly amplified when the AI has access to tools or APIs that can interact with sensitive data or perform actions, such as tricking an AI email assistant into forwarding private emails or manipulating connected systems.
Jailbreaking: This is the class of attacks that attempt to subvert safety filters built into the LLMs themselves. It involves crafting inputs specifically designed to bypass these safety mechanisms. The goal is often to coerce the model into generating harmful, unethical, or restricted content it’s designed to refuse. This can range from generating instructions for dangerous activities to creating embarrassing outputs that damage brand reputation.
Misinformation: LLMs can sometimes generate incorrect or nonsensical information or hallucinations, unsafe code, or unsupported claims.
- Factual inaccuracies: Models might confidently state incorrect facts, potentially leading users astray.
- Unsupported claims: AI models may generate baseless assertions or “facts” with high confidence. This becomes particularly dangerous when applied in critical fields like law, finance, or healthcare, where decisions based on inaccurate AI-generated information can have serious real-world consequences.
- Unsafe code: AI might suggest insecure code or even reference non-existent software libraries. Attackers can exploit this by creating malicious packages with these commonly hallucinated names, tricking developers into installing them.
Sensitive information disclosure: Without proper safeguards, LLMs can inadvertently reveal Personally Identifiable Information (PII) or other confidential data. This exposure might happen if the model repeats sensitive details provided during user interactions, accesses restricted information through poorly secured retrieval-augmented generation (RAG) systems or external tools, or, in some cases, recalls sensitive data it was inadvertently trained on. The leaked information could include customer PII, internal financial data, proprietary source code, strategic plans, or health records. Such breaches often lead to severe consequences like privacy violations, regulatory penalties (e.g., under GDPR or CCPA), loss of customer trust, and competitive disadvantage.
Supply chain and data integrity risks: GenAI applications often rely on pre-trained models, third-party data sets, and external plugins. If any component in this supply chain is compromised (e.g., a vulnerable model or a malicious plugin), it can introduce significant security risks. Furthermore, attackers may intentionally corrupt the data used for training, fine-tuning, or RAG systems (“data poisoning”). This poisoning can introduce hidden vulnerabilities, biases, or backdoors into the model, causing it to behave maliciously or unreliably under specific conditions.

Understanding these risks is the first step toward building defenses, which often involves implementing robust guardrails.

Given these risks, especially those related to malicious or harmful inputs and outputs, where should guardrails be placed? Referring to the application flow diagram in Figure 5 below, there are two critical points for intervention:

Input Guardrails: Placed after the initial User Query is received but before it (and any constructed context) is sent to the LLM API. This helps detect and block malicious prompts, toxic language, or attempts to inject harmful instructions early.
Output Guardrails: Placed after receiving the response from the LLM API but before presenting the final User Output. This helps filter out any harmful, biased, toxic, or inappropriate content generated by the LLM, preventing it from reaching the user.

Figure 5: LLM guardrails in an AI application.

Implementing guardrails at both these stages provides layered security. Now, let’s look at the specific guardrail solutions available.

The LLM Guardrail Landscape: Key Players and Capabilities

Several solutions have emerged to address the need for LLM safety, each with different strengths and approaches. Here’s a look at some prominent players and the types of harmful content they aim to filter:

Content Detection Capabilities Comparison

Model Name	Violence & Hate, Offensive Speech	Adult Content	Weapons, Illegal Items & Criminal Planning	Self-harm & Suicide	Intellectual Property	Mis- information & Hallucination	Privacy & PII	Jailbreak Prevention & Prompt Injection
Llama Guard3 (Meta)	✅	✅	✅	✅	✅	✅	✅	❌
NeMo Guardrails (Nvidia)	✅	✅	✅	✅	✅	✅	✅	✅
Bedrock Guardrails (Amazon)	✅	✅	❌	✅	❌	✅	✅	✅
Azure AI Content Safety (Microsoft)	✅	✅	❌	✅	✅	✅	✅	✅
Granite Guardian 3.2 5B (IBM)	✅	✅	❌	❌	❌	✅	❌	✅
Guardrails AI	✅	❌	❌	❌	❌	✅	✅	✅
WildGuard (Ai2)	✅	✅	✅	✅	✅	✅	✅	✅
Prompt Guard (Meta)	❌	❌	❌	❌	❌	❌	❌	✅
InjecGuard	❌	❌	❌	❌	❌	❌	❌	✅

Note: Data based on publicly available information at time of publication and vendor documentation. Capabilities may evolve.

Architectural Approaches: How Guardrails Work

The underlying architecture significantly influences a guardrail model’s flexibility, performance, and integration capabilities.

Architectural Comparison

Model Name	Architecture	Key Components	Integration Method	Customization Options	Scalability
Llama Guard3	Llama-3.1-8B Transformer	Safety classifier for input/output moderation	Hugging Face endpoint, open source model	Fine-tunable	Designed to support Llama 3.1 capabilities
NeMo Guardrails	Framework (event-driven rails with text embeddings + Colang)	Event-driven architecture, text embeddings, rules-based filtering	Integrates with multiple LLMs	User-defined rules (Colang)	High, supports multiple models
Bedrock Guardrails	Rule-based & ML-assisted	Pre-built filters for harmful content & hallucination prevention	AWS API	Configurable filtering thresholds	High, built for enterprise
Azure AI Content Safety	Rule-based & ML-assisted	Prompt Shields, Groundedness Detection, risk assessments	Azure AI API	Limited tuning via Azure AI Studio	High, cloud-scale AI
Granite Guardian 3.2 5B	Iterative pruning & healing on 5 B transformer	Pruned/healed risk model, IBM AI Risk Atlas taxonomy	Hugging Face endpoint, IBM toolkit	fine-tunable	Enterprise-grade, requires moderate cost, latency
Guardrails AI	Library using external APIs/LLMs for validation	Validators (e.g., Regex, Toxicity) with custom prompts	API & open source	Highly customizable validators	Scalable, but computationally expensive
WildGuard	Fine-tuned Mistral-7B	Unified classification head, trained on WildGuardTrain	Hugging Face endpoint, open source model	Fine-tunable	Requires moderate cost, latency
Prompt Guard	mDeBERTa-v3-base multilingual classifier	Head for benign/injection/jailbreak, trained on open source, red-teamed & synthetic data	Hugging Face endpoint, open source model	Fine-tunable	Small footprint, CPU-deployable
InjecGuard	DeBERTAV3-base with MOF over-defense mitigation	Mitigating Over-defense for Free (MOF) training strategy	Hugging Face endpoint, open source model	Fine-tunable	Small footprint, CPU-deployable

Key Architectural Differences:

Transformer-based (e.g., LlamaGuard3, WildGuard): Leverage fine-tuned language models specifically trained to classify content safety. Often accurate but can be resource-intensive.
Framework-based (e.g., NeMo Guardrails, Guardrails AI): Provide flexible toolkits using techniques like text embeddings, rule engines (like Nvidia’s Colang), or even using another LLM to validate outputs. Highly customizable but may require more setup effort.
Automated reasoning/rule-based (e.g., Amazon Bedrock, Azure AI): Rely on predefined rules, machine learning models, and risk assessment frameworks, often tightly integrated into cloud platforms for ease of use and scalability in enterprise environments. Customization might be more limited compared to frameworks.

Choosing the Right Guardrail

Selecting the appropriate guardrail depends on your specific needs, existing infrastructure, technical expertise, and risk tolerance.

Model Overall Comparison

Model Name	Architecture	Evaluation Data Highlights	Pros	Cons
Llama Guard3	Llama-3.1-8B Transformer	Proprietary data sets, XSTest data set	High accuracy, broad coverage, open source model	Requires moderate cost, latency, no jailbreak/prompt injection detection
NeMo Guardrails	Framework (event-driven rails with text embeddings + Colang)	Anthropic Red Teaming data set, Helpful data set, MS-Marco, NLU Banking	Flexible, integrates with various LLMs, open source, highly programmable	Requires rule creation expertise, performance varies, runtime overhead from rule‐engine
Bedrock Guardrails	Rule-based & ML-assisted	Proprietary data sets	Works across multiple foundation models, configurable, integrates with AWS	Limited public evaluation details, potential region lock
Azure AI Content Safety	Rule-based & ML-assisted	Proprietary data sets	Configurable, integrates with Azure AI	Might not be perfectly accurate in detecting inappropriate content, custom categories might need more tuning, limited public evaluation details
Granite Guardian 3.2 5B	Iterative pruning & healing on 5 B transformer	Aegis AI Content Safety, ToxicChat, HarmBench, True, DICES	High reliability, rigorous safety testing	Requires moderate cost, latency, only trained and tested on English
Guardrails AI	Library using external APIs/LLMs for validation	Is different for each validator	Supports broad categories, API and open source SDK	Relies on external validators
WildGuard	Fine-tuned Mistral-7B	WildGuardTrain data set	Open source, detects nuanced refusal/compliance in completions	Requires moderate cost, latency
PromptGuard	mDeBERTa-v3-base multilingual classifier	Cybersec eval data sets	Lightweight & CPU-deployable, good multilingual injection/jailbreak detection	High false positives, model’s classifications for jailbreak attempts and prompt injection often overlap
InjecGuard	DeBERTAV3 with MOF over-defense mitigation	BIPIA data set, PINT data set	Lightweight & CPU-deployable, reduced over defense	Only focuses on prompt injection

Challenges and the Road Ahead

The field of LLM safety is dynamic. Current challenges include:

Sophisticated evasion: Adversarial attacks are constantly evolving to bypass existing filters, as demonstrated by research such as SeqAR: Jailbreak LLMs with Sequential Auto-Generated Characters.
Performance overhead: Adding safety checks can introduce latency and computational cost.
Balancing safety and utility: Overly strict guardrails can stifle creativity and usefulness, while overly permissive ones increase risk.
Contextual nuance: Determining harmfulness often depends heavily on context, which is challenging for automated systems.

Continuous research, development, and community collaboration are essential to stay ahead of emerging threats and refine these critical safety technologies.

Next Steps: Securing Your AI Ecosystem

Understanding the landscape of LLM guardrails is the crucial first step in building responsible and trustworthy AI applications. These tools provide essential checks against harmful outputs, but they’re only one piece of the puzzle.

Stay tuned for Part 2 of this series, where we’ll delve into the critical strategies for securing data flowing through LLMs and the applications built upon them. Following that, Part 3 will focus on establishing a secure and resilient infrastructure—the bedrock upon which reliable AI systems are built.

Building safe, enterprise-grade AI requires a holistic approach, encompassing the model, the data, and the infrastructure. Join us as we continue to explore how to navigate this exciting new frontier securely and effectively.

FlashBlade//EXA

Experience the World’s Most Powerful Data Storage Platform for AI

Learn More

Power Your AI Success

Learn more about the world’s most powerful data storage platform for AI.

Explore AI Solutions

Blog Home

What No One Tells You about Securing AI Apps: Demystifying AI Guardrails

Summary

Understanding the AI Security Landscape

Figure 1: Reference AI Application.

From DevOps to DevSecOps:

Figure 2: Security isn’t an add-on; it must be end-to-end, from secure infrastructure to secure applications, and from secure deployment to secure operations.

What’s Unique about AI Security?

Figure 3: Malicious inputs in AI prompts.

Why Guardrails Are Non-negotiable for Enterprise AI

Figure 4: Risks to generative AI.

5 Primary Risks to AI Security

Figure 5: LLM guardrails in an AI application.

The LLM Guardrail Landscape: Key Players and Capabilities

Content Detection Capabilities Comparison

Note: Data based on publicly available information at time of publication and vendor documentation. Capabilities may evolve.

Architectural Approaches: How Guardrails Work

Architectural Comparison

Key Architectural Differences:

Choosing the Right Guardrail

Model Overall Comparison

Challenges and the Road Ahead

Next Steps: Securing Your AI Ecosystem

Experience the World’s Most Powerful Data Storage Platform for AI

Power Your AI Success

Why Data Accessibility Is a Must-have for Differently Abled People

Bringing Data Strategies to Life with Expert Deployment

Your Cyber Resilience Wish List for 2025—and How to Achieve It

How Do the Embodied Carbon Dioxide Equivalents of Flash Compare to HDDs? [Part 1]

Top Stories

Why Data Accessibility Is a Must-have for Differently Abled People

Bringing Data Strategies to Life with Expert Deployment

Your Cyber Resilience Wish List for 2025—and How to Achieve It

How Do the Embodied Carbon Dioxide Equivalents of Flash Compare to HDDs? [Part 1]

6 Customer-proven Best Practices for Cyber-resilient Backup and Recovery

What No One Tells You about Securing AI Apps: Demystifying AI Guardrails

Summary

Understanding the AI Security Landscape

Figure 1: Reference AI Application.

From DevOps to DevSecOps:

Figure 2: Security isn’t an add-on; it must be end-to-end, from secure infrastructure to secure applications, and from secure deployment to secure operations.

What’s Unique about AI Security?

Figure 3: Malicious inputs in AI prompts.

Why Guardrails Are Non-negotiable for Enterprise AI

Figure 4: Risks to generative AI.

5 Primary Risks to AI Security

Figure 5: LLM guardrails in an AI application.

The LLM Guardrail Landscape: Key Players and Capabilities

Content Detection Capabilities Comparison

Note: Data based on publicly available information at time of publication and vendor documentation. Capabilities may evolve.

Architectural Approaches: How Guardrails Work

Architectural Comparison

Key Architectural Differences:

Choosing the Right Guardrail

Model Overall Comparison

Challenges and the Road Ahead

Next Steps: Securing Your AI Ecosystem

Experience the World’s Most Powerful Data Storage Platform for AI

Power Your AI Success

Related Stories

Top Stories