Whatever message this page gives is out now! Go check it out!

Guardrails for AI ethics and safety

Last update:
May 18, 2026

Introduction

Adobe ColdFusion 2025.0.08 provides a high‑level API for supporting Large Language Models (LLMs) such as OpenAI, Azure OpenAI, Google Gemini, Mistral, and local models (via Ollama) into CFML applications. While these models are powerful, they can also produce harmful, insecure, or non‑compliant outputs if left ungoverned.
To address this, ColdFusion introduces guardrails: developer‑defined validation components that run before and after LLM calls. Guardrails allow ColdFusion applications to enforce security, safety, and AI ethics policies in a way that is explicit, testable, and under your control.

Problem statement

LLMs are non‑deterministic and general‑purpose. In an application context, this leads to several risks:
  • Prompt injection and system manipulation: Users can attempt to override your system instructions (for example, “ignore previous instructions” or “reveal your internal rules”) and change how the model behaves.
  • Harmful or abusive content: Users may submit, or the model may generate, content involving self‑harm, hate speech, violence, criminal instructions, or other violative material. policy‑violating material.
  • Sensitive or regulated information: Inputs and outputs may contain Personally Identifiable Information (PII), sensitive information, secrets, or content that must be redacted or transformed before using.
  • Compliance and AI ethics requirements: Many organizations must demonstrate that AI features are governed by documented policies that align with internal AI ethics principles and external regulations.
ColdFusion’s guardrail framework addresses these problems by giving developers a first‑class way to insert validation logic into the request/response pipeline.
IMPORTANT
Adobe recommends using guardrails as a best practice when developing AI‑based applications to promote secure, reliable, and ethically aligned behavior. Guardrails are not enabled by default, and customers are responsible for configuring and implementing guardrails based on their specific use cases, requirements, and risk tolerance. Guardrails can help mitigate risk by constraining inputs and outputs, enforcing policies, and improving overall system robustness.

Guardrails in ColdFusion AI service

A guardrail is a ColdFusion Component (.cfc) that exposes a single method:
public struct function validate(required string message)
The AI service invokes this method at specific points in the AI interaction:
  • Input guardrails run on user input before the LLM is called.
  • Output guardrails run on the LLM response before it’s returned to the user.
Each guardrail decides whether the message is acceptable as‑is, acceptable with modifications, or must be blocked.
The validate method returns a struct with the following keys:


    result:         "success" | "successWith" | "failure" | "fatal", 
    message:        "optional human-readable explanation", 
    repromptMessage: "optional modified message, used with successWith" 
}
      
Where:
  • success: Validation passed; the message continues unchanged.
  • successWith: Validation passed, but the guardrail supplies a modified message in repromptMessage. The pipeline uses the modified message going forward.
  • failure: Validation failed. The pipeline continues running any remaining guardrails in the chain, collecting error messages. After the chain completes, the AI module throws an exception. For input guardrails, the LLM is not called.
  • fatal: A critical violation. The pipeline stops immediately, and an exception is thrown. No further guardrails run and, for input, the LLM is not called.

Input guardrails

Input guardrails validate and optionally transform user input before it reaches the LLM. Typical responsibilities include:
  • Detecting prompt injection attempts.
  • Enforcing size limits and basic sanity checks.
  • Sanitizing or masking specific terms (for example, profanities or identifiers).
  • Calling external moderation or safety APIs and acting on their verdict.
For example,

// ./guardrails/PromptInjectionGuardrail.cfc

component {
 public
  struct function validate(required string userMessage) {
    var result = {result : "success", message : ""};

    var lowerMessage = lcase(arguments.userMessage);

    var patterns = [

      "ignore previous instructions",

      "ignore all previous",

      "disregard all previous",

      "forget previous instructions",

      "new instructions:",

      "system:",

      "assistant:",

      "you are now",

      "act as if",

      "pretend you are",

      "roleplay as"

    ];

    for (var pattern in patterns) {
      if (findNoCase(pattern, lowerMessage)) {
        result.result = "failure";

        result.message = "Prompt injection detected: '" & pattern & "'";

        return result;
      }
    }

    if (len(arguments.userMessage) > 10000) {
      result.result = "fatal";

      result.message = "Input exceeds maximum allowed length";

      return result;
    }

    return result;
  }
}
      
Example for detecting credit card number

 component {
  /**
   * Input Guardrail: Credit Card Number Detection
   * Detects credit card numbers in user input to prevent PII leakage to the
   * LLM. Supports Visa, MasterCard, Amex, Discover, Diners Club, JCB formats.
   * Returns successWith (redacted) or failure depending on configuration.
   */
 public
  struct function validate(required string userMessage) {
    var result = {result : "success", message : "", repromptMessage : ""};

    var msg = arguments.userMessage;

    // ---- Credit-card patterns (with optional spaces / dashes) ----
    var ccPatterns = [
      // Visa: starts with 4, 13 or 16 digits
      "4\d{3}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{1,4}",
      // MasterCard: starts with 51-55 or 2221-2720
      "(5[1-5]\d{2}|222[1-9]|22[3-9]\d|2[3-6]\d{2}|27[01]\d|2720)[\s-]?\d{4}["
      "\s-]?\d{4}[\s-]?\d{4}",
      // Amex: starts with 34 or 37, 15 digits
      "3[47]\d{2}[\s-]?\d{6}[\s-]?\d{5}",
      // Discover: starts with 6011, 65, or 644-649
      "(6011|65\d{2}|64[4-9]\d)[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}",
      // Plain 16-digit sequence (catch-all)
      "\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b"
    ];

    for (var pattern in ccPatterns) {
      var matcher = createObject("java", "java.util.regex.Pattern")
                        .compile(pattern)
                        .matcher(msg);
      if (matcher.find()) {
        // Redact the card number and return successWith
        var sanitized = matcher.replaceAll("[CREDIT CARD REDACTED]");
        result.result = "successWith";
        result.message = "Credit card number detected in input and redacted.";
        result.repromptMessage = sanitized;
        return result;
      }
    }

    return result;
  }
}
      

Output guardrails

Output guardrails inspect the model’s response and can block it if it violates safety or content policies. Typical responsibilities include:
  • Detecting harmful or abusive content in the model’s answer.
  • Blocking or redacting self‑harm, hate, violence, or criminal guidance.
  • Preventing the model from returning code snippets or internal system details.
For example,

 // ./guardrails/HarmfulContentGuardrail.cfc

component {
 public
  struct function validate(required string aiMessage) {
    var result = {result : "success", message : ""};

    var lowerMessage = lcase(arguments.aiMessage);

    var harmfulPatterns = [

      "kill yourself", "commit suicide", "hurt yourself",

      "illegal drugs", "how to hack", "create a bomb", "violent acts"

    ];

    for (var pattern in harmfulPatterns) {
      if (findNoCase(pattern, lowerMessage)) {
        result.result = "failure";

        result.message =
            "AI response contains harmful content: '" & pattern & "'.";

        return result;
      }
    }

    var hateSpeechPatterns = [ "racial slur", "discriminatory language" ];

    for (var pattern in hateSpeechPatterns) {
      if (findNoCase(pattern, lowerMessage)) {
        result.result = "fatal";

        result.message = "AI response contains hate speech.";

        return result;
      }
    }

    return result;
  }
}
      

Chain multiple guardrails

You can register multiple guardrails for input and output. The AI service executes them in order:
  • All guardrails in the chain see the current message.
  • If any guardrail returns fatal, execution stops immediately.
  • If one or more guardrails return failure, the AI module throws an exception once the chain finishes, aggregating error messages.
  • successWith propagates the modified message to subsequent guardrails and, ultimately, to the LLM or user.
For example,

chatModel = ChatModel({

  PROVIDER : "openAi",
  APIKEY : apiKey,
  MODELNAME : "gpt-4o-mini"
});

aiService = AiService({ 

    CHATMODEL: chatModel, 

    INPUTGUARDRAILS: [ 

        expandPath("./guardrails/PromptInjectionGuardrail.cfc"), 

        expandPath("./guardrails/ContentFilterGuardrail.cfc") 

    ], 

   OUTPUTGUARDRAILS: [ 

        expandPath("./guardrails/HarmfulContentGuardrail.cfc") 
    ] 
});
      
When you call: response = aiService.chat("What is the best way to learn programming?"); The AI service automatically enforces all configured guardrails.

Use cases

Use case 1: customer support chatbot with prompt injection defense
Scenario: You build an in‑product support assistant that answers questions about your product using a combination of documentation and LLM reasoning. You need to:
  • Prevent users from overriding system instructions.
  • Ensure that only English questions are accepted.
  • Avoid exposing system or configuration details.
Guardrail design:
  • Input guardrail A: language detection and restriction to English.
  • Input guardrail B: prompt injection detection (as shown above).
  • Output guardrail: harmful content detection and system prompt suppression.
For example,

 chatModel = ChatModel({ 
    PROVIDER:  "openAi", 
    APIKEY:    apiKey, 
    MODELNAME: "gpt-4o-mini" 
}); 


aiService = AiService({ 
    CHATMODEL: chatModel, 
    INPUTGUARDRAILS: [ 
        expandPath("./guardrails/LanguageGuardrail.cfc"), 
        expandPath("./guardrails/PromptInjectionGuardrail.cfc") 
    ], 

    OUTPUTGUARDRAILS: [ 
        expandPath("./guardrails/HarmfulContentGuardrail.cfc"), 
        expandPath("./guardrails/SystemLeakGuardrail.cfc") 
    ] 

}); 
 

// Controller layer 

public string function askSupport(required string question) { 

    try { 
        return aiService.chat(question); 

    } catch (any e) { 
        // Optionally log e.message and return a user-friendly fallback 
        return "I’m unable to answer that question. Please rephrase or contact support."; 
    } 


      
This configuration ensures that only compliant, English queries and responses flow through, while violations are handled centrally by the guardrail framework.
Use case 2: moderated content generator with sanitization
Scenario: Your application provides a tool for drafting marketing copy. Users may include brand‑sensitive terms or informal language that must be sanitized before going to the LLM. Guardrail design:
  • Input guardrail: sanitize banned terms using successWith.
  • Output guardrail: block NSFW or highly offensive outputs.
For example,

// ./guardrails/ContentFilterGuardrail.cfc

component {
 public
  struct function validate(required string userMessage) {
    var result = {
      result : "success",
      message : "",
      repromptMessage : ""
    };

    if (len(trim(arguments.userMessage)) == 0) {
      result.result = "failure";

      result.message = "Input cannot be empty.";

      return result;
    }

    var bannedWords = [ "competitorX", "internalCodeName" ];
    var sanitizedMessage = arguments.userMessage;
    var containsProhibited = false;
    for (var word in bannedWords) {
      if (findNoCase(word, sanitizedMessage)) {
        containsProhibited = true;
        sanitizedMessage = replaceNoCase(
            sanitizedMessage,
            word,
            repeatString("*", len(word)),
            "all"
        );
      }
    }

    if (containsProhibited) {
      result.result = "successWith";
      result.message = "Input contained prohibited content and was sanitized.";
      result.repromptMessage = sanitizedMessage;
    }

    return result;
  }
}
      
This allows the feature to remain useful even when users include inappropriate terms; the guardrail automatically cleans the input and documents what it did.
Use case 3: external moderation service integration
Scenario: Your organization has standardized on an external moderation system (for example, an internal safety service, or a cloud‑hosted moderation API) and wants to apply the same policies within ColdFusion. Guardrail design:
  • Input guardrail uses cfhttp to send the user message to the moderation API.
  • Based on the returned score or category, the guardrail returns success, failure, or fatal.
For example,

// ./guardrails/ExternalModerationGuardrail.cfc
component {
 public
  struct function validate(required string userMessage) {
    var result = {result : "success", message : ""};
    cfhttp(
        url = "https://moderation.example.com/api/v1/check",
        method = "post",
        result = "httpRes"
    ) {
      cfhttpparam(type = "header", name = "Content-Type",
                  value = "application/json");
      cfhttpparam(type = "body",
                  value = serializeJson({text = arguments.userMessage}));
    }

    if (httpRes.statusCode != "200") {
      // Fail safely on moderation service error
      result.result = "fatal";
      result.message = "Unable to validate content at this time.";
      return result;
    }

    var body = deserializeJson(httpRes.fileContent);

    if (body.block == true) {
      result.result = "failure";
      result.message = "Content violates moderation policy: " & body.reason;
      return result;
    }

    return result;
  }
}
      
With this pattern, you can centralize policy in one service and reuse it across multiple ColdFusion applications while still using the guardrail framework’s chaining and error handling.

How guardrails work in ColdFusion

Service initialization
Typically performed during application startup:

component {
  variables.aiService = "";
 public
  void function onApplicationStart() {
    var chatModel = ChatModel({
      PROVIDER : "openAi",
      APIKEY : application.aiApiKey,
      MODELNAME : "gpt-4o-mini"
    });

    variables.aiService = AiService({ 
            CHATMODEL: chatModel, 
            INPUTGUARDRAILS: [ 
                expandPath("./guardrails/PromptInjectionGuardrail.cfc"), 
                expandPath("./guardrails/ContentFilterGuardrail.cfc") 
            ], 

            OUTPUTGUARDRAILS: [ 
                expandPath("./guardrails/HarmfulContentGuardrail.cfc") 
            ] 

        });
  }
}
      
Any configuration issues (for example, missing files or missing validate functions) will throw an AIServiceConfigException at this point, before user traffic reaches the service.
Request handling
In a handler, controller, or CFC method that processes user requests:

public string function handleUserQuestion(required string question) {
  try {
    return application.aiService.chat(question);

  } catch (any e) {
    // Centralized error handling for all guardrail failures

    logError("AI guardrail failure: #e.message#");

    return "I’m unable to answer that request. Please try a different "
           "question.";
  }
}
      
The guardrail framework ensures that:
  • If all input guardrails return success/successWith, the AI module calls the LLM.
  • If any input guardrail returns failure or fatal, the LLM is not called and an exception is thrown.
  • After the LLM responds, all output guardrails run. Any failure/fatal results in an exception; the end user does not see the unsafe content.
  • You can capture and view guardrail-related error messages in the server logs. The logs will also include the error message defined in the guardrail CFC file.

Share this page

Was this page helpful?
We're glad. Tell us how this page helped.
We're sorry. Can you tell us what didn't work for you?
Thank you for your feedback. Your response will help improve this page.

On this page