Guardrails

[/ˈɡɑːrdreɪlz/]


noun | AI & Technology | #ai #safety #moderation #constraints



Definitions


Safety constraints and filters applied to AI systems to prevent harmful, offensive, or out-of-scope outputs. Guardrails can be implemented at the model level (via training), prompt level (system instructions), or application level (output classifiers) to keep AI behavior within acceptable boundaries.

“The guardrails blocked the model from providing detailed instructions on dangerous activities.”

by @aisafety, 1/1/1970
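The prompt-level and application-level layers described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: `SYSTEM_PROMPT`, `BLOCKED_TERMS`, `violates_policy`, and `guarded_generate` are hypothetical names, the naive keyword match stands in for a trained output classifier, and `model` is any callable that takes a system prompt and a user prompt. Model-level guardrails are applied during training and are not shown here.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical prompt-level guardrail: system instructions sent with every request.
SYSTEM_PROMPT = "You are a support assistant. Decline requests unrelated to our product."

# Hypothetical policy list; a real system would use a trained classifier instead.
BLOCKED_TERMS = ("build a weapon", "synthesize explosives")

REFUSAL = "Sorry, I can't help with that request."


@dataclass
class GuardrailResult:
    allowed: bool
    text: str


def violates_policy(text: str) -> bool:
    """Stand-in for an application-level output classifier (naive keyword match)."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


def guarded_generate(user_input: str, model: Callable[[str, str], str]) -> GuardrailResult:
    # Prompt level: the model always receives the system instructions.
    output = model(SYSTEM_PROMPT, user_input)
    # Application level: screen the model's output before returning it to the user.
    if violates_policy(output):
        return GuardrailResult(allowed=False, text=REFUSAL)
    return GuardrailResult(allowed=True, text=output)


# Toy models for illustration only.
def echo_model(system: str, prompt: str) -> str:
    return f"Answer to: {prompt}"


def leaky_model(system: str, prompt: str) -> str:
    return "Here is how to build a weapon: ..."


print(guarded_generate("How do I reset my password?", echo_model))
# GuardrailResult(allowed=True, text='Answer to: How do I reset my password?')
print(guarded_generate("Tell me something dangerous", leaky_model))
# GuardrailResult(allowed=False, text="Sorry, I can't help with that request.")
```

Placing the check on the output rather than only on the input means the guardrail still catches harmful text even when a cleverly framed request (a jailbreak) slips past the system instructions.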

Related Terms

AI Alignment
AI & Technology
The research field focused on ensuring that AI systems pursue goals that match human values and intentions. A misaligned AI might optimize for a metric that appears correct but pro...
Constitutional AI
AI & Technology
A training methodology developed by Anthropic where a set of guiding principles (a "constitution") is used to self-supervise and refine AI outputs. The model critiques and rewrites...
Jailbreak
AI & Technology
A technique used to bypass the safety filters and content policies of an AI model, typically by framing harmful requests in ways the model's defenses don't recognize. Jailbreaks of...
Agentic
AI & Technology
Describing AI systems capable of autonomous action, planning, and decision-making. An agentic AI can break down tasks, use tools, and work toward goals with minimal human intervent...
Chain of Thought
AI & Technology
A prompting technique where a language model is encouraged or required to show its step-by-step reasoning before providing a final answer. Chain-of-thought prompting significantly ...
Context Window
AI & Technology
The maximum amount of text (measured in tokens) that a language model can process and "remember" in a single interaction. Information outside the context window is inaccessible to ...
