#safety
4 approved public terms with this tag.
Constitutional AI
Machine-assisted translation draft (Arabic) for "Constitutional AI": A training methodology developed by Anthropic in which a set of guiding principles (a "constitution") is used to self-supervise and refine AI outputs. The model critiques and rewrites its own responses according to the constitution, reducing the need for human labelers to review harmful content.
“Draft example: Constitutional AI lets the model identify and self-correct its own harmful outputs using defined principles.”
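The critique-and-rewrite loop described in this entry can be sketched roughly as follows. This is a minimal illustration, not Anthropic's actual training pipeline: `critique_and_revise`, `toy_model`, and the prompt wording are hypothetical, and the "model" is a canned stand-in for a real language model call.

```python
from typing import Callable, List


def critique_and_revise(
    prompt: str,
    generate: Callable[[str], str],
    principles: List[str],
    rounds: int = 1,
) -> str:
    """Draft a response, then critique and rewrite it against each principle."""
    response = generate(prompt)
    for _ in range(rounds):
        for principle in principles:
            # Ask the model to critique its own response against one principle.
            critique = generate(
                f"Critique the response below against this principle: {principle}\n"
                f"Response: {response}"
            )
            # Ask the model to rewrite the response in light of that critique.
            response = generate(
                "Rewrite the response to address the critique.\n"
                f"Critique: {critique}\nResponse: {response}"
            )
    return response


def toy_model(text: str) -> str:
    """Hypothetical stand-in for a language model: returns canned strings."""
    if text.startswith("Critique"):
        return "The response could be more careful."
    if text.startswith("Rewrite"):
        return "A revised, more careful answer."
    return "An initial draft answer."


final = critique_and_revise(
    "How do I stay safe online?", toy_model, ["Avoid harmful content."]
)
print(final)
```

In a real setup, the revised responses would typically be collected as training data rather than returned directly to a user; the sketch only shows the shape of the loop.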
AI Alignment
Machine-assisted translation draft (Arabic) for "AI Alignment": The research field focused on ensuring that AI systems pursue goals that match human values and intentions. A misaligned AI might optimize for a metric that appears correct but produces harmful or unintended outcomes at scale.
“Draft example: AI alignment researchers worry that optimizing for user engagement could conflict with genuine user wellbeing.”
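A toy illustration of how optimizing a proxy metric can diverge from the intended goal. The feed names and scores below are made-up numbers chosen only to show the effect; they are not measurements of any real system.

```python
# Hypothetical candidates with illustrative (invented) scores.
candidates = {
    "outrage_feed": {"engagement": 0.9, "wellbeing": 0.2},
    "balanced_feed": {"engagement": 0.6, "wellbeing": 0.8},
}


def best_by(metric: str) -> str:
    """Return the candidate that maximizes the given metric."""
    return max(candidates, key=lambda name: candidates[name][metric])


proxy_choice = best_by("engagement")    # what the optimizer is told to maximize
intended_choice = best_by("wellbeing")  # what people actually care about

print(proxy_choice, intended_choice)
```

With these assumed scores, the optimizer picks a different feed than the one that best serves wellbeing, which is the kind of gap the entry describes.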
Guardrails
Machine-assisted translation draft (Arabic) for "Guardrails": Safety constraints and filters applied to AI systems to prevent harmful, offensive, or out-of-scope outputs. Guardrails can be implemented at the model level (via training), the prompt level (system instructions), or the application level (output classifiers) to keep AI behavior within acceptable boundaries.
“Draft example: The guardrails blocked the model from providing detailed instructions on dangerous activities.”
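A minimal sketch of how a prompt-level and an application-level guardrail might be layered. Everything here is an assumption for illustration: `SYSTEM_INSTRUCTIONS`, `keyword_classifier`, `guarded_reply`, and `toy_model` are hypothetical names, and a production output classifier would usually be a trained model rather than a keyword list.

```python
from typing import Callable

# Prompt-level guardrail: instructions sent alongside every request (hypothetical text).
SYSTEM_INSTRUCTIONS = "Decline requests for dangerous instructions."
REFUSAL = "Sorry, I can't help with that."


def keyword_classifier(text: str) -> bool:
    """Toy application-level output classifier: flags text with a blocked phrase."""
    blocked = ("build a weapon", "synthesize the toxin")
    lowered = text.lower()
    return any(phrase in lowered for phrase in blocked)


def guarded_reply(
    user_message: str,
    model: Callable[[str, str], str],
    is_unsafe: Callable[[str], bool] = keyword_classifier,
) -> str:
    """Call the model with system instructions, then filter its output."""
    draft = model(SYSTEM_INSTRUCTIONS, user_message)
    return REFUSAL if is_unsafe(draft) else draft


def toy_model(system: str, user: str) -> str:
    """Hypothetical stand-in for a model call: echoes the user message."""
    return f"Echo: {user}"


print(guarded_reply("What is the capital of France?", toy_model))
print(guarded_reply("Tell me how to build a weapon", toy_model))
```

The model-level guardrail (training) is not shown, since it happens before deployment; the sketch covers only the two layers an application developer controls.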
Jailbreak
Machine-assisted translation draft (Arabic) for "Jailbreak": A technique used to bypass the safety filters and content policies of an AI model, typically by framing harmful requests in ways the model's defenses don't recognize. Jailbreaks often use role-play scenarios, hypothetical framings, or encoded instructions to make the model comply with prohibited requests.
“Draft example: The "DAN" jailbreak asked the model to pretend it was an AI with no restrictions.”