#safety

4 approved public terms with this tag.

AI Alignment

/eɪ aɪ əˈlaɪnmənt/ noun

AI & Technology

Machine-assisted translation draft (Arabic) for "AI Alignment": The research field focused on ensuring that AI systems pursue goals that match human values and intentions. A misaligned AI might optimize for a metric that appears correct but produces harmful or unintended outcomes at scale.

“Draft example: AI alignment researchers worry that optimizing for user engagement could diverge from genuine user wellbeing.”

By @dictionary_auto_translate
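
The gap between a proxy metric and the intended goal, as described in the AI Alignment entry, can be illustrated with a toy sketch. Everything here is hypothetical: the `Item` type, the catalog, and the scores are made-up values chosen only to show how maximizing engagement can select a different item than maximizing wellbeing.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Item:
    name: str
    engagement: float  # proxy metric the system can measure
    wellbeing: float   # intended objective (hypothetical; rarely observable directly)


# Made-up catalog for illustration only.
CATALOG: List[Item] = [
    Item("calm tutorial", engagement=0.4, wellbeing=0.8),
    Item("outrage clip", engagement=0.9, wellbeing=-0.5),
    Item("friend update", engagement=0.6, wellbeing=0.6),
]


def pick(items: List[Item], score: Callable[[Item], float]) -> Item:
    """Return the item with the highest score under the given objective."""
    return max(items, key=score)


proxy_choice = pick(CATALOG, lambda item: item.engagement)
intended_choice = pick(CATALOG, lambda item: item.wellbeing)

print("Optimizing the proxy picks:", proxy_choice.name)
print("Optimizing the intended goal picks:", intended_choice.name)
```

In this toy setup the two objectives disagree, which is the kind of mismatch the definition refers to.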

Constitutional AI

/ˌkɒnstɪˈtjuːʃənəl eɪ aɪ/ noun

AI & Technology

#ai #safety #alignment #anthropic

Machine-assisted translation draft (Arabic) for "Constitutional AI": A training methodology developed by Anthropic in which a set of guiding principles (a "constitution") is used to have the model supervise and refine its own outputs. The model critiques and rewrites its own responses according to the constitution, reducing the need for human labelers to review harmful content.

“Draft example: Constitutional AI lets the model identify and self-correct its own harmful outputs using defined principles.”

By @dictionary_auto_translate
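
The critique-and-rewrite step in the Constitutional AI entry can be sketched roughly as a loop. This is a minimal illustration, not Anthropic's implementation: the `CONSTITUTION` list, the prompt wording, and the `generate` callable (a stand-in for any text model) are all assumptions made for the example.

```python
from typing import Callable, List

# Hypothetical principles, for illustration only.
CONSTITUTION: List[str] = [
    "Do not provide instructions that could cause physical harm.",
    "Be honest about uncertainty.",
]


def critique_and_revise(
    prompt: str,
    generate: Callable[[str], str],
    principles: List[str],
) -> str:
    """Draft a response, then critique and rewrite it once per principle."""
    response = generate(prompt)
    for principle in principles:
        # Ask the model to critique its own response against one principle.
        critique = generate(
            f"Critique this response against the principle: {principle}\n"
            f"Response: {response}"
        )
        # Ask the model to rewrite the response in light of that critique.
        response = generate(
            "Rewrite the response to address the critique.\n"
            f"Critique: {critique}\nResponse: {response}"
        )
    return response


def echo_model(text: str) -> str:
    """Toy stand-in for a real model: returns a short summary of its input."""
    return f"[model output for {len(text)} chars of input]"


if __name__ == "__main__":
    print(critique_and_revise("Explain guardrails.", echo_model, CONSTITUTION))
```

In the training methodology the entry describes, revised responses like these would serve as training data; the sketch covers only the self-critique loop.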

Guardrails

/ˈɡɑːrdreɪlz/ noun

AI & Technology

#ai #safety #moderation #constraints

Machine-assisted translation draft (Arabic) for "Guardrails": Safety constraints and filters applied to AI systems to prevent harmful, offensive, or out-of-scope outputs. Guardrails can be implemented at the model level (via training), the prompt level (system instructions), or the application level (output classifiers) to keep AI behavior within acceptable boundaries.

“Draft example: The guardrails blocked the model from providing detailed instructions on dangerous activities.”

By @dictionary_auto_translate
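
Two of the layers named in the Guardrails entry, prompt level and application level, can be sketched as follows. This is a simplified illustration under stated assumptions: `SYSTEM_PROMPT`, `BLOCKED_TERMS`, `is_flagged`, and `guarded_reply` are hypothetical names, and the keyword check is only a stand-in for a real output classifier.

```python
from typing import Callable, Tuple

# Prompt-level guardrail: a system instruction sent with every request (hypothetical wording).
SYSTEM_PROMPT = "You are a helpful assistant. Decline requests for dangerous instructions."

# Application-level guardrail: placeholder terms standing in for a trained classifier.
BLOCKED_TERMS: Tuple[str, ...] = ("blocked-topic-a", "blocked-topic-b")

REFUSAL = "Sorry, I can't help with that."


def is_flagged(text: str) -> bool:
    """Toy output classifier: flags text containing any blocked term."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


def guarded_reply(user_message: str, model: Callable[[str, str], str]) -> str:
    """Call the model with the system prompt, then filter its draft output."""
    draft = model(SYSTEM_PROMPT, user_message)
    return REFUSAL if is_flagged(draft) else draft


if __name__ == "__main__":
    def toy_model(system: str, user: str) -> str:
        # Stand-in for a real model call; simply echoes the user message.
        return f"Echo: {user}"

    print(guarded_reply("Tell me about blocked-topic-a", toy_model))
    print(guarded_reply("What is a glossary?", toy_model))
```

Model-level guardrails (applied via training) are not shown, since they live inside the model rather than in application code.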

Jailbreak

/ˈdʒeɪlbreɪk/ noun, verb

AI & Technology

#ai #security #safety #bypass

Machine-assisted translation draft (Arabic) for "Jailbreak": A technique used to bypass the safety filters and content policies of an AI model, typically by framing harmful requests in ways the model's defenses don't recognize. Jailbreaks often use role-play scenarios, hypothetical framings, or encoded instructions to make the model comply with prohibited requests.

“Draft example: The "DAN" jailbreak asked the model to pretend it was an AI with no restrictions.”

By @dictionary_auto_translate