
#safety

4 approved public terms with this tag.

Constitutional AI

/ˌkɒnstɪˈtjuːʃənəl eɪ aɪ/ noun
AI & Technology

A training methodology developed by Anthropic in which a set of guiding principles (a "constitution") is used to supervise and refine the model's own outputs. The model critiques and rewrites its responses according to the constitution, which reduces reliance on human labelers to identify harmful content.

Example: Constitutional AI lets the model identify and correct its own harmful outputs using a defined set of principles.
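
The critique-and-rewrite step described in this entry can be sketched roughly as follows. This is a minimal illustration, not Anthropic's implementation: `Model`, `critique_and_revise`, `toy_model`, and the prompt wording are all hypothetical names and text introduced here, and the sketch assumes the model is any function that maps a prompt string to a completion string.

```python
from typing import Callable, List

# Assumption: a "model" is any function from a prompt string to a completion string.
Model = Callable[[str], str]


def critique_and_revise(model: Model, user_prompt: str, principles: List[str]) -> str:
    """Draft a response, then critique and rewrite it once per principle."""
    response = model(user_prompt)
    for principle in principles:
        # Ask the model to critique its own response against one principle.
        critique = model(
            f"Principle: {principle}\nResponse: {response}\n"
            "Point out any way the response violates the principle."
        )
        # Ask the model to rewrite the response in light of that critique.
        response = model(
            f"Principle: {principle}\nResponse: {response}\nCritique: {critique}\n"
            "Rewrite the response so that it follows the principle."
        )
    return response


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a real language model call, for demonstration only."""
    if "Rewrite the response" in prompt:
        return "I can't help with that, but here is a safer alternative."
    if "Point out any way" in prompt:
        return "The response gives unsafe detail."
    return "Sure, here is how to do it..."


if __name__ == "__main__":
    print(critique_and_revise(toy_model, "Some request", ["Avoid harmful content."]))
```

The sketch covers only the self-critique loop. How the rewritten responses are then used for training is outside what this entry describes.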

AI Alignment

/eɪ aɪ əˈlaɪnmənt/ noun
AI & Technology

The research field focused on ensuring that AI systems pursue goals that match human values and intentions. A misaligned AI might optimize a metric that appears correct but produces harmful or unintended outcomes at scale.

Example: AI alignment researchers worry that optimizing for user engagement could diverge from genuine user wellbeing.

Guardrails

/ˈɡɑːrdreɪlz/ noun
AI & Technology

Safety constraints and filters applied to AI systems to prevent harmful, offensive, or out-of-scope outputs. Guardrails can be implemented at the model level (via training), at the prompt level (via system instructions), or at the application level (via output classifiers) to keep AI behavior within acceptable boundaries.

Example: The guardrails blocked the model from providing detailed instructions for dangerous activities.
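
As a rough illustration of the prompt-level and application-level layers named in this entry, the sketch below wraps a model call with a system instruction and an output check. Every name in it (`guarded_generate`, `keyword_classifier`, `BLOCKED_MESSAGE`, the blocked phrase) is hypothetical. A production system would more likely use a trained classifier than a keyword match, and the model-level layer (training) is not shown.

```python
from typing import Callable

# Assumption: a "model" is any function from a prompt string to a completion string.
Model = Callable[[str], str]

BLOCKED_MESSAGE = "Sorry, I can't help with that."


def keyword_classifier(text: str) -> bool:
    """Hypothetical stand-in for an output classifier: True means 'flag this output'."""
    blocked_phrases = ("bypass the safety interlock",)
    lowered = text.lower()
    return any(phrase in lowered for phrase in blocked_phrases)


def guarded_generate(
    model: Model,
    user_prompt: str,
    system_prompt: str,
    is_harmful: Callable[[str], bool] = keyword_classifier,
) -> str:
    # Prompt-level guardrail: system instructions are placed ahead of the user text.
    output = model(f"{system_prompt}\n\nUser: {user_prompt}")
    # Application-level guardrail: the output is checked before it is returned.
    if is_harmful(output):
        return BLOCKED_MESSAGE
    return output


if __name__ == "__main__":
    unsafe_model = lambda prompt: "To bypass the safety interlock, first..."
    print(guarded_generate(unsafe_model, "How do I do this?", "Stay within policy."))
```

Keeping the output check separate from the model call means the classifier can be swapped or tightened without retraining the model, which is one reason the layers are often combined.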

Jailbreak

/ˈdʒeɪlbreɪk/ noun, verb
AI & Technology

A technique used to bypass the safety filters and content policies of an AI model, typically by framing harmful requests in ways the model's defenses do not recognize. Jailbreaks often use role-play scenarios, hypothetical framings, or encoded instructions to get the model to comply with prohibited requests.

Example: The "DAN" jailbreak asked the model to pretend it was an AI with no restrictions.