
#alignment

2 approved public terms with this tag.

RLHF

/ɑːr el eɪtʃ ef/ noun
AI & Technology

Reinforcement Learning from Human Feedback: a training technique used to align language models with human preferences. Human raters compare pairs of model outputs and choose the better response; these preferences train a reward model, which then guides further fine-tuning of the language model via reinforcement learning.

Example: RLHF is a key step in turning a raw language model into a helpful, harmless assistant.
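The reward-model step in the definition can be illustrated with a minimal sketch. It assumes a pairwise (Bradley-Terry style) loss, which is one common choice and not necessarily what any particular system uses. The linear `reward` function, the two-feature vectors, and the learning rate are all hypothetical stand-ins for a real neural reward model and real rater data.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise loss: small when the rater-preferred response scores higher."""
    return -math.log(sigmoid(reward_chosen - reward_rejected))

def reward(weights, features):
    """Hypothetical linear reward model: a weighted sum of response features."""
    return sum(w * f for w, f in zip(weights, features))

def train_reward_model(comparisons, n_features, lr=0.1, epochs=200):
    """Fit weights by gradient descent on the pairwise loss."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for chosen, rejected in comparisons:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # Gradient of -log(sigmoid(margin)) w.r.t. the weights is
            # -(1 - sigmoid(margin)) * (chosen - rejected); step against it.
            scale = 1.0 - sigmoid(margin)
            for i in range(n_features):
                weights[i] += lr * scale * (chosen[i] - rejected[i])
    return weights

# Toy comparisons (assumed features: "answers the question", "is rude").
# Each pair is (features of chosen response, features of rejected response).
comparisons = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([1.0, 0.0], [1.0, 1.0]),
    ([0.0, 0.0], [0.0, 1.0]),
]
weights = train_reward_model(comparisons, n_features=2)
```

After training, the learned weights score every preferred response above its rejected counterpart. In a full RLHF pipeline, a reward model of this kind would supply the reward signal for the reinforcement-learning stage, which this sketch does not cover.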

Constitutional AI

/ˌkɒnstɪˈtjuːʃənəl eɪ aɪ/ noun
AI & Technology

A training methodology developed by Anthropic in which a set of guiding principles (a "constitution") is used to supervise and refine the model's own outputs. The model critiques and rewrites its responses according to the constitution, reducing the need for human labelers to review harmful content.

Example: Constitutional AI lets the model identify and correct its own harmful outputs using a defined set of principles.
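The critique-and-rewrite loop in the definition can be sketched as follows. This is only an illustration of the control flow, not Anthropic's implementation: `constitutional_revision`, the prompt wording, and the rule-based `toy_model` are all hypothetical, and `toy_model` stands in for a real language model so the example runs on its own.

```python
from typing import Callable, List

def constitutional_revision(
    prompt: str,
    principles: List[str],
    generate: Callable[[str], str],
) -> str:
    """Draft a response, then critique and rewrite it once per principle."""
    response = generate(prompt)
    for principle in principles:
        critique = generate(
            f"Critique the response below according to this principle: {principle}\n"
            f"Response: {response}"
        )
        response = generate(
            "Rewrite the response to address the critique.\n"
            f"Critique: {critique}\nResponse: {response}"
        )
    return response

def toy_model(text: str) -> str:
    """Hypothetical stand-in for a language model, driven by simple rules."""
    if text.startswith("Critique"):
        return "insulting" if "idiot" in text else "fine"
    if text.startswith("Rewrite"):
        response = text.split("Response: ", 1)[1]
        if "Critique: insulting" in text:
            return response.replace(", you idiot", "")
        return response
    # Any other prompt gets a deliberately flawed first draft.
    return "The answer is 4, you idiot."

revised = constitutional_revision(
    "What is 2 + 2?",
    ["Do not insult the user."],
    toy_model,
)
```

Here the same callable plays all three roles (drafting, critiquing, rewriting), mirroring the idea that the model reviews its own output. How the revised responses are then used for training is outside the scope of this sketch.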