In what may be a first-of-its-kind study, artificial intelligence (AI) firm Anthropic has developed a large language model (LLM) that's been fine-tuned for value judgments by its user community.
Many public-facing LLMs have been developed with guardrails (encoded instructions dictating specific behaviors) intended to limit unwanted outputs. Anthropic's Claude and OpenAI's ChatGPT, for example, typically give users a canned safety response when a request touches on violent or controversial topics.
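To make the idea of a guardrail concrete, here is a purely illustrative sketch of the canned-response pattern. It uses a hypothetical keyword blocklist for simplicity; production systems rely on trained classifiers and system-level instructions rather than keyword matching, and nothing here reflects Anthropic's or OpenAI's actual implementations.

```python
# Toy guardrail: return a canned safety response when a prompt
# touches a blocked topic. The topic list and helper names are
# hypothetical; real systems use trained classifiers, not keywords.

BLOCKED_TOPICS = {"violence", "weapons", "self-harm"}

CANNED_RESPONSE = (
    "I can't help with that request. If you have another question, "
    "I'm happy to assist."
)

def generate_answer(prompt: str) -> str:
    """Stand-in for an actual LLM call."""
    return f"(model answer to: {prompt})"

def respond(prompt: str) -> str:
    """Return a canned refusal if the prompt mentions a blocked topic."""
    lowered = prompt.lower()
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        return CANNED_RESPONSE
    return generate_answer(prompt)

if __name__ == "__main__":
    print(respond("How do I make weapons at home?"))  # canned refusal
    print(respond("What's the capital of France?"))   # normal answer
```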