OpenAI has trained its LLM to confess to bad behavior
Chains of thought are like scratch pads that models use to break down tasks, make notes, and plan their next actions. Analyzing them can give clear clues about what an LLM is doing. But they are not always easy to understand. And as models get larger and more efficient, some researchers think that chains of…


