Model Update, 2026-03-07
OpenAI Blog
OpenAI Finds Reasoning Models Struggle to Control Thought
In a fascinating piece of introspection, OpenAI researchers have uncovered a potential limitation—and a safety feature—in advanced reasoning AI models. Their study, introducing a concept called 'CoT-Control' (Chain-of-Thought Control), found that even highly capable models struggle to deliberately control or steer their own internal chains of thought when solving problems.
This means that while a model can reason step-by-step to an answer, it often cannot easily follow a human-prescribed alternative reasoning path when instructed to do so. Rather than viewing this as a flaw, OpenAI suggests the difficulty is a positive development for AI safety: it reinforces 'monitorability', the ability of humans or oversight systems to understand and track the model's actual reasoning process, since the model cannot easily hide its intent behind a misleading internal narrative. The research offers insight into the transparency and controllability of increasingly autonomous AI systems.
