Chain of Risk: Safety Failures in Large Reasoning Models and Mitigation via Adaptive Multi-Principle Steering
Published as an arXiv preprint, 2026
This work studies chain-of-risk: the observation that errors and unsafe intermediate steps accumulate along the reasoning trace of large reasoning models, often producing more harmful outputs than direct-answer baselines. We propose adaptive multi-principle steering, which dynamically rebalances safety constraints across reasoning steps to suppress these compounding failures without sacrificing task performance.

Recommended citation: Li, X., Hou, J., Deng, Z., Zhang, Z., Li, T., Lu, B., Hu, B., Zhao, Y., & Hao, Y. (2026). "Chain of Risk: Safety Failures in Large Reasoning Models and Mitigation via Adaptive Multi-Principle Steering." arXiv:2605.05678.
