Chain of Risk: Safety Failures in Large Reasoning Models and Mitigation via Adaptive Multi-Principle Steering
Published as an arXiv preprint, 2026
Characterizes how chain-of-thought reasoning compounds safety failures in large reasoning models and proposes adaptive multi-principle steering as a mitigation.
Recommended citation: Li, X., Hou, J., Deng, Z., Zhang, Z., Li, T., Lu, B., Hu, B., Zhao, Y., & Hao, Y. (2026). "Chain of Risk: Safety Failures in Large Reasoning Models and Mitigation via Adaptive Multi-Principle Steering." arXiv:2605.05678.
