Teaching Large Language Models to Self-Debug

Large language models (LLMs) have achieved impressive performance on code generation, but producing a correct program in a single attempt remains challenging. To address this problem, the authors propose SELF-DEBUGGING, which teaches an LLM to debug its own predicted program via few-shot demonstrations. SELF-DEBUGGING achieves state-of-the-art performance on several code generation benchmarks: the Spider dataset for text-to-SQL generation, TransCoder for C++-to-Python translation, and MBPP for text-to-Python generation. By leveraging feedback messages and reusing failed predictions, SELF-DEBUGGING notably improves sample efficiency, matching or outperforming baseline models that generate over 10× as many candidate programs.
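The core loop is simple: generate a program, run it against the available unit tests, and if any fail, feed the failure messages back to the model and ask for a fix. Here is a minimal sketch of that loop with unit-test execution feedback; `model` is a hypothetical stand-in for a few-shot-prompted LLM call, and the prompt wording is an illustrative assumption, not the paper's exact format:

```python
# Sketch of a self-debugging loop with unit-test execution feedback.
# `model` is a hypothetical callable standing in for an LLM API call.

def run_tests(code: str, tests: list[str]) -> list[str]:
    """Execute the candidate program against unit tests; return failure messages."""
    failures = []
    for test in tests:
        env: dict = {}
        try:
            exec(code, env)   # define the candidate function(s)
            exec(test, env)   # e.g. "assert add(1, 2) == 3"
        except Exception as e:
            failures.append(f"{test} -> {type(e).__name__}: {e}")
    return failures

def self_debug(model, task: str, tests: list[str], max_turns: int = 3) -> str:
    code = model(task)
    for _ in range(max_turns):
        failures = run_tests(code, tests)
        if not failures:
            return code       # all tests pass: accept the program
        # Reuse the failed prediction: show it back to the model with feedback.
        prompt = (f"{task}\nPrevious attempt:\n{code}\n"
                  "Failed tests:\n" + "\n".join(failures) + "\nFix the code.")
        code = model(prompt)
    return code               # best effort after max_turns
```

Because failed predictions are revised rather than discarded, each model call builds on the last, which is where the sample-efficiency gain over independent resampling comes from.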