Notes on "Evaluating Large Language Models Trained on Code"
Notes on "SWE-bench: Can Language Models Resolve Real-World GitHub Issues?"
Notes on "COFFE: A Code Efficiency Benchmark for Code Generation"
Notes on "The Illusion of Thinking"