AI software engineers are compared to humans using SWE-bench, which mainly covers Python tasks with ≤15-line changes evaluated by unit tests. I wrote this article to give you a framework for assessing whether SWE-bench is relevant to you.

https://stepchange-blog.ghost.io/why-do-ai-software-engineers-like-devin-struggle-to-fix-bugs/