Harry
@htormey
AI engineers are compared to humans using SWE-bench, which mainly covers Python tasks with changes of 15 lines or fewer, evaluated by unit tests. I wrote this article to give you a framework for assessing whether SWE-bench is relevant to you: https://stepchange-blog.ghost.io/why-do-ai-software-engineers-like-devin-struggle-to-fix-bugs/