demishassabis
@0487205048720504
RT @OfficialLoganK: We just shipped a new set of evals measuring long context reasoning performance which are challenging for frontier mode…