Braeden Norris
@braeden
One of my favorite ways to qualitatively evaluate language models is with Caesar ciphers. I usually start with a shift of 13 (ROT13, the most popular one) and then try a less common shift. Claude 3 Opus can only solve the more common shift. (GPT-4 can only occasionally solve even a shift of 13, in my experience.)
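
For anyone who wants to try this, here is a rough sketch of how the test prompts could be generated. The function name `caesar_shift`, the sample sentence, and the "less common" shift of 7 are all just assumptions for illustration; the post doesn't say which second shift was used.

```python
import string


def caesar_shift(text: str, shift: int) -> str:
    """Shift ASCII letters by `shift` positions; leave everything else unchanged."""
    shift %= 26  # normalizes negative shifts too, so -7 undoes 7
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    table = str.maketrans(
        lower + upper,
        lower[shift:] + lower[:shift] + upper[shift:] + upper[:shift],
    )
    return text.translate(table)


# Hypothetical sample sentence; any plaintext works.
plaintext = "The quick brown fox jumps over the lazy dog."

common = caesar_shift(plaintext, 13)  # ROT13, the popular shift
uncommon = caesar_shift(plaintext, 7)  # assumed "less common" shift

print(common)
print(uncommon)
```

Paste either ciphertext into the model, ask it to decode, and compare the answer against `plaintext`. Shifting back by the negative amount (for example `caesar_shift(uncommon, -7)`) recovers the original for checking.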

The other fun thing is to test the limitations of tokenization.