tegmark
@ejio143612
1/11: New paper! Decomposing the Dark Matter of Sparse Autoencoders. We find that SAE errors and error norms are linearly predictable using model activations. Why is this, and what does it imply for SAE scaling and the structure of language model representations? Answers in 🧵 https://t.co/inWQ7oX0T7