attention is easy. sustained attention is hard.

creating and destroying; flying cars, roc.camera, july.rocks

it's like we're all running transformer models. focusing on incoming input is easy for transformers, since self-attention does exactly that. keeping attention on key pieces of info over long periods of time is difficult, mostly because of the limited context window, and also because the info has to survive being passed through many stacked layers.
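a tiny sketch of the context window part, just to make it concrete. this is a toy, not how any real model is implemented: `visible_context` and the token names are made up for illustration, and the "window" is simply a fixed-size queue.

```python
from collections import deque

def visible_context(tokens, window):
    """Return the tokens a fixed-size context window can still 'see'.

    Hypothetical helper: a deque with maxlen drops the oldest token
    whenever a new one arrives and the window is already full.
    """
    ctx = deque(maxlen=window)
    for tok in tokens:
        ctx.append(tok)
    return list(ctx)

stream = ["key-fact", "a", "b", "c", "d"]
print(visible_context(stream, window=3))  # ['b', 'c', 'd']
```

the key piece of info arrived first, so it is the first thing to fall out of view.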

makes me think of how LSTMs used to handle sustained attention: they kept a cell state, which acted as a sort of memory and evolved with each time step. RAG feels more like how humans do it, where we remember some specific thing that happened and query our knowledge base specifically about it.
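roughly what the two kinds of memory look like side by side. both halves are simplified sketches under loud assumptions: the LSTM part is a scalar toy with made-up weights (`w_f`, `w_i`, `w_g`) and it ignores the hidden state that real gates also depend on; the RAG part uses plain word overlap where a real system would likely use embeddings. `lstm_cell_state` and `retrieve` are hypothetical names.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell_state(inputs, w_f=0.5, w_i=0.5, w_g=1.0):
    """Scalar toy of the cell-state update: c_t = f_t * c_{t-1} + i_t * g_t.

    The weights are invented for illustration, and the hidden state
    that real LSTM gates also read is left out.
    """
    c = 0.0
    for x in inputs:
        f = sigmoid(w_f * x)    # forget gate: how much old memory to keep
        i = sigmoid(w_i * x)    # input gate: how much new info to write
        g = math.tanh(w_g * x)  # candidate value to write
        c = f * c + i * g       # memory evolves a little at every step
    return c

def retrieve(query, memories):
    """Return the stored memory sharing the most words with the query."""
    q = set(query.lower().split())
    return max(memories, key=lambda m: len(q & set(m.lower().split())))

memories = ["we went to the beach in july", "the meeting ran long"]
print(lstm_cell_state([1.0, -1.0, 0.5]))
print(retrieve("what happened at the beach", memories))
```

the cell state is one blurry running summary of everything so far, while retrieval pulls back one specific memory only when asked.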

When you get someone's attention, it is like telling him, “I feel ya”, and you do. And doing it together is 10 times better than doing it alone.