Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Maybe slightly related, canon layers provide direct horizontal information flow along residual steams. See this paper, which precisely claims that LLMs struggle with horizontal information flow as "looking back a token" is fairly expensive since it can only be done via encoding in the residual stream and attention layers

https://openreview.net/pdf?id=kxv0M6I7Ud



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: