danielmarkbruce on March 16, 2024 | on: Quiet-STaR: Language Models Can Teach Themselves t...

This doesn't make a lot of sense when you consider how backprop works. Layers aren't limited to working independently. This also doesn't make a lot of sense when you consider models are autoregressive.
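
To make the backprop point concrete, here is a minimal sketch, assuming a toy two-layer scalar network y = w2 * tanh(w1 * x) trained on squared error. The function name `grad_w1` and all the numbers are hypothetical and chosen only for illustration; nothing here comes from the Quiet-STaR paper. The chain rule puts the second layer's weight directly into the first layer's gradient, so the layers are updated jointly rather than independently.

```python
import math

def grad_w1(w1, w2, x, target):
    """Gradient of (y - target)^2 w.r.t. the first-layer weight w1 (hypothetical toy model)."""
    h = math.tanh(w1 * x)          # layer 1 output
    y = w2 * h                     # layer 2 output
    dloss_dy = 2.0 * (y - target)  # derivative of (y - target)^2 w.r.t. y
    dy_dh = w2                     # layer 2's weight enters layer 1's gradient here
    dh_dw1 = (1.0 - h * h) * x     # derivative of tanh(w1 * x) w.r.t. w1
    return dloss_dy * dy_dh * dh_dw1

# Same first-layer weight, same input, same target; only the second layer differs.
g_a = grad_w1(w1=0.5, w2=1.0, x=1.0, target=1.0)
g_b = grad_w1(w1=0.5, w2=3.0, x=1.0, target=1.0)

# The two gradients have opposite signs: the direction in which layer 1
# gets updated depends on what layer 2 is currently doing.
print(g_a, g_b)
```

In this toy setup, changing only `w2` flips the sign of the update to `w1`, which is one small illustration of layers not working independently under backprop.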