Yeah, but the other side of the coin is that they only explain the very basic concepts that have been settled for several years, not any of these "latest trends".
By that I mean anything that hasn't been settled for several years, like papers published in the last year or so: RingAttention, quantization/pruning, rotary embeddings, distillation, RLHF, L2 regularization, multimodal, MoE, etc.