Question, apologies if this is slightly off-topic; it's something I'd like to use this project for: is there an example of how to train GPT-2 on time series, in particular with covariates?
As far as my (basic) understanding of LLMs goes, they predict the next token from the previous tokens, which sounds directionally similar to time series forecasting (periodicity aside).
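For what it's worth, here is a minimal sketch of that framing, assuming you quantize the series into discrete bins so that GPT-2's next-token objective becomes next-value prediction. This is not this project's API, just one common workaround; covariates could be interleaved as extra tokens per timestep or added as embeddings, which this toy example omits.

```python
# Hedged sketch: bin a univariate series into tokens and train a small GPT-2
# on next-token prediction. All data and hyperparameters are illustrative.
import numpy as np
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Toy series; in practice this would be your data.
series = np.sin(np.linspace(0, 20 * np.pi, 2048)) + 0.1 * np.random.randn(2048)

n_bins = 256  # vocabulary size: one token per quantization bin
edges = np.quantile(series, np.linspace(0, 1, n_bins + 1)[1:-1])
tokens = torch.tensor(np.digitize(series, edges), dtype=torch.long)

config = GPT2Config(vocab_size=n_bins, n_positions=512,
                    n_embd=128, n_layer=4, n_head=4)
model = GPT2LMHeadModel(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Next-token training: the label shift is handled internally
# when labels == input_ids.
for step in range(100):
    start = np.random.randint(0, len(tokens) - 512)
    window = tokens[start:start + 512].unsqueeze(0)
    loss = model(input_ids=window, labels=window).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```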
These kinds of papers often promise the world but lack a proper baseline model. They only compare against very simple (naive forecast) or untuned models. In my experience a gradient boosting model will probably solve 95% of your forecasting problems, and trying to get fancy with a transformer (or even just a simple neural net) is more trouble than it is worth.
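To make that concrete, here is a hedged sketch of the kind of baseline being argued for: gradient boosting on lagged values plus a covariate, which turns one-step-ahead forecasting into plain tabular regression. The data, lag choices, and covariate are all made up for illustration.

```python
# Illustrative baseline: gradient boosting on lag features + one covariate.
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(0)
t = np.arange(3000)
covariate = (t % 7 == 0).astype(float)  # e.g. a weekly promo flag (made up)
y = np.sin(2 * np.pi * t / 24) + 2 * covariate + 0.1 * rng.standard_normal(len(t))

# Lag features: column i holds y shifted by lags[i] steps.
lags = [1, 2, 3, 24]
X = np.column_stack([np.roll(y, lag) for lag in lags] + [covariate])
X, target = X[max(lags):], y[max(lags):]  # drop rows with wrapped-around lags

# Train on the past, evaluate on a holdout at the end of the series.
split = len(target) - 200
model = HistGradientBoostingRegressor(max_iter=300)
model.fit(X[:split], target[:split])
print("holdout MAE:", np.abs(model.predict(X[split:]) - target[split:]).mean())
```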