Related question: are there any algorithms / optimizations for probabilistic programming in an online context?
What I mean is that I have a model and I've run my inference on my historical data. Now I have new observations streaming in and I want to update my inference in an efficient manner. Basically I'd like something like a Kalman filter for general probabilistic models.
There is a class of probabilistic models that has exactly this property -- exponential family models. While pedagogic examples of such models tend to be very simple, they need not be. A huge class of graphical models falls into this category and can be very flexible. The underlying statistical model of a Kalman filter is in this class.
These models have what are called sufficient statistics, which can be computed from the data: s = f(D), where s is the sufficient statistic and D is the past data. The clincher is that there is a very helpful algebraic property:
s = f(D ∪ d) = g(s', d), where s' = f(D); D is the past data, d is the new data, and D ∪ d is the complete data.
This is very useful because you don't have to carry the old data D around.
This machinery is particularly useful when
(i) s is in some sense smaller than D -- for example, when s lives in a small, finite-dimensional space;
(ii) the functions f and g are easy to compute; and
(iii) the relation between s and the parameters (equivalently, the weights θ) of the model is easy to compute.
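To make this concrete, here is a minimal sketch (the class and method names are mine, not from any particular library) for a Beta-Bernoulli model: the running pair (successes, trials) is the sufficient statistic s = f(D), and update() plays the role of g(s', d), folding in a new batch without ever revisiting the old data.

    from dataclasses import dataclass

    @dataclass
    class BetaBernoulli:
        alpha: float = 1.0   # prior pseudo-count of successes
        beta: float = 1.0    # prior pseudo-count of failures
        succ: int = 0        # running sufficient statistic: successes seen so far
        trials: int = 0      # running sufficient statistic: trials seen so far

        def update(self, new_obs):
            # g(s', d): fold a new batch d into the running statistic s'
            self.succ += sum(new_obs)
            self.trials += len(new_obs)

        def posterior(self):
            # map the sufficient statistic to the exact posterior parameters
            return (self.alpha + self.succ, self.beta + self.trials - self.succ)

    model = BetaBernoulli()
    model.update([1, 0, 1, 1])   # historical data D
    model.update([0, 1])         # new observations d
    print(model.posterior())     # (5.0, 3.0), identical to a one-shot fit on all six points

The streamed posterior is exactly what a one-shot fit on D ∪ d would give, which is the payoff of (i)-(iii) above.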
Even when models do not possess this property, as long as the models are differentiable one can do a local, approximate update using the gradient of the new data's log-likelihood with respect to the parameters (the weights θ):
θ_new = θ_old + η · ∇_θ log p(d | θ_old)
(∇_θ log p(d | θ) being the score -- the gradient of the log-likelihood with respect to the parameters -- and η a small step size.)
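As a hedged sketch of that approximate update (a toy Gaussian-mean model; the function name and step size are my own choices), each new observation contributes one small step along its score:

    import numpy as np

    def online_gradient_update(mu, new_data, sigma=1.0, lr=0.1):
        # theta_new = theta_old + lr * score, one small step per new observation
        for x in new_data:
            score = (x - mu) / sigma ** 2   # d/dmu log N(x | mu, sigma^2)
            mu = mu + lr * score
        return mu

    mu = 0.0                                            # parameter fit on the historical data
    mu = online_gradient_update(mu, np.random.normal(2.0, 1.0, size=200))
    print(mu)                                           # drifts toward the new data's mean, ~2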
With exponential family models updates can be exact, rather than approximate.
This machinery applies to Bayesian as well as to more classical statistical models.
There are further nuances, too -- for example, you can discount or drop some of the effect of old data on the presumption that it no longer reflects the current, changed model.
The algorithmic family that in general seems called for is "sequential Monte Carlo" or "particle filtering". While this is often presented for models and problems where the latent dimension is small and the question is how something evolves over time, the "sequential" part can just as well refer to the order in which data are received (I think this traces to "A Sequential Particle Filter Method for Static Models", N. Chopin, 2002).
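As a stripped-down illustration of that "sequential in data order" idea (my own toy code, not any library's API), here is a particle approximation of the posterior over a static Bernoulli rate p, reweighted one observation at a time; a serious IBIS/SMC implementation would also add an MCMC rejuvenation step to fight particle impoverishment.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 2000
    particles = rng.uniform(0, 1, N)        # prior samples of the Bernoulli rate p
    weights = np.full(N, 1.0 / N)

    def assimilate(x):
        # fold one new observation x in {0, 1} into the particle approximation
        global particles, weights
        weights *= particles if x == 1 else (1.0 - particles)   # likelihood reweighting
        weights /= weights.sum()
        ess = 1.0 / np.sum(weights ** 2)    # effective sample size
        if ess < N / 2:                     # resample when weight degeneracy sets in
            idx = rng.choice(N, size=N, p=weights)
            particles = particles[idx]
            weights = np.full(N, 1.0 / N)

    for x in rng.binomial(1, 0.7, size=200):   # stream of new observations
        assimilate(x)

    print(particles @ weights)                 # posterior mean of p, close to 0.7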
On the PPL side, I think SMC has often been a secondary target, but there has been good work on making efficient inference more turnkey. I think this has often focused on automatically providing better/adaptive proposal distributions. For example, "SMCP3: Sequential Monte Carlo with Probabilistic Program Proposals" by Lew et al., 2023 (Stuart Russell is among the authors).
One way I've seen this done in practice is to construct an offline model that produces an initial set of posterior samples, and then a second, online model that takes those posterior samples and the new observations as input and produces a new posterior. This probably wouldn't make sense computationally in a high-frequency streaming context, but (micro)batching works fine.
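Something like the following sketch, with a toy conjugate Gaussian standing in for the real offline model (all names are mine): the online step treats the offline posterior samples as the prior, reweights them against the new micro-batch, and resamples.

    import numpy as np

    rng = np.random.default_rng(1)

    def offline_model(data, n_samples=5000, sigma=1.0):
        # "offline" inference: conjugate posterior samples for a Gaussian mean mu
        # (stands in for a full MCMC/VI run on the historical data)
        n = len(data)
        post_var = 1.0 / (1.0 + n / sigma ** 2)          # N(0, 1) prior on mu
        post_mean = post_var * data.sum() / sigma ** 2
        return rng.normal(post_mean, np.sqrt(post_var), n_samples)

    def online_model(posterior_samples, new_data, sigma=1.0):
        # "online" step: reweight the old posterior samples by the new batch's
        # likelihood and resample to approximate the updated posterior
        logw = np.array([-0.5 * np.sum(((new_data - mu) / sigma) ** 2)
                         for mu in posterior_samples])
        w = np.exp(logw - logw.max())
        w /= w.sum()
        idx = rng.choice(len(posterior_samples), size=len(posterior_samples), p=w)
        return posterior_samples[idx]

    samples = offline_model(rng.normal(0.5, 1.0, size=50))            # historical batch
    samples = online_model(samples, rng.normal(0.5, 1.0, size=10))    # new micro-batch
    print(samples.mean())                                             # updated posterior mean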
Update the model with the new data and recompute it (hopefully the unchanged parts are cached). Most probabilistic programming models are not black boxes like neural nets.