Prometheus does support push. It's just that it's considered such an antipattern that it's been moved into a separate component (the Pushgateway) that you need to run separately.
Pulling has a few technical benefits, though. For one, only the puller needs to know what's being monitored; the thing being monitored can therefore be exceedingly simple, dumb and passive. StatsD is similarly simple in that clients just fire-and-forget UDP packets at a local daemon, of course, which leads to the next point:
Another benefit is that it gives you finer-grained control over when metrics are gathered, and which ones. Since Prometheus best practices dictate that metrics should be computed at pull time, you can tune collection intervals per metric, and you can do it centrally. And since you only pull from targets you know about, there can't be a rogue agent somewhere spewing out data (i.e. what a sibling comment calls "authoritative sources").
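To make the "centrally controlled" part concrete, here's a rough sketch of what per-job tuning looks like in the Prometheus scrape configuration (the job names and targets are invented):

    scrape_configs:
      - job_name: 'api'              # hot path, scrape often
        scrape_interval: 15s
        static_configs:
          - targets: ['app-1:9100', 'app-2:9100']
      - job_name: 'nightly-batch'    # slow-moving numbers, scrape rarely
        scrape_interval: 5m
        static_configs:
          - targets: ['batch-1:9100']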
But to understand why pull is a better model, you have to understand Google's/Prometheus's "observer/reactor" mindset towards large-scale computing; it's just easier to scale up with this model. Consider an application that implements some kind of REST API. You want metrics for things like the total number of requests served, which you'll sample now and then. You add an endpoint /metrics running on port 9100. Then you tell Prometheus to scrape (pull from) http://example.com:9100/metrics. So far so good.
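As a minimal sketch of what that endpoint can look like with the official Go client library (the metric name and API path are made up for illustration):

    package main

    import (
        "net/http"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promhttp"
    )

    // requestsTotal counts requests served by the (hypothetical) REST API.
    var requestsTotal = prometheus.NewCounter(prometheus.CounterOpts{
        Name: "myapp_http_requests_total",
        Help: "Total number of HTTP requests served.",
    })

    func main() {
        prometheus.MustRegister(requestsTotal)

        // The application's own handler just increments the counter as a side effect.
        http.HandleFunc("/api/things", func(w http.ResponseWriter, r *http.Request) {
            requestsTotal.Inc()
            w.Write([]byte("ok"))
        })

        // Expose the current counter values for Prometheus to scrape.
        http.Handle("/metrics", promhttp.Handler())
        http.ListenAndServe(":9100", nil)
    }

Note that the app never initiates anything towards the monitoring system; it just answers GET /metrics whenever it's asked.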
The beauty of the model arises when you involve a dynamic orchestrator like Kubernetes. Now we're running the app on Kubernetes, which means the app can run on many nodes, across many clusters, at the same time; it will have a lot of different IPs (one per instance) that are completely dynamic. Instead of adding a rule to scrape a specific URL, you tell Prometheus to ask Kubernetes for all services and then use that information to figure out the endpoints. This dynamic discovery means that as you take apps up and down, Prometheus automatically updates its list of endpoints and scrapes them. Equally important, Prometheus goes to the source of the data at any given time. The services are already scaled up; there's no corresponding metrics collection to scale up, other than the internal machinery of Prometheus' scraping system.
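Concretely, the "ask Kubernetes" part is a few lines of scrape configuration; roughly something like this sketch (the opt-in annotation shown is a common convention, not something built into Kubernetes):

    scrape_configs:
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          # only scrape pods that opt in via the annotation
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: "true"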
In other words, Prometheus observes the cluster and reacts to changes in it to reconfigure itself. This isn't exactly new, but it's core to Google's/Prometheus's way of thinking about applications and services, which has subsequently coloured the whole Kubernetes culture. Instead of configuring the chess pieces, you let the board inspect the chess pieces and configure itself. You want the individual, lower-level apps to be as mundane as possible, let the behavioural signals flow upstream, and let the higher-level pieces make decisions.
This dovetails nicely with the observational data model you need for monitoring, anyway: First you collect the data, then you check the data, then you report anomalies within the data. For example, if you're measuring some number that can go critically high, you don't make the application issue a warning if it goes above a threshold; rather, you collect the data from the application as a raw number, then perform calculations (e.g. max over the last N mins, sum over the last N mins, total count, etc.) that you compare against the threshold.
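In Prometheus terms those calculations live centrally in PromQL, not in the app; a sketch, with an invented metric name:

    # highest value of the raw gauge over the last 10 minutes, compared to the threshold here
    max_over_time(queue_depth[10m]) > 100

    # sum of the samples over the same window
    sum_over_time(queue_depth[10m])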
In practice, implementing a metrics endpoint is exceedingly simple, and you get used to "just writing another exporter". I've written a lot of exporters, and while this initially struck me as heavyweight and clunky, my mindset is now that an HTTP listener is actually more lightweight than an "imperative" pusher script.
But why is it simpler for Prometheus to have to query Kube to discover all the endpoints in order to collect the data, versus the endpoints just pushing out to Prometheus?
Obviously endpoints already need to know how to contact all sorts of services they depend on. So it's not like you're "saving" anything by not telling them "PrometheusIP = X".
Let's say you want to cleanly shut down some instances of your endpoint. They are holding connection stats & request counts that you don't want to lose. With push, the endpoint can close its connection handler, finish any outstanding requests, push final stats, and then exit. With pull, are you supposed to just sit and wait until a pull happens before the process can exit?
Because it shifts all the complexity to the monitoring system, making the "agents" really, really dumb. There would have to be more to push than just a single IP (see the sketch after this list):
* Many installations run multiple Prometheus servers for redundancy, so to start, it'd have to be multiple IPs.
* They would also need auth credentials.
* They'd need retry/failure logic with backoff to prevent dogpiling.
* Clients would have to re-resolve the name rather than cache the DNS lookup, so they always reach Prometheus at its current IP.
* If Prometheus moves, every pusher has to be updated.
* Since Prometheus wouldn't know which pushers exist, it couldn't tell a dead pusher from one that never pushed at all. Because Prometheus is pull-based, a failed scrape is an actual, detectable failure, not just an absence of data.
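To make that concrete, here's a rough, entirely hypothetical sketch of just the multi-target/retry/backoff part; the collector URLs are invented, Prometheus has no such push endpoint for regular clients, and this still doesn't cover auth or DNS re-resolution:

    package main

    import (
        "bytes"
        "errors"
        "net/http"
        "time"
    )

    // pushMetrics tries each collector in turn, retrying with exponential
    // backoff so a recovering collector isn't dogpiled.
    func pushMetrics(payload []byte, collectors []string) error {
        client := &http.Client{Timeout: 5 * time.Second}
        for attempt := 0; attempt < 5; attempt++ {
            for _, url := range collectors {
                resp, err := client.Post(url, "text/plain", bytes.NewReader(payload))
                if err == nil && resp.StatusCode < 300 {
                    resp.Body.Close()
                    return nil
                }
                if resp != nil {
                    resp.Body.Close()
                }
            }
            time.Sleep(time.Duration(1<<attempt) * time.Second)
        }
        return errors.New("all collectors unreachable")
    }

    func main() {
        // Invented collector URLs, for illustration only.
        _ = pushMetrics([]byte("myapp_http_requests_total 42\n"),
            []string{"http://metrics-1.example.com/push", "http://metrics-2.example.com/push"})
    }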
There's a lot to be said for Prometheus' principle of baking exporters into individual, completely self-encapsulated programs — as opposed to things like collectd, diamond, Munin, Nagios etc. that collect a lot of stuff into a single, possibly plugin-based, system.
Don't forget, a lot of exporters come with third-party software. You want those programs to have as little config as possible. If I release an open-source app (let's say, a search engine), I can include a /metrics handler, and users who deploy my app can just point their Prometheus at it. It's enticingly simple.
As for graceful shutdown: the default pull interval is 15 seconds, and you can shorten it if you want to lose less at shutdown. Prometheus isn't designed for extremely fine-grained metrics; losing a few requests' worth of samples due to a shutdown shouldn't matter in the big picture. But for metrics that are sensitive, it's easy enough to bake them into some stateful store anyway (Redis or etcd, for example), or to compute them in real time from stateful data (e.g. SQL). For example, if you have some kind of e-commerce order system, it's better if the exporter produces the numbers by issuing a query against the transaction tables, rather than maintaining RAM counters of dollars and cents.
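A sketch of that last point with the Go client; the table, query and metric name are invented, and error handling is elided for brevity:

    package main

    import (
        "database/sql"
        "net/http"

        _ "github.com/lib/pq" // any SQL driver; Postgres chosen arbitrarily
        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promhttp"
    )

    func main() {
        db, _ := sql.Open("postgres", "postgres://localhost/shop?sslmode=disable")

        // GaugeFunc runs its callback at scrape time, so the number always
        // comes from the transaction table, not from a counter held in RAM.
        orderCents := prometheus.NewGaugeFunc(prometheus.GaugeOpts{
            Name: "shop_order_total_cents",
            Help: "Sum of all order amounts, queried from the orders table at scrape time.",
        }, func() float64 {
            var cents float64
            db.QueryRow("SELECT COALESCE(SUM(amount_cents), 0) FROM orders").Scan(&cents)
            return cents
        })

        prometheus.MustRegister(orderCents)
        http.Handle("/metrics", promhttp.Handler())
        http.ListenAndServe(":9100", nil)
    }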
You would expose a counter with the total request count. Summing those up across all nodes known to Prometheus gives you the total number of requests currently visible to monitoring. With rate() you can calculate requests per second.
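For example, assuming the counter is exported as http_requests_total (an invented name):

    # total requests across all instances currently visible to monitoring
    sum(http_requests_total)

    # requests per second over the last five minutes
    sum(rate(http_requests_total[5m]))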
But yes, it is possible to miss some requests if a node goes down without Prometheus collecting the latest stats.
But as the parent said, if you need such totals it might be better to store them persistently. Also, I can't think of a scenario where the total number of requests would trigger an alert.
Thank you for your patient and articulate responses in this thread, 'lobster_johnson'. You make an excellent case. For me, this nails it:
>"if you're measuring some number that can go critically high, you don't make the application issue a warning if it goes above a threshold; rather, you collect the data from the application as a raw number, then perform calculations (e.g. max over the last N mins, sum over the last N mins, total count, etc.) that you compare against the threshold."