Serving file content/diff requests from gitea/forgejo is quite expensive computationally. And these bots tend to tarpit themselves when they come across e.g. a Linux repo mirror.
> Serving file content/diff requests from gitea/forgejo is quite expensive computationally
One time, sure. But unauthenticated requests would surely be cached, while authenticated ones skip the cache (just like HN works :) ); most internet-facing websites end up using this pattern.
There are _lots_ of objects in a large git repository. E.g., I happen to have a fork of VLC lying around. VLC has 70k+ commits (as of that fork). Each commit has about 10k files. The typical AI crawler wants, for every commit, to download every file (so 700M objects), every tarball (70k+ .tar.gz files), and the blame view of every file (another 700M objects, where blame has to look back through 35k commits on average). Plus some more.
Saying “just cache this” is not sustainable. And this is only one repository; the only reasonable way to deal with this is some sort of traffic mitigation. You cannot just treat this traffic as the happy path.
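A quick back-of-envelope sketch of those numbers (the constants are the rough figures from the comment above, not measurements):

```python
# Rough back-of-envelope for the VLC example above.
commits = 70_000           # commits in the repository (approx.)
files_per_commit = 10_000  # files visible at a typical commit (approx.)

file_views = commits * files_per_commit   # one page per (commit, file) pair
tarballs = commits                        # one snapshot archive per commit
blame_views = commits * files_per_commit  # one blame page per (commit, file) pair

print(f"file views:  {file_views:,}")     # 700,000,000
print(f"tarballs:    {tarballs:,}")       # 70,000
print(f"blame views: {blame_views:,}")    # 700,000,000
```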
You can't feasibly cache large repositories' diffs/content-at-version without reimplementing a significant part of git - this stuff is extremely high-cardinality and you'd just constantly thrash the cache the moment someone does a BFS/DFS through available links (as these bots tend to do).
https://social.hackerspace.pl/@q3k/114358881508370524
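A toy illustration of the thrashing (purely a sketch; the key-space and cache sizes are assumptions based on the VLC numbers above, and a real BFS crawler is even worse since it visits each URL exactly once, i.e. zero repeats):

```python
# Why an HTTP cache barely helps: the crawler's (commit, path) keys are
# almost all distinct, so even a large LRU cache rarely sees a repeat.
from collections import OrderedDict
import random

CACHE_SIZE = 1_000_000    # entries the cache can hold (assumed)
KEY_SPACE = 700_000_000   # distinct (commit, file) pages, per the estimate above
REQUESTS = 5_000_000      # requests we simulate

cache, hits = OrderedDict(), 0
for _ in range(REQUESTS):
    key = random.randrange(KEY_SPACE)  # crawler requests some page
    if key in cache:
        hits += 1
        cache.move_to_end(key)         # mark as recently used
    else:
        cache[key] = True
        if len(cache) > CACHE_SIZE:
            cache.popitem(last=False)  # evict least-recently-used entry

print(f"hit rate: {hits / REQUESTS:.4%}")  # ~0.14%: nearly every request misses
```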