This has been going on for decades, and I wonder what the actual business model for the EU economy will be in the future. With all the factories soon gone, will Europe rely only on agriculture, tourism, and some services? Back to a "developing country" economy?
If news sites opt out of being archived by the Internet Archive, are their archives available anywhere or just lost? Will there be no way to access the headlines of a certain day or the reporting about a certain topic in the past even for research or scientific purposes?
Not sure what you're talking about. The last "forward-facing" government was about 50 years ago, and the last one that at least drove meaningful reforms was almost 25 years ago. To me it seems the more Europe got integrated, the more Germany lost the plot.
Why "abundant cheap energy is a key requirement to survive in today's globalized markets" has not made it into the EU leadership's mindset is beyond comprehension.
Energy price is just one of many inputs for the viability of industry.
Availability of (educated) labor, wage levels, infrastructure, political stability and a ton of other factors are at least as important, if not more so.
Why should we keep tolerating irreversible damage to the planet/climate just to keep costs/prices low? If you can't produce some shit sustainably because that makes it too expensive, then maybe it should not be produced in the first place?
> Kagi
This seems to be true, but more indirectly. From Kagi’s blog [0], which is a follow-up to a Kagi blog post from last year [1]:
[0]> Google: Google does not offer a public search API. The only available path is an ad-syndication bundle with no changes to result presentation - the model Startpage uses. Ad syndication is a non-starter for Kagi’s ad-free subscription model.[^1]
[0]> The current interim approach (current as of Jan 21, 2026)
[0]> Because direct licensing isn’t available to us on compatible terms, we - like many others - use third-party API providers for SERP-style results (SERP meaning search engine results page). These providers serve major enterprises (according to their websites) including Nvidia, Adobe, Samsung, Stanford, DeepMind, Uber, and the United Nations.
I’m an avid Kagi user, and it seems like Kagi and some other notable interested parties have _already_ been unable to get what they want/need with Google’s index.
[0]> The fact that we - and companies like Stanford, Nvidia, Adobe, and the United Nations - have had to rely on third-party vendors is a symptom of the closed ecosystem, not a preference.
Hopefully someone here can clarify for me, or enumerate some of these “third-party vendors” who seem like they will/might/could be directly affected by this.
[0] https://blog.kagi.com/waiting-dawn-search
[1] https://blog.kagi.com/dawn-new-era-search
> [^1]: A note on Google’s existing APIs: Google offers PSE, designed for adding search boxes to websites. It can return web results, but with reduced scope and terms tailored for that narrow use case. More recently, Google offers Grounding with Google Search through Vertex AI, intended for grounding LLM responses. Neither is general-purpose index access. Programmable Search Engine is not designed for building competitive search. Grounding with Google Search is priced at $35 per 1,000 requests - economically unviable for search at scale, and structured as an AI add-on rather than standalone index syndication. These are not the FRAND terms the market needs.
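To put the quoted Grounding price in perspective, here is a back-of-the-envelope Python sketch. It uses only the $35 per 1,000 requests figure from the quote; the daily query volumes are hypothetical, and free tiers, volume discounts, or any other fees are not modeled.

```python
# Quoted list price for Grounding with Google Search: $35 per 1,000 requests.
PRICE_PER_1000_REQUESTS_USD = 35.0


def grounding_cost_usd(requests: int) -> float:
    """Cost in USD of `requests` grounded requests at the quoted list price."""
    return requests * PRICE_PER_1000_REQUESTS_USD / 1000


if __name__ == "__main__":
    # Hypothetical daily query volumes, for illustration only.
    for per_day in (10_000, 1_000_000):
        cost = grounding_cost_usd(per_day)
        print(f"{per_day:>9,} queries/day -> ${cost:,.0f}/day, ~${cost * 30:,.0f}/30 days")
```

At $0.035 per query, every single search costs more than many ad-free subscription plans can plausibly absorb once volume grows, which is the point the quote is making.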
I believe they are indirectly saying that they use SerpApi or a similar product that scrapes Google search results. And other big ones use it too, so it must be OK...
That must be the reason why they limit the searches you can do in the starter plan. Every SerpApi call costs money.
And I can't prove a link, but they refused to index one of my domains, and I think it _might_ be because we had some content on there about how to use SerpApi.
Kagi does not use Google's search index. From their post, which made the front page of HN yesterday [1]:
> Google does not offer a public search API. The only available path is an ad-syndication bundle with no changes to result presentation - the model Startpage uses. Ad syndication is a non-starter for Kagi’s ad-free subscription model.
They then go on to say that they pay a third-party company to scrape Google results (and serve those scraped results to their users). So their search engine is indeed based on unauthorized and uncompensated use of Google's index.
But since they're not using/paying for a supported API but just taking what they want, they indeed are unlikely to be impacted by this API turndown.
Congrats on saying that in the most one-sided way possible. Google makes it literally impossible for them to pay for access to search results to make the product they want (customizable subscription search with no ads), and Google is also the de facto globally sanctioned crawler, because they are the only search engine anyone gives a shit about and sites need to be indexed by them to survive. In short, Google owns the river and sells the boats, and the public built a wall around it. Google is in a monopoly position in search.
They have a monopoly on their own search results. There's nothing stopping anyone from making their own (hell, a poster did so in the comments above). God forbid we aren't entitled to the fruits of their labor; the reason you want it isn't that you can't make it (again, see above). It's because making it good is hard, and you want the good results without putting in the effort to make them yourself.
They get results from another provider who has authorized access. Google doesn't provide search results to unauthorized requests, as many on Tor have experienced.
No. They pay SerpApi to scrape Google. And SerpApi is currently being sued by Google for unauthorized scraping.
Kagi did make comments for years implying that they had a deal with Google for search results, but their latest blog post makes it clear that is not true and was never true.
Google is a monopoly across several broad categories. They're also a taxation enterprise.
Google Search took over as the URL bar for 91% of all web users across all devices.
Since this intercepts trademarks and brand names, Google gets to tax all businesses unfairly.
Tell your legislators in the US and the EU that Google shouldn't be able to sell ads against registered trademarks (+/- some edit distance). They re-engineered the web to be a taxation system for all businesses across all categories.
Searching for Claude -> Ads in first place
Searching for ChatGPT -> Ads in first place
Searching for iPhone -> Ads in first place
This is inexcusable.
Only searches for "ChatGPT versus", "iPhone reviews", or "Nintendo game comparison" should allow ads. And one could argue that the URL bar shouldn't autosuggest these either when a trademark is typed into it.
If Google won't play fair, we have to kill 50% of their search revenue for being egregiously evil.
If you own a trademark, Google shouldn't be able to sell ads against you.
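A rough Python sketch of how the proposed rule could be checked mechanically. Everything here is an assumption for illustration: the trademark list, the edit-distance threshold of 1 standing in for "+/- some edit distance", and the comparison keywords taken from the examples above. It uses plain Levenshtein distance on single words, so transposed letters count as 2 edits and multi-word marks are not handled.

```python
# Hypothetical inputs for the proposed rule (assumptions, not real policy data).
TRADEMARKS = {"claude", "chatgpt", "iphone"}
COMPARISON_WORDS = {"versus", "vs", "review", "reviews", "comparison"}
MAX_EDIT_DISTANCE = 1  # stand-in for "+/- some edit distance"


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def ads_allowed(query: str) -> bool:
    """Allow ads on comparison/review queries; block them on queries that
    contain a word within MAX_EDIT_DISTANCE of a registered trademark."""
    words = query.lower().split()
    if any(w in COMPARISON_WORDS for w in words):
        return True
    return not any(levenshtein(w, tm) <= MAX_EDIT_DISTANCE
                   for w in words for tm in TRADEMARKS)


if __name__ == "__main__":
    for q in ("Claude", "iphon", "iPhone reviews", "ChatGPT versus Claude", "laptop"):
        print(f"{q!r}: ads allowed = {ads_allowed(q)}")
```

Under these assumptions, "Claude" and the typo "iphon" get no ads, while "iPhone reviews" and "ChatGPT versus Claude" still can.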
--
Google's really bad. Ideally we'd get an antitrust breakup. They're worse than Ma Bell. I wouldn't even split Google into multiple companies by division - I'd force them to be multiple copies of the same exact entity that then have to compete with each other:
Bell Systems -> {BellSouth, Bell Atlantic, Southwestern Bell, ...}
Google -> {GoogleA, GoogleB, GoogleC, ...}
They'd each have cloud, search, browser, and YouTube. But new brand names for new parent companies. That would create all-out war and lead to incredible consumer wins.
It could probably be argued that search access is an essential facility [1], though antitrust law doesn't appear to have anywhere near the enforcement it had in the past.
> If you own a trademark, Google shouldn't be able to sell ads against you.
This is frustrating even from a consumer perspective. Before I ran adblock everywhere, I couldn't stand that typing in the name of a specific company I was looking for would just serve ads from any number of competing brands I wasn't looking for.
What stops Kagi from indexing the internet itself instead of paying some guys to scrape search results from Google? One guy at Marginalia can do it, and an entire dev team at a PAID search engine can't?
I don't know about others, but we have special rules for Google, Bing, and a few others, rate-limiting them less than some random bot.
The problem is scrapers (mostly AI scrapers, from what we can tell). They will pound a site into the ground without caring, and they are becoming increasingly good at hiding their tracks. The only reasonable way to deal with them is to rate-limit every IP by default and then lift some of those restrictions for known, well-behaving bots. We will lift those restrictions if asked, and we frequently look at statistics to lift the restrictions from search engines we might have missed, but it's an uphill battle if you're new and unknown.
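A minimal Python sketch of "rate-limit every IP by default, loosen it for known bots", using one token bucket per IP. This is not the commenter's actual setup: the rates, burst sizes and allowlisted IPs (taken from documentation address ranges) are hypothetical. In practice the allowlist should be based on verified crawler identity (Google, for example, documents reverse-DNS verification for Googlebot) rather than a User-Agent string; here the trusted set is simply taken as given.

```python
import time
from typing import Optional


class PerIpRateLimiter:
    """Token bucket per client IP: strict limits by default, looser ones for
    IPs on an allowlist of known, well-behaving crawlers."""

    def __init__(self, default_rate: float, default_burst: float,
                 trusted_rate: float, trusted_burst: float,
                 trusted_ips: set) -> None:
        self.default = (default_rate, default_burst)
        self.trusted = (trusted_rate, trusted_burst)
        self.trusted_ips = set(trusted_ips)
        # ip -> (tokens remaining, time of last update)
        self.buckets: dict = {}

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        rate, burst = self.trusted if ip in self.trusted_ips else self.default
        # A new IP starts with a full bucket.
        tokens, last = self.buckets.get(ip, (burst, now))
        # Refill in proportion to elapsed time, never beyond the burst size.
        tokens = min(burst, tokens + (now - last) * rate)
        allowed = tokens >= 1
        self.buckets[ip] = (tokens - 1 if allowed else tokens, now)
        return allowed


if __name__ == "__main__":
    limiter = PerIpRateLimiter(default_rate=1.0, default_burst=2,
                               trusted_rate=10.0, trusted_burst=20,
                               trusted_ips={"203.0.113.7"})  # hypothetical known bot
    print([limiter.allow("198.51.100.1", now=0.0) for _ in range(3)])
```

An unknown IP gets a burst of 2 and then one request per second, while the allowlisted crawler gets a burst of 20 at ten times the rate; a new, unknown search engine starts out in the first group.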
As we've seen here on HN during the AI boom, it's not wonderful when a bunch of companies all use bots to scrape the entire web. Many sites only allow Google's scrapers in robots.txt, and the public will fight you hard if you scrape them without permission. It's just one of those things where it would be better for everyone if search engines could pay for access to work that only needs to be done once.
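The "only Google is allowed" pattern can be checked with Python's standard urllib.robotparser. The robots.txt below is a hypothetical example, and ExampleSearchBot and AnotherCrawler are made-up crawler names. Note that robots.txt is purely advisory: the parser tells a polite crawler what it may fetch, and nothing stops an impolite one.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may crawl everything, every other bot nothing.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def who_may_fetch(robots_txt: str, agents: list, url: str) -> dict:
    """Map each user-agent name to whether robots.txt permits it to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    print(who_may_fetch(ROBOTS_TXT,
                        ["Googlebot", "ExampleSearchBot", "AnotherCrawler"],
                        "https://example.com/article"))
```

A new search engine that honors this file simply never sees the site, which is the monopoly effect the reply below is pointing at.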
> Many sites only allow Google scrapers in robots.txt and the public will fight you hard if you scrape them without permission.
This just lets a monopoly replace the website instead of distributing power and fostering open source. The same monopoly that was already bleeding off the web's utility and taxing it.
Yes, what I think happens is the following: User A's price ceiling is $10, User B's is $12. When both reveal their max price early, the item will go to $10.50 (a $0.50 increment over A's max price). User A then has plenty of time to notice the item being valued at $10.50 by someone. In many cases, users then adjust the value they assign to the item and increase their bid. The result: User B has to pay more than the $10.50 they would have paid by sniping the item seconds before the auction ends.
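The arithmetic above follows the usual proxy-bidding rule: the highest maximum bid wins and pays one increment over the second-highest maximum, capped at its own maximum. A minimal Python sketch, assuming a fixed $0.50 increment (real auction sites typically use tiered increments that depend on the current price); the raised $11.50 bid in the usage example is hypothetical.

```python
from decimal import Decimal

INCREMENT = Decimal("0.50")  # assumed fixed bid increment, as in the example above


def closing_price(max_bids: dict) -> tuple:
    """Proxy bidding: the highest max bid wins and pays the runner-up's max plus
    one increment, capped at the winner's own max. Needs at least two bidders;
    on a tie, the bidder listed first (e.g. the earlier bid) wins."""
    ranked = sorted(max_bids.items(), key=lambda kv: kv[1], reverse=True)
    (winner, top), (_, runner_up) = ranked[0], ranked[1]
    return winner, min(top, runner_up + INCREMENT)


if __name__ == "__main__":
    print(closing_price({"A": Decimal("10.00"), "B": Decimal("12.00")}))
    # A sees $10.50 early and raises their own ceiling (hypothetical new bid).
    print(closing_price({"A": Decimal("11.50"), "B": Decimal("12.00")}))
```

With the original bids, B wins at $10.50; if A reacts by raising to $11.50, B still wins but now pays $12.00, which is the cost of revealing the ceiling early.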
That is the current standard. But it is hard for agents to read efficiently. To access JSON-LD, an agent must download the entire HTML page. This creates a haystack problem where you download 2MB of noise just to find 5KB of data.
Even then, you pay a syntax tax. JSON is verbose. Brackets and quotes waste valuable context window. Furthermore, the standard lacks behavior. JSON-LD lists facts but lacks instructions on how to sell (like @SEMANTIC_LOGIC). CommerceTXT is a fast lane. It does not replace JSON-LD. It optimizes it.
That solves bandwidth. It fails on tokens.
JSON syntax is heavy. Brackets and quotes consume context window.
More importantly, Schema.org is a dictionary of facts. It lacks behavior. It defines what a product is, but not how to sell it.
It has no concept of @SEMANTIC_LOGIC or @BRAND_VOICE. We need a format that carries both data and instructions efficiently. JSON-LD is too verbose and too static for that.
JSON is lean for data exchange between machines.
But in the LLM economy, the currency is tokens, not bytes.
To an LLM tokenizer, every bracket and quote is a distinct cost. In our tests, this 'syntax tax' accounts for up to 30% of the payload.
We chose a line-oriented format to minimize overhead and maximize the context window for actual commerce data.
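A crude way to see the "syntax tax" is to count how many characters of a JSON payload are punctuation rather than data. This Python sketch does that for a hypothetical schema.org-style product record. Characters are only a proxy: the real token overhead depends on the tokenizer, and this is not a reproduction of the 30% figure from the tests mentioned above.

```python
import json

# Hypothetical schema.org-style product record (not from any real site).
PRODUCT = {
    "@type": "Product",
    "name": "Trail Running Shoe",
    "sku": "TRS-42",
    "offers": {"price": "129.00", "priceCurrency": "EUR", "availability": "InStock"},
}

# Characters that carry JSON structure rather than product data.
JSON_PUNCTUATION = set('{}[]",:')


def punctuation_share(text: str) -> float:
    """Fraction of characters in `text` that are JSON punctuation."""
    return sum(ch in JSON_PUNCTUATION for ch in text) / len(text)


if __name__ == "__main__":
    compact = json.dumps(PRODUCT, separators=(",", ":"))  # no optional whitespace
    print(f"{len(compact)} chars, {punctuation_share(compact):.0%} punctuation")
```

Even with all optional whitespace removed, a small record like this spends a substantial fraction of its characters on quotes, braces, colons and commas.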
Who says you need to pipe the entire document with JSON-LD directly into the context window? I agree, that is very wasteful. You can just parse the relevant bits out and convert the JSON-LD data into something like your txt format before presenting it to the LLM. Bake that right into whatever tool it uses to scrape websites.
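A minimal Python sketch of that suggestion, using only the standard library: pull the script blocks of type application/ld+json out of the page and flatten them into "key: value" lines before anything reaches the model. The line format here is hypothetical (it is not CommerceTXT's syntax), and since the sketch does not execute JavaScript, it only finds JSON-LD that is present in the served HTML.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the text of <script type="application/ld+json"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self.blocks: list = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data  # script text may arrive in several chunks


def flatten(obj, prefix=""):
    """Yield 'dotted.key: value' lines (a hypothetical compact format)."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == "@context":
                continue  # vocabulary URL, not useful to the model
            yield from flatten(value, f"{prefix}{key}.")
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            yield from flatten(item, f"{prefix}{i}.")
    else:
        yield f"{prefix.rstrip('.')}: {obj}"


def jsonld_to_text(html: str) -> str:
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    lines = []
    for block in parser.blocks:
        lines.extend(flatten(json.loads(block)))
    return "\n".join(lines)
```

Baked into a scraping tool, this keeps brackets and quotes out of the context window, though, as the reply below notes, the full HTML still has to be downloaded first.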
That solves the Token Tax. It fails the Bandwidth Tax.
To get that JSON-LD, you still download 2MB of HTML. You execute JS. You parse the DOM.
You are buying a haystack to find a needle, then cleaning the needle. We propose serving just the needle.
Furthermore, JSON-LD is strictly for facts. It cannot express @SEMANTIC_LOGIC. It lacks the instructions on how to sell.