I have to say these seem like hobbyist-level products. For example, https://coral.ai/products/m2-accelerator-dual-edgetpu can do 8 TOPS, but a 5-year-old RTX 2060 gets you 50 TOPS, and a newer H100 gets you 3958 TOPS.
Nobody's going to buy 500 of those chips and stick them in 500 M.2 slots to match the performance of a single H100.
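To put numbers on that, here's a quick back-of-envelope using the figures quoted above (marketing peak TOPS, so take the ratios as rough):

```python
# Back-of-envelope comparison of the quoted peak TOPS figures.
coral_tops = 8        # Coral dual Edge TPU M.2 module
rtx2060_tops = 50     # RTX 2060 (rough marketing figure)
h100_tops = 3958      # H100 peak INT8

print(h100_tops / coral_tops)     # -> 494.75 modules to match one H100
print(rtx2060_tops / coral_tops)  # -> 6.25x gap to a 2060
```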
I would call them less hobbyist products and more compute for IoT/edge devices. They aren't made for a datacenter and aren't trying to compete with an H100.
Yes, it has one-sixth the performance of an RTX 2060, but roughly one five-hundredth the volume and power draw. For a specific siloed application, 8 TOPS is plenty. Think image processing, etc.
There are plenty of production use cases where that makes sense and an H100 does not.
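A rough sanity check of the "8 TOPS is plenty" claim for a vision workload, assuming ~0.6 GOPs per MobileNetV2-class inference (0.3 GMACs × 2 ops per MAC, a commonly cited ballpark, not a measured number):

```python
# Rough headroom check for a siloed vision task on an 8 TOPS accelerator.
# The per-inference cost is an assumed ballpark for a MobileNetV2-class model.
ops_per_inference = 0.6e9           # ~0.6 GOPs per 224x224 inference (assumed)
fps = 30                            # typical camera frame rate
required_ops = ops_per_inference * fps   # 18 GOPS sustained
accelerator_ops = 8e12              # 8 TOPS peak

print(accelerator_ops / required_ops)    # hundreds of x of nominal headroom
```

Peak TOPS never translates 1:1 into sustained throughput, but even with a large derating factor there's ample margin for this kind of workload.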
At this point, these chips aren't anywhere close to the frontier of capability for embedded accelerators, and certainly don't merit an entire M.2 slot in a serious design.
At this point, nobody sells modules for this, and I doubt many Coral chips still sell. The current home for an ML accelerator at about 10 TOPS is as a peripheral block on an SoC. Most serious SoCs have one.
In other words, the reason you didn't find a commercial competitor to these things is that the competitor is (nearly) free.
If the company owns both frontier models and chip design, and they see the future moat being in inference, why would they offer much more than what you get on Google Cloud? It's not as if they're going to start competing with Nvidia in hardware anyway; this is a very specific hardware design for a very specific problem.