K-means clustering is actually well principled: it's an instance of the expectation-maximization algorithm with "hard" cluster assignments. It turns out to be just good old maximum likelihood:
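To make the hard-EM view concrete, here is an illustrative sketch (the function name and init scheme are my own, not from the linked notes): under an isotropic Gaussian mixture with shared fixed variance and equal mixing weights, the hard E-step and the maximum-likelihood M-step reduce exactly to the two k-means steps.

```python
import numpy as np

def kmeans_hard_em(X, k, n_iter=50, init=None, seed=0):
    """K-means viewed as hard EM on an isotropic Gaussian mixture
    (shared fixed variance, equal mixing weights -- the textbook setting
    this sketch assumes)."""
    rng = np.random.default_rng(seed)
    means = (np.asarray(init, dtype=float) if init is not None
             else X[rng.choice(len(X), size=k, replace=False)])
    for _ in range(n_iter):
        # E-step (hard): each point is assigned entirely to its nearest mean,
        # i.e. the argmax of the per-point Gaussian log-likelihood.
        d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
        z = d2.argmin(axis=1)
        # M-step: the maximum-likelihood mean given hard assignments
        # is just the average of the points assigned to each cluster.
        new_means = np.array([X[z == j].mean(axis=0) if np.any(z == j)
                              else means[j] for j in range(k)])
        if np.allclose(new_means, means):
            break
        means = new_means
    return means, z
```

Maximizing that likelihood over hard assignments and means is the same as minimizing the within-cluster sum of squared distances, which is the usual k-means objective.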
There are two issues I had in mind. One is that the link between the argmin and the algorithm (k-means in this case) feels too tied to the algorithm's mechanics, and less explicit than it is for other methods.
The other is that in practice, you typically want your true optimization objective to be as close as possible to what the algorithm is optimizing, and what k-means optimizes is usually pretty far removed from it. Even small tweaks (let's say, augmenting the data with some sparse labels, or weighting the loss based on some aspect of the embedding values) are difficult to do with k-means.
https://alliance.seas.upenn.edu/~cis520/dynamic/2022/wiki/in...
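As a concrete (and entirely hypothetical) example of the sparse-labels tweak mentioned above: once the assignment step is written out explicitly, clamping a few labeled points to known clusters is a one-line change, even though stock k-means implementations offer no hook for it.

```python
import numpy as np

def constrained_kmeans(X, k, labels, n_iter=50, seed=0):
    """Hypothetical sketch: k-means with sparse labels, where `labels`
    maps point index -> fixed cluster id. The only change from plain
    k-means is clamping those assignments after the E-step."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Usual hard assignment to the nearest mean...
        d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
        z = d2.argmin(axis=1)
        # ...then clamp the sparsely labeled points to their known clusters.
        for i, c in labels.items():
            z[i] = c
        means = np.array([X[z == j].mean(axis=0) if np.any(z == j)
                          else means[j] for j in range(k)])
    return means, z
```

The point is not this particular variant but that such tweaks are easy once the objective and its minimization are explicit, and awkward when they're buried inside an off-the-shelf k-means routine.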