
Still pondering using embeddings for classification. Yes, we can group similar items by clustering their embeddings, but how do you extract a label for each group?

What I've come up with is either a) after indexing, ask an LLM for a common label given samples from the grouped set (which keyterm best describes the relationship between these documents?), or b) determine the keyterms while indexing (by having the LLM extract them ahead of time), then use set overlap on the grouped set's keyterms afterward to determine a label for the group.
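
A rough sketch of option b), assuming each document already carries a set of keyterms from indexing time. The name `label_cluster` and the `min_share` threshold are made up for illustration. A strict intersection across all documents tends to come back empty, so this swaps in a looser overlap: keep the keyterms shared by at least some fraction of the group.

```python
from collections import Counter


def label_cluster(doc_keyterms, min_share=0.5):
    """Return keyterms found in at least `min_share` of the group's documents.

    doc_keyterms: list of keyterm sets, one per document in the group.
    Results are ordered by document count (descending), then alphabetically.
    """
    if not doc_keyterms:
        return []

    counts = Counter()
    for terms in doc_keyterms:
        # Lowercase and dedupe so each document counts a keyterm at most once.
        counts.update({t.lower() for t in terms})

    threshold = min_share * len(doc_keyterms)
    shared = [(term, n) for term, n in counts.items() if n >= threshold]
    shared.sort(key=lambda pair: (-pair[1], pair[0]))
    return [term for term, _ in shared]


# Hypothetical keyterms for three documents that landed in one cluster.
cluster = [
    {"embeddings", "clustering", "python"},
    {"embeddings", "clustering", "llm"},
    {"embeddings", "search"},
]
print(label_cluster(cluster))
```

Setting `min_share=1.0` gives the strict intersection, which is the literal "set overlap" version.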



You seem to be looking for a topic model. BERTopic might help: https://maartengr.github.io/BERTopic/index.html#quick-start


This is great...thank you!



