Still pondering on using embeddings for classification. Yes, we can group simila... | Hacker News


kordlessagain on Sept 5, 2023 | on: LLM Python/CLI tool adds support for embeddings

Still pondering using embeddings for classification. Yes, we can group similar items with embeddings through clustering, but how do you extract a label for each group? What I've come up with is either a) after indexing, ask an LLM for the common label from samples of the grouped set (what keyterm best describes the relationship between these documents?), or b) determine the labels (or keywords) while indexing, by having the LLM find the keyterms ahead of time, then use set overlap on the grouped set's keyterms afterwards to determine a label for the group.
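For what it's worth, option (b) can be sketched in a few lines of plain Python. This is only a rough illustration under some assumptions: the clustering has already happened, each document already carries a set of keyterms (from an LLM or anything else), and the function name `label_cluster` and the `min_share` threshold are made up for this example. A strict set intersection is often empty on real data, so the sketch keeps any keyterm shared by at least a given fraction of the group.

```python
from collections import Counter


def label_cluster(keyterm_sets, min_share=0.5):
    """Return keyterms found in at least `min_share` of the documents.

    `keyterm_sets` is a list with one collection of keyterms per document
    in the cluster. Results are ordered by document count (descending),
    then alphabetically. Hypothetical helper, not part of any library.
    """
    if not keyterm_sets:
        return []
    # Count each keyterm once per document, even if it is listed twice.
    counts = Counter(term for terms in keyterm_sets for term in set(terms))
    threshold = min_share * len(keyterm_sets)
    shared = [(term, n) for term, n in counts.items() if n >= threshold]
    shared.sort(key=lambda pair: (-pair[1], pair[0]))
    return [term for term, _ in shared]


# Made-up keyterms for three documents that ended up in one cluster.
cluster = [
    {"embeddings", "clustering", "python"},
    {"embeddings", "vector search", "python"},
    {"embeddings", "clustering", "sqlite"},
]
labels = label_cluster(cluster)
strict = label_cluster(cluster, min_share=1.0)  # plain intersection
```

With `min_share=1.0` this reduces to the strict set overlap described above; lowering it trades precision for a better chance of getting any label at all.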

Nydhal on Sept 5, 2023

You seem to be looking for a topic model. BERTopic might help: https://maartengr.github.io/BERTopic/index.html#quick-start

kordlessagain on Sept 9, 2023

This is great...thank you!
