Still pondering using embeddings for classification. Yes, we can group similar items with embeddings through clustering, but how do you extract a label for each group?
What I've come up with is either a) after indexing, ask an LLM for the common label from a sample of the grouped set (what keyterm best describes the relationship between these documents?), or b) determine the keyterms while indexing (by having the LLM extract them for each document ahead of time), then use set overlap across the grouped set's keyterms to determine a label for the group.
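A rough sketch of both options in Python, just to make the idea concrete. Everything here is an assumption for illustration: the function names `build_label_prompt` and `label_from_keyterms`, the `min_share` threshold, and the sample data are all hypothetical, and no actual LLM call is made (option a only builds the prompt text you would send). For option b, strict set intersection tends to come back empty on real clusters, so this sketch relaxes it to "the keyterm shared by the most documents", which is a swapped-in simplification rather than the only way to do set overlap.

```python
import random
from collections import Counter


def build_label_prompt(docs, k=5, seed=0):
    """Option (a): sample up to k documents from one cluster and build
    the prompt text to send to an LLM (the call itself is left out)."""
    rng = random.Random(seed)
    sample = rng.sample(docs, min(k, len(docs)))
    bullets = "\n".join(f"- {d}" for d in sample)
    return (
        "What keyterm best describes the relationship between these documents?\n"
        f"{bullets}\n"
        "Answer with a single keyterm."
    )


def label_from_keyterms(keyterm_sets, min_share=0.5):
    """Option (b): given one set of keyterms per document in a cluster,
    return the keyterm shared by the most documents, or None if even the
    best keyterm appears in fewer than min_share of the documents."""
    if not keyterm_sets:
        return None
    # Count each keyterm once per document.
    counts = Counter(term for terms in keyterm_sets for term in set(terms))
    # Highest count wins; ties are broken alphabetically so the result is stable.
    term, n = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return term if n / len(keyterm_sets) >= min_share else None


# Hypothetical keyterms extracted at indexing time for one cluster.
cluster = [
    {"billing", "refund"},
    {"refund", "invoice"},
    {"refund", "billing", "chargeback"},
]
print(label_from_keyterms(cluster))  # -> refund
```

The trade-off as I see it: option a costs one LLM call per cluster at labeling time, while option b pays per document up front but makes the labeling step itself cheap and repeatable whenever the clusters change.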