
Since the entity releasing the model obviously has certain goals, aligning/censoring the model in some ways serves their particular short-term goals.

In the grand scheme these alignments are harmful, as they impose a reality distortion field. The authors build a model of what language is, then contort that model to fit an opinionated idea of what language should be. Smells a bit Orwellian, right?



> Smells a bit Orwellian, right?

No, seems perfectly fine to me. You are already shaping your results by your selection of training data. E.g. do you want to train a model that speaks English, or German, or both? Do you want to run your training data past a spam filter first? Do you want to do a character-based model, or one of those weird encodings that are popular with LLMs these days?

Running some other procedures afterwards to make sure your LLM doesn't say embarrassing things is small potatoes by comparison.

Also it's good practice for trying to get alignment with more important values (like "don't kill all humans") later when models might get powerful enough to be able to kill all humans.

Playing some little games where OpenAI tries to keep you from making their model say embarrassing things, and people keep trying to make it say embarrassing things, is a good low-stakes practice ground.


I agree, but this entire conversation misses my point: "alignment" originally just meant making the LLM act the way you want it to.

A GPT that hasn't been aligned does not work the way we expect: you give it a prompt, and it will keep autocompleting until it reaches an end state.

Even making the GPT answer the question in the prompt, rather than autocompleting it into nonsense, is an example of alignment.
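
For anyone who hasn't poked at a base model, here is a minimal sketch of that autocomplete behaviour, assuming the Hugging Face transformers library and the plain GPT-2 checkpoint (my own choices for illustration, not anything the parent mentions):

    # Base (non-instruction-tuned) model: it just continues the text.
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")
    prompt = "What is the capital of France?"
    print(generator(prompt, max_new_tokens=30)[0]["generated_text"])
    # Typically it rambles on with more questions or unrelated prose instead
    # of answering, because it was only trained to predict the next token.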

It took a lot of fine tuning and data curation to get ChatGPT up to its current chat-like interface.
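
Just to illustrate what that fine-tuning boils down to (a made-up example of mine, not OpenAI's actual data): the base model is further trained on curated prompt/response pairs in a fixed conversational format, something like:

    # Purely illustrative record; real datasets contain many such pairs.
    sft_example = {
        "messages": [
            {"role": "user", "content": "What is the capital of France?"},
            {"role": "assistant", "content": "The capital of France is Paris."},
        ]
    }
    # Training the base model to continue the "user" turn with the "assistant"
    # turn is what turns an autocomplete engine into a chat interface.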

But this is not the only alignment you can do. The original Transformer paper was about machine translation: the model turned the input text into its translation, and once that was done, it was done.

We could choose to have the model do something else, say translate the prompt into five languages at once instead of one, just as an example. This would be another alignment decision.
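
As a rough sketch of that one-in, one-out translation use (using a public MarianMT checkpoint from the transformers library as a stand-in for the paper's model, which is my substitution):

    # Translation-style use: input text in, translated text out, then done.
    from transformers import pipeline

    translate_en_de = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")
    text = "The model maps input text to output text."
    print(translate_en_de(text)[0]["translation_text"])
    # A "translate into five languages at once" variant would just be a
    # different training objective / output format, i.e. another alignment choice.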

There is nothing about politics or selection bias or anything like that inherent to the original definition; it's only recently that "alignment" has morphed into this "align with human morals" concept.

Even in Andrej Karpathy's build-your-own-GPT YouTube video, which is highly talked about around here, he uses the phrase like this. At the end of the video you are left with a GPT, but not a question-and-response model, and he says it would need to be aligned to answer questions like ChatGPT does.



