What 1T parameter base model have you seen from any of those labs? | Hacker News


zackangelo 3 months ago | on: Kimi K2 Thinking, a SOTA open-source trillion-para...

What 1T parameter base model have you seen from any of those labs?

riku_iki 3 months ago

It's MoE; each expert tower can be branched from some smaller model.

jychang 3 months ago

That's not how MoE works. You need to train the expert FFNs directly, together with the gate, or else the FFN gate would have no clue which expert to activate.
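
To make the gate's role concrete, here is a rough sketch of a single MoE layer's forward pass, assuming a plain top-k softmax router. The thread doesn't say how Kimi K2 actually routes, so every name, size, and the random initialization below are made up for illustration. The point is only that the gate's logits decide which experts run and how their outputs are mixed, so gate and experts have to be learned against each other.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def make_expert(dim, hidden, rng):
    # Hypothetical expert: a two-layer FFN with random (untrained) weights.
    W1 = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(hidden)]
    W2 = [[rng.gauss(0, 0.1) for _ in range(hidden)] for _ in range(dim)]
    return W1, W2


def expert_forward(expert, x):
    W1, W2 = expert
    h = [max(0.0, v) for v in matvec(W1, x)]  # ReLU
    return matvec(W2, h)


def moe_forward(x, gate_W, experts, top_k=2):
    # The gate produces one logit per expert for this token.
    logits = matvec(gate_W, x)
    # Keep only the top_k experts by gate logit.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Renormalize the selected logits into mixing weights.
    weights = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = expert_forward(experts[i], x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, top, weights


rng = random.Random(0)
DIM, HIDDEN, N_EXPERTS = 4, 8, 4  # toy sizes, assumed for the example
experts = [make_expert(DIM, HIDDEN, rng) for _ in range(N_EXPERTS)]
gate_W = [[rng.gauss(0, 0.1) for _ in range(DIM)] for _ in range(N_EXPERTS)]

x = [1.0, -0.5, 0.25, 2.0]
out, chosen, weights = moe_forward(x, gate_W, experts)
print("experts chosen:", chosen, "mix weights:", weights)
```

With untrained weights like these, the gate's choice is arbitrary. If you dropped in experts copied from separately trained smaller models, the gate would still have to be trained against them before its routing meant anything.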
