Hacker News

My use cases are mostly for automation, and local-only is a must.

I currently use the GPU in my server for n8n and Home Assistant with small-ish tooling models that fit in my 8GB VRAM.

TTFT (time to first token) is pretty poor right now: I get 10+ seconds for the longer inputs from HA. n8n isn't too bad unless I'm asking it to handle a largish input, but that one is less time-sensitive since it runs things on schedules rather than when I need output.
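FWIW, here's roughly how I'd measure TTFT so the numbers are comparable between HA and n8n. This is a sketch, not my exact setup: it assumes you can get a streaming chunk iterator from your local server (e.g. llama.cpp's or Ollama's OpenAI-compatible streaming endpoint), and the fake stream at the bottom just stands in for that.

```python
import time

def time_to_first_token(stream):
    """Consume any iterator of text chunks; return (ttft_seconds, chunks).

    `stream` is assumed to be a streaming response from a local
    OpenAI-compatible server (llama.cpp, Ollama, etc.) -- but any
    iterator works, which is what makes this easy to test.
    """
    start = time.monotonic()
    ttft = None
    chunks = []
    for chunk in stream:
        if ttft is None:
            # First chunk arrived: this is the number that matters for HA.
            ttft = time.monotonic() - start
        chunks.append(chunk)
    return ttft, chunks

# Simulated stream standing in for a real local-model response:
# first token after ~50 ms, then the rest.
def fake_stream():
    time.sleep(0.05)
    yield "Hello"
    yield ", world"

ttft, chunks = time_to_first_token(fake_stream())
print(f"TTFT: {ttft * 1000:.0f} ms over {len(chunks)} chunks")
```

Logging this per request is how I'd check whether a hardware change actually moves the HA path under the 2s target, rather than going by feel.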

Ideally I'd like to get Assistant responses in HA to under about 2s if possible.

Looking also for a new desktop at some point, but I don't want to use the same hardware: the inference GPU is in a server that's always on running "infrastructure" (Kubernetes, various pieces of software, NAS functionality, etc.). That said, I've always built desktops from components, since I was a wee child when a 1.44MB floppy was an upgrade, so a part of me is reluctant to switch to a mini-PC for that.

I might be convinced to get a Framework Desktop, though, if it'll do for Steam gaming on Linux, knowing that when I eventually need to upgrade it, it could supplement my server rack and be replaced entirely with a new model on the desktop, given there's little upgrade path other than swapping the entire mainboard.

No real interest in coding assistants, and running within my home network is an absolute must, which limits capability to "what's the best hardware I can afford?".


