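All SMT does is give a single superscalar core multiple instruction counters (plus the architectural register state that goes with each), so several hardware threads can feed the same pool of execution units. Instructions from one thread can fill issue slots that another thread would have left empty, which increases utilization of the execution units and therefore increases total throughput.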

Of course it also increases per-thread latency, since those resources are no longer exclusive to any one thread.

Whether or not it's a good thing depends on what you care about. You could also argue that a well-written program should be able to saturate a superscalar core with a single thread and thus wouldn't benefit from SMT at all, but I think that would be hard to guarantee in practice, since things like cache misses and branch mispredictions leave issue slots empty even in well-tuned code.
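A toy model makes the trade-off concrete. This is a rough sketch, not how any real core schedules instructions. It assumes a fixed issue width shared by all threads, and that each thread can issue at most a constant number of independent instructions per cycle (`ilp`, a made-up parameter for illustration), with priority rotating between threads every cycle. Memory stalls, pipelines, and partitioned structures are all ignored.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Thread:
    name: str
    remaining: int  # instructions left to issue
    ilp: int        # assumed max independent instructions this thread can issue per cycle
    finished_at: Optional[int] = None  # cycle count when its last instruction issued


def simulate(threads: List[Thread], width: int) -> Tuple[int, float]:
    """Share `width` issue slots per cycle among `threads`.

    Returns (total cycles, average instructions issued per cycle).
    """
    cycle = 0
    issued_total = 0
    while any(t.remaining > 0 for t in threads):
        slots = width
        # Rotate who picks first each cycle so no thread is permanently favored.
        k = cycle % len(threads)
        for t in threads[k:] + threads[:k]:
            if t.remaining == 0:
                continue
            n = min(t.ilp, t.remaining, slots)
            t.remaining -= n
            slots -= n
            issued_total += n
        cycle += 1
        for t in threads:
            if t.remaining == 0 and t.finished_at is None:
                t.finished_at = cycle
    return cycle, issued_total / cycle


# One thread that can only use 3 of 4 slots: 40 cycles, 3 instructions/cycle.
print(simulate([Thread("A", 120, 3)], width=4))  # (40, 3.0)

# Two such threads with SMT: both done in 60 cycles instead of 80 back-to-back,
# so throughput rises to 4/cycle, but each thread now takes 60 cycles instead of 40.
print(simulate([Thread("A", 120, 3), Thread("B", 120, 3)], width=4))  # (60, 4.0)

# Threads that already saturate the core (ilp == width) gain nothing from SMT:
# 60 cycles either way.
print(simulate([Thread("C", 120, 4), Thread("D", 120, 4)], width=4))  # (60, 4.0)
```

The last case is the "single thread saturates the core" argument: once one thread fills every slot on its own, sharing the core just interleaves the work without finishing it any sooner.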