nvtop or nvidia-smi gives you a good macro overview, but I've personally found that utilization (EDIT: as reported by nvidia-smi) is actually a poor proxy for how fast your workload can run, beyond confirming that a GPU is indeed being used.
I agree that utilization as reported by nvidia-smi is a poor proxy for performance. FWIW, I've found that for the same architecture the power consumption reported in nvtop very often correlates super nicely with training performance, and peak performance is always at peak power consumption. Agreed on your advice about tuning your architecture details, but once that's fixed and you're debugging simpler things like memory usage, batch size, or dataloading bottlenecks, the raw power metric is typically a quick proxy. I find temperature is a second useful macro metric: you want to be at max power draw and max allowed temperature at all times, but not exceed the temperature at which you throttle.
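In case it's useful, here's a minimal sketch of the kind of power/temperature polling I mean, using the nvidia-ml-py (pynvml) bindings. The "90% of the power limit" threshold is just an arbitrary number for illustration, not a standard:

```python
# Minimal sketch: poll GPU power draw and temperature via NVML (pip install nvidia-ml-py).
# The 90%-of-power-limit threshold below is an arbitrary illustrative choice.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU
power_limit_w = pynvml.nvmlDeviceGetEnforcedPowerLimit(handle) / 1000.0  # NVML reports milliwatts

try:
    while True:
        power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0
        temp_c = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
        near_cap = power_w >= 0.9 * power_limit_w
        print(f"power {power_w:6.1f} W / {power_limit_w:.0f} W  temp {temp_c} C  "
              f"{'near power cap' if near_cap else 'headroom left'}")
        time.sleep(1)
finally:
    pynvml.nvmlShutdown()
```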
That's hard to argue with. Power draw is, of course, a direct measure of hardware utilization, but it doesn't translate very well into a measure of GPU code efficiency.
Often you can squeeze out another order of magnitude of performance by rewriting the kernel, and the power draw will stay capped at whatever the maximum is the whole time. I'd say GPU power consumption is interesting if you're CPU bound and struggling to feed the GPU enough data and/or tasks.
FLOPs utilization is arguably the industry-standard efficiency metric right now, and it should be a good first approximation of how much performance is left on the table.
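For what it's worth, here's a back-of-the-envelope sketch of what I mean by FLOPs utilization (often called MFU). The 6 * params * tokens figure is the usual rough transformer approximation, and the model size, throughput, and peak-FLOPs numbers are placeholders you'd swap for your own run and hardware:

```python
# Back-of-the-envelope model FLOPs utilization (MFU) sketch.
# All numbers below are placeholders; plug in your own model size, throughput, and GPU peak.

def model_flops_utilization(n_params: float,
                            tokens_per_second: float,
                            peak_flops_per_second: float) -> float:
    """Approximate MFU for a dense transformer.

    Uses the common ~6 * parameters FLOPs-per-token estimate for a
    forward+backward pass (ignoring attention FLOPs for simplicity).
    """
    achieved_flops = 6.0 * n_params * tokens_per_second
    return achieved_flops / peak_flops_per_second

# Example with made-up numbers: a 7B-parameter model pushing 3,000 tokens/s
# on a GPU with a nominal 312 TFLOP/s dense BF16 peak.
mfu = model_flops_utilization(n_params=7e9,
                              tokens_per_second=3_000,
                              peak_flops_per_second=312e12)
print(f"MFU ~ {mfu:.1%}")  # a low percentage here suggests performance left on the table
```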
But if you mean that the reported utilization in nvtop is misleading, I completely agree (as someone who uses it daily).
I've been meaning to dig into the source/docs to see what's going on. Power usage seems to be a more reliable indicator of actual hardware utilization, at least on NVIDIA gear.
Thanks. Some people were having random problems installing WSL on their systems, and I found this was the easiest solution (though based on their card models, they appeared to have much older machines).
There is no need to install Docker Desktop just to run nvidia-smi in WSL; the Windows directory containing the nvidia-smi binary is mounted inside a WSL instance and added to PATH automatically on instance startup.
As an aside: there is no need to install Docker Desktop just to use Docker containers in WSL either, unless you want a Windows GUI to manage your containers. Just follow the official documentation for installing Docker in your Linux distro of choice, or simply run `sudo apt install docker.io` in the default WSL Ubuntu distro. Docker will work just fine with an up-to-date WSL.
Further aside: it's possible to have both Docker Desktop and the normal Linux docker.io package installed in WSL. They work in isolation; the easy way to know which is active is to check whether Docker Desktop is running. I wouldn't recommend this setup, though...
If you're here because you're interested in AI performance, I'd recommend instead https://docs.nvidia.com/nsight-compute/NsightComputeCli/inde... to profile individual kernels, Nsight Systems for a macro view (https://developer.nvidia.com/nsight-systems), and the PyTorch profiler if you're not authoring kernels directly but using something like PyTorch: https://pytorch.org/tutorials/recipes/recipes/profiler_recip...
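A minimal sketch of the PyTorch profiler workflow from that last recipe; the tiny model and random input here are toy placeholders for your actual workload:

```python
# Toy torch.profiler example; the small model and random input are placeholders.
import torch
from torch.profiler import profile, record_function, ProfilerActivity

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.ReLU()).cuda()
x = torch.randn(64, 1024, device="cuda")

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
             record_shapes=True) as prof:
    with record_function("forward_pass"):
        model(x)

# Table of the most expensive ops by GPU time; export_chrome_trace gives a
# timeline you can open in chrome://tracing or Perfetto for the macro view.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
prof.export_chrome_trace("trace.json")
```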