There's a profiler cell magic for notebooks which helps identify whether you've run out of VRAM (it says what ran on the CPU vs the GPU). There's also an open PR to turn on low-VRAM reporting as a diagnostic.
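For anyone who hasn't tried it, here's a minimal sketch of what that looks like in a notebook, assuming the cudf.pandas proxy layer and its profiling cell magic (magic names from memory, so check the RAPIDS docs):

```python
# Cell 1: load the cudf.pandas proxy so pandas calls run on the GPU where supported
%load_ext cudf.pandas

import pandas as pd  # now proxied by cudf.pandas

df = pd.DataFrame({"key": range(1_000_000), "value": range(1_000_000)})
```

```python
%%cudf.pandas.profile
# Cell 2: the report lists which operations ran on the GPU (cuDF) and which
# fell back to CPU pandas - e.g. when you run out of VRAM.
df.groupby("key")["value"].mean()
```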
cuDF is impressive, but getting a working setup can be a PITA (and then if you upgrade libraries...).
Personally I think it fits in the production pipeline for obvious bottlenecks on well-tended configurations; using it in the R&D flow might cost you diagnostic time getting and keeping it working (YMMV etc.).
I'm the co-author of High Performance Python; Micha and I are working on the 3rd edition (for 2025). Lots of bits of the book came from my past conference talks, which are available here (the public talks will generally be on YouTube): https://speakerdeck.com/ianozsvald
Mostly that content has a scientific focus, but the obvious thing that carries over to any part of Python is _profiling_ to figure out what's slow. Top tools I'd recommend are:
Thanks :-) I use it for all my talks and finally decided I'd better start sharing it a bit. It really is useful for understanding the memory cost of things like Pandas operations.
@munhitsu gave me a demo at the weekend (I'm on Android and it is iPhone-only); it seemed pretty slick and very easy to use, though I confess it's not something I personally need right now.
I've just spent the morning uninstalling and reinstalling different versions of the Nvidia driver (Linux) to get nvcc back for llama.cpp after Linux Mint did an update - I had CUDA 12.3 and 12.4 (5GB each) in conflict, with no guidance. Driver 550 was the charm, not 535, which had been fine in January.
This is the third time I've done this since December.
It is painful.
I'm not in a hurry to return to my cuDF experiments as I'm pretty sure that'll be broken too (as it has been in the past).
I'm the co-author of O'Reilly's High Performance Python book, and this experience mirrors what I had with PyCUDA a decade back.
Really, Numba will speed up numpy and some scipy (there's partial API coverage) plus math-based pure Python. I think it's unlikely it'd be used away from math problems.
As another commenter mentioned, it can be used to accelerate NumPy-backed Pandas (but not the newer Arrow-backed arrays), and again that's for numeric work.
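As a concrete illustration, here's a minimal sketch assuming a NumPy-backed DataFrame and pandas' optional Numba engine for rolling windows (the midrange function is just a made-up example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": np.random.rand(1_000_000)})

def midrange(window_values):
    # Plain numeric NumPy code, so Numba can compile it in nopython mode.
    return (window_values.max() + window_values.min()) / 2.0

# raw=True hands each window to the function as a NumPy array;
# engine="numba" JIT-compiles the function rather than calling it per window in Python.
result = df["price"].rolling(100).apply(midrange, raw=True, engine="numba")
```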
NumPy and SciPy are largely just wrappers around native libraries that are already optimized machine code. Numba is good for writing loops or doing other things you can pretty much imagine as simple C code.
While it is pretty cool, it's also a bit awkward thinking about machine structures and machine types in high-level Python, and there are some gotchas with respect to the automatic type inference.
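To make those gotchas concrete, here are two small examples of my own (not from the comment above) showing where machine types and type inference bite in nopython mode:

```python
import numpy as np
from numba import njit

@njit
def sum_squares(arr):
    # The accumulator gets a fixed-width machine type (int64 here), unlike
    # Python's arbitrary-precision ints, so very large sums can overflow silently.
    total = 0
    for v in arr:
        total += v * v
    return total

@njit
def mixed_branches(flag):
    # Type inference has to unify both branches; int64 and a string can't be
    # unified, so calling this raises a TypingError at compile time.
    if flag:
        x = 1
    else:
        x = "one"
    return x

print(sum_squares(np.arange(10, dtype=np.int64)))  # 285
# mixed_branches(True)  # raises a TypingError
```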
Don't forget that each operation in a sequence of numpy calls will likely allocate its own temporary array. Numba can often fuse these together, so although the implementation behind numpy is compiled C, you end up with fewer allocations and less memory pressure, and you get your results even faster. Numba also offers the OpenMP-style parallel tools.
I have a nice sequence of simulations in my Higher Performance Python course going from raw Python through numpy and then to Numba, showing how this all comes together.
Just try having a function with a = np.fn1(x); b = np.fn2(a); c = np.fn3(b) and so on, compile it with @jit, and you should see a performance improvement. Maybe you can also turn on the OpenMP parallelizer too.
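Here's a rough sketch of that pattern, with np.fn1/fn2/fn3 swapped for real NumPy ufuncs (treat the speed-up as something to measure on your own workload):

```python
import numpy as np
from numba import njit, prange

def chained(x):
    a = np.sqrt(x)        # each step allocates its own temporary array
    b = np.exp(-a)
    c = a * b + 1.0
    return c

# Compiling the same chain: with parallel=True Numba can fuse the array
# expressions and thread them (its OpenMP-style parallel backend).
fused_chain = njit(parallel=True)(chained)

@njit(parallel=True)
def hand_fused(x):
    # The fully explicit version: a single pass with no intermediate arrays.
    out = np.empty_like(x)
    for i in prange(x.shape[0]):
        a = np.sqrt(x[i])
        out[i] = a * np.exp(-a) + 1.0
    return out

x = np.random.rand(10_000_000)
np.testing.assert_allclose(chained(x), fused_chain(x))
np.testing.assert_allclose(chained(x), hand_fused(x))
```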
You may want to look sideways at companies such as hedge funds. They have DNN teams and experiment with LLMs, so you may find interesting optimisation opportunities with such teams. Charge according to the opportunity you open up, not the electricity saved!
In the UK I use Redber for beans; I drink decaf (Swiss Water process) fresh-ground in the afternoon and have 2-3 caffeinated cups in the morning. Redber has a wide selection and several roast levels.
No caffeine after noon. I use a 2-cup Bialetti stovetop espresso maker, the second-smallest size.