There's a profiler cell magic for notebooks which helps identify whether you've run out of VRAM (it says what ran on the CPU vs the GPU). There's also an open PR to turn on low-VRAM reporting as a diagnostic.
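For anyone who hasn't tried it, here's a minimal sketch of what that looks like in a notebook, assuming the cudf.pandas proxy layer and its profiling cell magic (magic names from memory, so check the RAPIDS docs):

```python
# Cell 1: load the cudf.pandas proxy so pandas calls run on the GPU where supported
%load_ext cudf.pandas

import pandas as pd  # now proxied by cudf.pandas

df = pd.DataFrame({"key": range(1_000_000), "value": range(1_000_000)})
```

```python
%%cudf.pandas.profile
# Cell 2: the report lists which operations ran on the GPU (cuDF) and which
# fell back to CPU pandas - e.g. when you run out of VRAM.
df.groupby("key")["value"].mean()
```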
cuDF is impressive, but getting a working setup can be a PITA (and then if you upgrade libraries...).
Personally I think it fits in the production pipeline for obvious bottlenecks on well-tended configurations; using it in the R&D flow might cost you diagnostic time getting and keeping it working (YMMV etc.).
I'm the co-author of High Performance Python; Micha and I are working on the 3rd edition (for 2025). Lots of bits of the book came from my past conference talks, which are available here (the public talks will generally be on YouTube): https://speakerdeck.com/ianozsvald
Mostly that content has a scientific focus, but the obvious thing that carries over to any part of Python is _profiling_ to figure out what's slow. Top tools I'd recommend are:
Thanks :-) I use it for all my talks and finally decided I'd better start sharing it a bit. It really is useful for understanding the memory cost of things like Pandas operations.
@munhitsu gave me a demo at the weekend (I'm on Android and it is iPhone-only); it seemed pretty slick and very easy to use, though I confess it's not something I personally need right now.
I've just spent the morning uninstalling and reinstalling different versions of the Nvidia driver (Linux) to get nvcc back for llama.cpp after Linux Mint did an update - I had CUDA 12.3 and 12.4 (5GB each) in conflict, with no guidance. Driver 550 was the charm, not 535, which had been fine in January.
This is the third time I've done this since December.
It is painful.
I'm not in a hurry to return to my cuDF experiments as I'm pretty sure that'll be broken too (as it has been in the past).
I'm the co-author of O'Reilly's High Performance Python book, and this experience mirrors what I had with PyCUDA a decade back.
Really, Numba will speed up numpy and some scipy (there's partial API coverage) plus math-based pure Python. I think it's unlikely it'd be used away from math problems.
As another commenter mentioned, it can be used to accelerate NumPy-backed Pandas (but not the newer Arrow-backed arrays), and again that's for numeric work.
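As a concrete illustration, here's a minimal sketch assuming a NumPy-backed DataFrame and pandas' optional Numba engine for rolling windows (the midrange function is just a made-up example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": np.random.rand(1_000_000)})

def midrange(window_values):
    # Plain numeric NumPy code, so Numba can compile it in nopython mode.
    return (window_values.max() + window_values.min()) / 2.0

# raw=True hands each window to the function as a NumPy array;
# engine="numba" JIT-compiles the function rather than calling it per window in Python.
result = df["price"].rolling(100).apply(midrange, raw=True, engine="numba")
```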
NumPy and SciPy are largely just wrappers around native libraries that are already optimized machine code. Numba is good for writing loops or doing other things you can pretty much imagine as simple C code.
While it is pretty cool, it's also a bit awkward thinking about machine structures and machine types in high-level Python, and there are some gotchas with respect to the automatic type inference.
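To make those gotchas concrete, here are two small examples of my own (not from the comment above) showing where machine types and type inference bite in nopython mode:

```python
import numpy as np
from numba import njit

@njit
def sum_squares(arr):
    # The accumulator gets a fixed-width machine type (int64 here), unlike
    # Python's arbitrary-precision ints, so very large sums can overflow silently.
    total = 0
    for v in arr:
        total += v * v
    return total

@njit
def mixed_branches(flag):
    # Type inference has to unify both branches; int64 and a string can't be
    # unified, so calling this raises a TypingError at compile time.
    if flag:
        x = 1
    else:
        x = "one"
    return x

print(sum_squares(np.arange(10, dtype=np.int64)))  # 285
# mixed_branches(True)  # raises a TypingError
```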
Don't forget that each operation in a sequence of numpy calls will likely allocate its own temporary array. Numba can often fuse these together, so although the implementation behind numpy is compiled C, you end up with fewer allocations and less memory pressure, and you get your results even faster. Numba also offers the OpenMP-style parallel tools.
I have a nice sequence of simulations in my Higher Performance Python course going from raw Python through numpy and then to Numba, showing how this all comes together.
Just try having a function with a = np.fn1(x); b = np.fn2(a); c = np.fn3(b) and so on, compile it with @jit, and you should see a performance improvement. Maybe you can also turn on the OpenMP parallelizer too.
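Here's a rough sketch of that pattern, with np.fn1/fn2/fn3 swapped for real NumPy ufuncs (treat the speed-up as something to measure on your own workload):

```python
import numpy as np
from numba import njit, prange

def chained(x):
    a = np.sqrt(x)        # each step allocates its own temporary array
    b = np.exp(-a)
    c = a * b + 1.0
    return c

# Compiling the same chain: with parallel=True Numba can fuse the array
# expressions and thread them (its OpenMP-style parallel backend).
fused_chain = njit(parallel=True)(chained)

@njit(parallel=True)
def hand_fused(x):
    # The fully explicit version: a single pass with no intermediate arrays.
    out = np.empty_like(x)
    for i in prange(x.shape[0]):
        a = np.sqrt(x[i])
        out[i] = a * np.exp(-a) + 1.0
    return out

x = np.random.rand(10_000_000)
np.testing.assert_allclose(chained(x), fused_chain(x))
np.testing.assert_allclose(chained(x), hand_fused(x))
```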
You may want to look sideways at companies such as hedge funds. They have DNN teams and experiment with LLMs, so you may find interesting optimisation opportunities with such teams. Charge according to the opportunity you open up, not the electricity saved!
In the UK I use Redber for beans; I drink decaf (Swiss Water process) fresh-ground in the afternoon and have 2-3 caffeinated cups in the morning. Redber has a wide selection and several roast levels.
No caffeine after noon. I use a 2-cup Bialetti stovetop espresso maker, the second-smallest size.