
> Are there any directives to the Operating System to say - “here keep this data in the fastest accessible L[1,2,3] please”?

I'm probably the worst person to explain this.

Long long ago, I took a parallel programming class in grad school.

It turns out the conventional way to do matrix multiplication results in plenty of cache misses.

However, if you carefully tweak the order of the loops and make certain minor modifications — I forget the details — you could substantially increase the cache hits and make matrix multiplication noticeably faster on benchmarks.

Some random details that may be relevant:

* When the processor loads a single number M[x][y], it also pulls in the adjacent numbers (the rest of the cache line). You need to take advantage of this.

* Whether the array is stored row-major or column-major is an important detail, since it determines which elements count as "adjacent".
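If I remember the trick right, it's loop interchange: swap the two inner loops so the innermost loop walks memory sequentially. A minimal C sketch of what I mean (the function names and the size 256 are just for illustration):

```c
#include <stdio.h>

#define N 256

static double A[N][N], B[N][N], C[N][N];

/* Conventional i-j-k order: the inner loop walks B down a column,
   jumping N doubles per step, so most accesses to B miss the cache. */
void matmul_ijk(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
}

/* i-k-j order: the inner loop walks B and C along rows, stride 1,
   so every number in a loaded cache line gets used before eviction. */
void matmul_ikj(void) {
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++) {
            double a = A[i][k];
            for (int j = 0; j < N; j++)
                C[i][j] += a * B[k][j];
        }
}
```

Both compute the same product; only the memory access pattern differs, and on large matrices that difference alone is often a multiple-x speedup.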

What I'm trying to say is, it is possible to indirectly optimize cache hits by careful hand-tweaking. I don't know if there's a general automagic way to do this, though.

This probably wasn't very useful, but I'm just putting it out there. Maybe more knowledgeable folks can explain this better.



I'm guessing you are thinking of cache lines, where CPUs read and write data between memory and cache in 64-byte chunks.

As you said, being aware of this lets you optimize away cache misses by controlling the memory access pattern.
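A toy illustration of the access-pattern point, assuming 64-byte lines (so 16 ints per line; the array size here is arbitrary): summing the same array in two different orders touches exactly the same data, but one order uses every loaded cache line fully while the other wastes almost all of it.

```c
#define ROWS 1024
#define COLS 1024

static int grid[ROWS][COLS];

/* Row-major walk: consecutive accesses are adjacent in memory, so one
   cache miss brings in 16 ints that are all used before moving on. */
long sum_rows_first(void) {
    long s = 0;
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            s += grid[r][c];
    return s;
}

/* Column-major walk: consecutive accesses are COLS * sizeof(int) bytes
   apart, so each one lands on a different cache line and, for arrays
   bigger than the cache, nearly every access is a miss. */
long sum_cols_first(void) {
    long s = 0;
    for (int c = 0; c < COLS; c++)
        for (int r = 0; r < ROWS; r++)
            s += grid[r][c];
    return s;
}
```

Same result either way; on a big enough array the first version is typically several times faster purely from cache behavior.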


There is a design strategy called cache-oblivious algorithms, which aims to exploit this property without knowing the actual cache sizes.

I used that approach once in a batch job that read two multi-megabyte files to produce a multi-gigabyte output file. It gave a massive speedup on a 32-bit Intel machine.

https://en.wikipedia.org/wiki/Cache-oblivious_algorithm
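This isn't my batch job, but the textbook example from that article is recursive matrix transpose: keep splitting the matrix in half along its longer dimension, and at some recursion depth the sub-blocks fit in each level of the cache, whatever its size, with no machine-specific tuning. A sketch (the sizes and cutoff are arbitrary):

```c
#define DIM  512
#define BASE 16   /* stop recursing below this block size */

static double src[DIM][DIM], dst[DIM][DIM];

/* Transpose src[r0..r1)[c0..c1) into dst by recursive halving. Small
   blocks are transposed directly; they fit in cache by construction. */
static void transpose_rec(int r0, int r1, int c0, int c1) {
    if (r1 - r0 <= BASE && c1 - c0 <= BASE) {
        for (int r = r0; r < r1; r++)
            for (int c = c0; c < c1; c++)
                dst[c][r] = src[r][c];
    } else if (r1 - r0 >= c1 - c0) {     /* split the longer dimension */
        int rm = (r0 + r1) / 2;
        transpose_rec(r0, rm, c0, c1);
        transpose_rec(rm, r1, c0, c1);
    } else {
        int cm = (c0 + c1) / 2;
        transpose_rec(r0, r1, c0, cm);
        transpose_rec(r0, r1, cm, c1);
    }
}
```

The nice part is that the same binary runs well on a machine with 32 KB of L1 and on one with 8 MB of L3, because every cache size corresponds to some level of the recursion.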



