If complex equations are involved, it is absolutely faster. You can also build smaller intermediate LUTs so the tables fit in cache, and then interpolate between entries on the fly.
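Here is a minimal sketch of the small-table-plus-interpolation idea. It assumes a hypothetical 256-interval table for sin(x) over one period. The names `sin_lut`, `TABLE_SIZE`, `STEP` and `SIN_TABLE` are illustrative and do not come from the thread. Python is used only for readability, so the speed benefit will not show here. In compiled code, 257 entries stored as 32-bit floats come to about 1 KB, which fits comfortably in L1.

```python
import math

# Hypothetical parameters: one period of sin(x) split into 256 equal intervals.
TABLE_SIZE = 256
STEP = 2 * math.pi / TABLE_SIZE

# Store TABLE_SIZE + 1 samples so that index i + 1 is always valid.
SIN_TABLE = [math.sin(i * STEP) for i in range(TABLE_SIZE + 1)]


def sin_lut(x: float) -> float:
    """Approximate sin(x) by linear interpolation between table entries."""
    # Reduce x into [0, 2*pi]; math.fmod keeps the sign of x, so fix negatives.
    x = math.fmod(x, 2 * math.pi)
    if x < 0:
        x += 2 * math.pi
    pos = x / STEP
    i = int(pos)
    # Rounding can land exactly on 2*pi; clamp so i + 1 stays in range.
    if i >= TABLE_SIZE:
        i = TABLE_SIZE - 1
    frac = pos - i
    # Linear interpolation between the two neighbouring samples.
    return SIN_TABLE[i] + frac * (SIN_TABLE[i + 1] - SIN_TABLE[i])
```

For linear interpolation the worst-case error is bounded by h²/8 · max|f''|. With h = 2π/256 and |sin''| ≤ 1, that comes to roughly 7.5e-5. Doubling the table size cuts the error by about 4x, so you can trade a little more cache for accuracy.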
Yeah, isn’t hitting memory (especially if the data can’t fit in L1/L2 cache) one of the biggest sources of latency? Especially since on modern CPUs it is almost impossible to max out the arithmetic units outside of microbenchmarks?
For most things, this has been slower than raw computation for well over a decade (probably closer to two).