Nice. The iPads generally measure ~8% slower than the MacBooks, presumably for cooling reasons, so we should see a single-core Geekbench score of roughly 4400 for the MacBook line.
Keep in mind that a big part of the huge jump in recent chips is that GB6 added support for SME, and to my knowledge no app uses SME yet. For that reason, GB5 is a better benchmark for all these chips.
The actual IPC and perf/clock gains of these chips, excluding SME-specific acceleration, are MUCH smaller.
I'm not sure what you're talking about. Any app compiled using LLVM 17 (2023) can use SME directly, and any app that uses Apple's Accelerate framework has automatically taken advantage of SME since iOS 18/macOS 15 last year.
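To make the Accelerate path concrete, here's a minimal Swift sketch (mine, with arbitrary sizes): the caller just issues a standard BLAS call, and whether the work lands on AMX/SME is Apple's internal dispatch decision, not something the code controls.

    import Accelerate

    // Multiply two 256x256 single-precision matrices via Accelerate's BLAS.
    // Nothing here references AMX or SME; Accelerate decides internally
    // whether to route the call to the matrix coprocessor on a given chip.
    let n: Int32 = 256
    let count = Int(n * n)
    let a = [Float](repeating: 1.0, count: count)
    let b = [Float](repeating: 2.0, count: count)
    var c = [Float](repeating: 0.0, count: count)

    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n,
                1.0, a, n,   // alpha, A, leading dimension of A
                b, n,        // B, leading dimension of B
                0.0, &c, n)  // beta, C, leading dimension of C

    print(c[0])  // 512.0: a dot product of 256 terms of 1.0 * 2.0

(Newer SDKs mark the classic CBLAS interface as deprecated in favor of a newer BLAS API, but it still compiles and illustrates the point.)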
Benchmarking a processor for "app written by someone who disregards performance" is something you can do, but it's a bit of a pointless exercise; no processor will ever keep up with developers' ability to write slow code.
Of course. And these are CPU vector instructions, so the saying "The wider the SIMD, the narrower the audience" applies.
But ultimately, with a benchmark like Geekbench, you're trusting them to pick a weighting. Geekbench 6 is no different from Geekbench 5 in that regard – it's not going to directly reflect every app you run.
I was really just pointing out that the idea that "no" apps use SME is wrong and therefore including it does not invalidate anything – it very well could speed up your apps, depending on what you use.
SME is just the AMX coprocessor that’s been in Apple chips since 2019. SME made it easier to target the AMX. But it’s been in use and available to developers since 2019.
Similar, but not the same. SME is much more powerful than AMX on the pre-M4 cores, and software can target it directly instead of going through Apple's frameworks. Which means software is more likely to actually use it (eventually), even if hardly anything does now.
> The point stands that virtually no apps used AMX (either directly or through a framework).
AMX has been present in every M series chip and the A series chips starting with the A13. If you are comparing M series chip scores in Geekbench 6 they are all using it, not just the latest ones.
Any app using Apple's Accelerate framework will take advantage of it.
This isn't true; I don't believe Geekbench ever made use of AMX. They do use SME on any Arm-based platform that has it, which until very recently meant only Apple.
I wish we could get something other than Geekbench for these things, since Geekbench seems to be trash. For example, it has the Ryzen 7 7700X with a higher multi-core score than the Epyc 9534 even though they're both Zen4 and the latter has 8 times as many cores and is significantly faster on threaded workloads in real life.
That's what the single thread score is supposed to be for. The multi-thread score is supposed to tell you how the thing performs on the many real workloads that are embarrassingly parallel.
Suppose I'm trying to decide whether to buy a 32-core system with a lower base clock or a 24-core system with a higher base clock. What good is it to tell me that both of them are the same speed as the 8-core system because they have the same boost clock and the "multi-core" benchmark doesn't actually use most of the cores?
The only valid benchmark for that is the application you actually intend to run. Even embarrassingly parallel problems can have different characteristics depending on their use of memory and caches and on the thermal characteristics of the CPU. Something that uses only L1 cache and registers will probably scale almost linearly with the number of cores, except for thermal influences. Something that uses the L2 and L3 caches or even main memory will be sublinear.
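To illustrate the two extremes, a toy Swift sketch (mine, not a rigorous benchmark; sizes and iteration counts are arbitrary): a register-bound kernel that should scale close to linearly with cores, next to a DRAM-streaming kernel that contends for shared memory bandwidth.

    import Foundation

    let cores = ProcessInfo.processInfo.activeProcessorCount
    // ~512 MB of doubles: large enough to defeat every cache level.
    let big = [Double](repeating: 1.5, count: 64_000_000)

    func time(_ label: String, _ body: () -> Void) {
        let t0 = DispatchTime.now()
        body()
        let ms = Double(DispatchTime.now().uptimeNanoseconds - t0.uptimeNanoseconds) / 1e6
        print(label, String(format: "%.1f ms", ms))
    }

    time("register-bound x\(cores)") {
        DispatchQueue.concurrentPerform(iterations: cores) { _ in
            var x = 1.000001
            for _ in 0..<20_000_000 { x = (x * x + 0.5).squareRoot() } // stays in registers
            if x.isNaN { print(x) } // keep the optimizer from deleting the loop
        }
    }

    time("memory-bound x\(cores)") {
        DispatchQueue.concurrentPerform(iterations: cores) { i in
            let chunk = big.count / cores
            var sum = 0.0
            for j in (i * chunk)..<((i + 1) * chunk) { sum += big[j] } // streams from DRAM
            if sum.isNaN { print(sum) }
        }
    }

Run both at different core counts and the first scales far better than the second – which is exactly why a single multi-core score can't characterize both kinds of workload.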
You're essentially just arguing that all general-purpose benchmarks are worthless because your application could be different.
Suppose I run many different kinds of applications and am just looking for an overall score to provide a general idea of how two machines compare with one another. That's supposed to be the purpose of these benchmarks, isn't it? But this one seems to be unusually useless at distinguishing between various machines with more than a small number of cores.
Your analysis is also incorrect for many of these systems. Each core may have its own L2 cache and each core complex may have its own L3, so systems with more core complexes don't inherently have more contention for caches because they also have more caches. Likewise, systems with more cores often also have more memory bandwidth, so the amount of bandwidth per core isn't inherently less than it is in systems with fewer cores, and in some cases it's actually more, e.g. a HEDT processor may have twice as many cores but four times as many memory channels.
General-purpose benchmarks aren't worthless. They can be used to predict, in very broad strokes, what application performance might be. Especially if you don't really know what the applications would be, or if it is too tedious to use real application benchmarks.
But in your example, deciding between 24 cores with somewhat higher frequency or 32 cores with somewhat lower frequency based on some general-purpose benchmark is essentially pointless. The difference will be small enough that only a real application benchmark can tell you what you need to know. A general-purpose benchmark will be no better than a coin toss, because the exact workings of the benchmark, the weightings of its components into a score, and the exact hardware you are running on will have interactions that determine the decision to a far greater degree. You are right that there could be shared or separate caches, shared or separate memory channels. The benchmark might exercise those, or it might not. It might heat certain parts of the die more than others. It might just be the epitome of embarrassingly parallel benchmarks, BogoMIPS, which is a loop executing NOPs. The predictive value of the general-purpose benchmark is nil in those cases. The variability from the benchmark maker's choices will always necessarily introduce a bias and therefore a measurement uncertainty. And what you are trying to measure is usually smaller than that uncertainty. Therefore: no better than a coin toss.
You're just back to arguing that general purpose benchmarks are worthless again. Yes, they're not as applicable to the performance of a specific application as testing that application in particular, but you don't always have a specific application in mind. Many systems run a wide variety of different applications.
And a benchmark can then provide a reasonable cross-section of different applications. Or it can yield scores that don't reflect real-world performance differences, implying that it's poorly designed.
I attempted to do this and discovered an irregularity.
Many of the systems claiming to have that CPU were actually VMs assigned arbitrary core counts, fewer than the full chip. Moreover, VMs can report any CPU they want as long as the underlying hardware supports the same set of instructions, so unknown numbers of them could have been running on different physical hardware, including on systems that e.g. use Zen4c instead of Zen4, since they provide the same set of instructions.
If they're just taking all of those submissions and averaging them to get a combined score, it's no wonder the results are nonsense. And VMs can claim to be non-server CPUs too:
The multi-core score listed in the main results page for EPYC 9534 is 15433, but if you look at the individual results, the ones that aren't VMs with fewer than all the cores typically get a multi-core score in the 20k-25k range, e.g.:
What does that have to do with the scores being wrong? As mentioned, virtual machines can claim to be consumer CPUs too, while running on hardware with slower cores than the ones in the claimed CPU.
That doesn't make any sense. Many of the applications are identical, e.g. developer workstations and CI servers are both compiling code, video editing workstations and render farms are both processing video. A lot of the hardware is all but indistinguishable; Epyc and Threadripper have similar core counts and even use the same core complexes.
The only real distinction is between high end systems and low end systems, but that's exactly what a benchmark should be able to usefully compare because people want to know what a higher price tag would buy them.
For >99% of people looking to compile code or render video on an M5 laptop, what matters is wall-clock time, running bare metal, with all IO going to a fast NVMe SSD, where even a large job will only thermally throttle for a bit and then recover.
Most people looking to optimize Epyc compile or render performance care about running inside VMs, with all IO going to SANs, assuming there is enough work that you can yield to other jobs to increase throughput, ideally near thermal equilibrium.
Will the base core count and the mix between performance and efficiency cores remain the same? That has led to different scaling factors for multi-core performance than for the single-core metrics.
Possibly, at least compared to the previous M4 generation. For the lowest tier M models to this point:
M1 (any): 4P + 4E
M2 (any): 4P + 4E
M3 (any): 4P + 4E
M4 (iPad): 3P + 6E
M4 (Mac): 4P + 6E
M5 (iPad): 3P + 6E (claimed)
M5 (Mac): Unknown
It's worth noting there are often higher tier models that still don't earn the "Pro" moniker. E.g. there is a 4P + 8E variant of the iMac which is still marketed as just having a normal M4.
The die shrinks are less than the marketing numbers would make you believe, but the cores are getting significantly more complex. I think E cores had a 50% cache increase this generation, as an example.
The above summary also excludes the GPU, which seems to have gotten the most attention this generation (~+30%, even more in AI workloads).
I use one for software development and it's great. Sometimes rust builds are slow and I'd love to force that to be faster with hardware (optimizing build time would be a huge undertaking with not-so-great returns), otherwise I'm totally content. I also have an M2 Max with 32GB of RAM that still feels like magic. I've never had computers that felt so fast for so long.
I can't even remember PCs now (been 10+ years on Macs) but heat is still an issue (especially in summer in hot climates) if the thing is going to throttle.
Heat is an issue with Macs too: if it wasn't, you'd have Air chassis with the performance of M4 Max/Ultra.
Yes, they've done some nice things to get the performance and energy efficiency up, but it's not like they've got some magic bullet either. From what I've seen in reviews, Intel is not so far off with things like the Ultra 7 258V. If they caught up to TSMC on the process node, they would probably match Apple too.
I've got a laptop from around the time you switched to a Mac. It's warm in winter, which is nice, but not so warm as to be a problem in summer. My workloads are mild, Django and Rails. Even the test suites are not CPU bound. Linux, not Windows.
There is no decent option on an alternative operating system. I do not like macOS and its many quirks, especially its extremely gatekept nature, including the system being SURE that what IT wants is the best for me; I understand that this might be an approach that some people prefer, but in my case it's the equivalent of showing a bull a red cloth.
I will say this - and most will not like this - that I'd go out and buy a M* MacBook if they still kept Boot Camp around and let me install Windows 11 ARM on it. I've heard Linux is pretty OK nowadays, but I have some... ideological differences with the staff behind Asahi and it is still a wonky hack that Apple can put their foot down on any day.
Because I'd prefer to run bare metal and, if I recall correctly, macOS still hogs 50+ GB on a clean install - and there is no GPU acceleration (? - might be wrong here).
I guess the hardware is extremely locked down. That part is a drag. Application sharing limitations (needing to publish to the App Store, more or less) still feel wrong after all these years. There's more, but those are the ones that bother me with any frequency.
Same here. I have an M1 Max with 64GB and the only time I notice a slowdown is doing Rust release builds with `lto = true` and `codegen-units = 1`, which makes complete sense. Otherwise there is _plenty_ of multicore performance for debug builds, webdev, web browsing, etc., often all at once.
This requires being very good at predicting your storage needs, your storage not growing over time (no growing photo collection?), and an awareness of how flash wear affects SSDs and how wear levelling works.
If you get these wrong, you can even trash your SSD and need to replace the whole laptop, due to it being soldered in.
I dunno, working with an M1 daily I struggle with resource contention and slow py/js builds. I'd love something faster when work provides me with an updated device.
Docker definitely runs faster inside Linux running in a VM on macOS. Funny how that works given the overhead of running a VM, but it seems that running on Linux and ext4 filesystem interfaces gives it quite a performance boost.
My M1 Pro with 16GB gets throttled (and manually restarted) every time VS Code hits 100GB of RAM usage when running Jest unit tests. Don't know who to blame, but I'm 99% sure the problem is my source code and not the machine.
Considering he's got 16GB of RAM, that 100GB is virtual memory, and most programs don't need to keep it all resident at all times (if they do, you'll notice constant paging).
Sounds like you need to spend some time optimising your build. Faster hardware just makes developers lazy. I'm still on an M1 and it's fine, although I do have 32GB.
In the age of AI it seems wild to blame developers for being “lazy” and needing more resources.
Like, if I were buying a new workstation right now, I'd want to be shelling out $2000 so that I could get something like a Ryzen AI Max+ 395 with 128GB of fast RAM for local AI, or an equivalent Mac Studio.
That's definitely not because I'm "lazy"; it's because I can't run a decent model on a Raspberry Pi.
Not really - I bought an M4 Air to check how my dev ecosystem (.NET) would run on ARM/Apple silicon, and while it's usable, it's noticeably slowing me down in my day-to-day with Rider. I will be getting the M5 Pro because the performance is a bottleneck for me (even incremental builds take a while). I also regret not getting at least 32GB of RAM (Amazon only had 24GB configurations), because with Docker running I constantly hit the memory pressure thresholds.
Which is not to say the Air is a bad device - it's an amazing laptop (especially for the price; I have not seen a single Windows laptop with this build quality even at 2x the price) and the performance is good enough that if I were doing only something like VSCode and node/frontend work, it would be more than enough.
But people here also oversell its capabilities: if you need anything more CPU/memory intensive, a Pro is a must, and the "Apple needs less RAM because of the fast IO/memory" argument is a myth.
It's just everything siphoning RAM: standard office/desktop stuff like Slack, Chrome, Mail, Calendar, Messages, and a few IMs will easily eat up over 12GB; then add in VSCode, Rider, and Docker (which I cap at 2GB of RAM) - I am swapping gigabytes when I have multiple tools running.
But even when I kill all processes and just run a build, you can see the lack of cores slow the build down enough to be noticeable. Investing in a 48GB RAM/Pro version will definitely be worth it for the improved experience; I can get by in the meantime by working more on my desktop workstation.
You can control how much RAM gets allocated to the Docker VM, and I keep that at 2GB, which is not that much. I am not running my own stuff inside Docker locally, just using it to boot up things like pg/rmq/redis.
I bought an M1 Max with 64GB of RAM on the gamble that it would last a while because future M series would be evolutionary, not revolutionary, again. It seems like it's paid off, since there's never a time it feels slow. I may end up upgrading to the M5 just to go down in screen size, because I travel more than I did when I bought the 16" one.
Have an M1 Pro 32GB which recently started feeling slower. VSCode with multiple tabs is a problem. Generally the UI feels less snappy.
I've switched now to desktop Linux, using an 8C/16T AMD Ryzen 7 9700X with 64GB. It's like night and day. But it is software related: Apple just slows everything down with their animations and UI patterns, probably to nudge people to acquire faster, newer hardware.
The change to Linux is a change in lifestyle, but it comes with a lot of freedom and options.
Same. My M1 Max only shows its age when building a few massive C++ projects; the M4 cuts compile times to a quarter, so I feel like I'm missing out there, but not enough to put down another $4k yet.
Compared to every Intel MBP I went through, where they would show their age after about 2-3 years and every action/compile brought more and more fan noise and throttling, the M1 is still a magical processor.
I had a 16" M1 Pro until July, when work had an M3 with 2GB more RAM going free and I thought it would be smart to take it; in the past, when I had upgraded from a three-plus-year-old machine, everything got dramatically better... but this time I don't notice any difference at all in almost any application.
The only place I feel it is when I am running a local LLM - I do get appreciably more tokens per second.
I claim that the M1 (MacBook Air) is fast. I also claim it's about half the speed of my similarly priced desktop of the same vintage for the slow tasks that I care about.
So I guess we've caught up with the desktop now.
Actually, I assume we caught up a while ago if you count the beefy multi-core Mx Ultra variants; it's really just the base model that has caught up now. On the other hand, I could have spent four times as much for twice as many cores on my desktop as well.
Nah not really. When you watch drag racing, they're testing acceleration. One car is always faster. Nobody says one is quicker than the other. Quick (when referring to straight line speed) is reserved for cars like the Miata, which has decent acceleration, but certainly can't accelerate like a muscle car. Nobody really compares top speed all that much because it's damn near impossible to hit top speed in a lot of cars, even on a race track. You will find slower cars comparing top speed though. Like an MG Midget, or an early Honda Civic might be able to hit 100mph, but that's an easily attainable speed. Fast cars are just faster than quick cars.
>>"Fast" refers to top speed. A fast car has a high maximum velocity. It can cover a great distance in a sustained manner once it reaches its peak speed. Think of the Bugatti Chiron or a Koenigsegg, which are famous for their incredibly high top speeds.
>>"Quick" refers to acceleration. A quick car can get from a standstill to a certain speed (often 0 to 60 mph or 0 to 100 km/h) in a very short amount of time. This is about how rapidly the car can change its velocity. Modern electric vehicles, like the Tesla Model S Plaid or the Lucid Air Sapphire, are prime examples of exceptionally quick cars due to the instant torque of their electric motors.
Acceleration is actually a thing for CPUs too. The clock needs to ramp up fast if the machine is to feel snappy, and it needs to wind down as soon as no significant workload remains - all of which is needed for good efficiency.
M1 has 16bn transistors, M4 has 28bn. Increasing the core count is useful for some applications (particularly GPU cores), but there are still many critical workloads that are gated by single-threaded performance.
Moore's law was never about single threaded performance, it was about transistor count and transistor cost, but people misunderstood it when single threaded performance was increasing exponentially.
You have brought a lot of your own assumptions to that reading. OP asked if doubling or tripling core count counted as keeping up with Moore’s law. I pointed out that in the case of the M series (the topic of the thread), regardless of core count, transistor count did not double or triple.
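For concreteness, here's the arithmetic (mine, taking the transistor counts cited above at face value and roughly four years between the M1 and the M4):

    $$ T_{\mathrm{double}} = t \cdot \frac{\ln 2}{\ln(N_t/N_0)} = 4 \cdot \frac{\ln 2}{\ln(28/16)} \approx 4 \cdot \frac{0.693}{0.560} \approx 5\ \mathrm{years} $$

i.e., an implied doubling time of about five years, not the two that Moore's law calls for.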
I think we will land around 4300. On paper, N3P gives 5-10% more transistors and 5-10% more efficiency. Naively this puts the perf lift in the range of 10.25% (1.05 * 1.05) to 21% (1.10 * 1.10), and 4300 would be around the halfway estimate. Gains in iPhone and iPad perf are heavily supported by new cooling tech. Compared to iPhones and iPads, MacBook Pros are much less thermally constrained, and the M5 MacBooks keep the M4 design.
TL;DR: I expect a smaller M4-to-M5 pop for the MacBook Pros than for the iPads, because the latter benefit from new cooling tech.
About 10% faster for single-core and 16% faster for multi-core compared to the M4. The iPad M5 has the same number of cores and the same clock speed as the M4, but has increased the RAM from 8GB to 12GB.
I currently switch between the M2 Pro and the M4 Air, and the Air is noticeably snappier in everyday tasks. The 16" M3 Pro is the faster machine, but I prefer not to lug it around all day, so it gets left home and occasionally used by the wife.
Single thread MacBook progression on Geekbench:
M1: 2350
M2: 2600
M3: 3100
M4: 3850
M5: 4400 (estimated)
https://browser.geekbench.com/mac-benchmarks
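Worked out from those numbers (treating the M5 figure as the estimate it is), the average generational uplift is:

    $$ \left(\frac{4400}{2350}\right)^{1/4} \approx 1.17 $$

i.e., roughly 17% per generation in single-core.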