Hands-on with the Apple M1—a seriously fast x86 competitor [Updated]

Apple's new octa-core ARM big/little CPU is putting its high performance x86 competition on notice.

Original story 9:00am EST: There’s a lot of understandable excitement around Apple’s ARM-powered devices right now. And we’ve got traditional reviews of those devices and their ecosystems, for Apple fans and the Apple-curious. This is not one of those reviews—though reviews are coming imminently for some of the new Macs. Instead, we’re going to take a closer look at the raw performance of the new M1 in comparison to more traditional x86 systems.

The M1’s CPU is a 5nm octa-core big/little design, with four performance cores and four efficiency cores. The idea is that user-focused foreground tasks, which demand low latency, run on the performance cores, while less latency-sensitive background tasks can run more slowly on the four less powerful but less power-hungry efficiency cores.

In addition to the eight CPU cores, the version of the M1 in the Mac mini has eight GPU cores, with a total of 128 Execution Units. Although it’s extremely difficult to get accurate Apples-to-non-Apples benchmarks on this new architecture, I feel confident in saying that this truly is a world-leading design—you can get faster raw CPU performance, but only on power-is-no-object desktop or server CPUs. Similarly, you can beat the M1’s GPU with high-end Nvidia or Radeon desktop cards—but only at a massive disparity in power, physical size, and heat.

ARM is coming

ARM architecture generally has a substantial power-efficiency advantage over x86-64, the architecture underlying traditional Windows, Linux, and macOS machines. That power-efficiency advantage led ARM to an early and crushing victory in the ultramobile space (phones and tablets), where milliwatts saved matter more than raw performance. From there, ARM began encroaching on the datacenter for the same reasons: even though individual ARM processors generally underperformed their x86 equivalents, they got the same amount of work done with lower power and cooling bills.

Desktop and traditional laptop PCs are something of a last bastion for the x86-64 architecture. In these form factors, performance—and the ability to run a familiar operating system and software stack, with zero compromise—has been the most important criterion. But ARM has been coming for the desktop space as well, albeit more slowly—and mostly on the very low end, as we’ve seen in devices such as the Pinebook Pro.

Apple’s new M1 system-on-a-chip (SoC) is decidedly not one of those low-performance, low-cost efforts. The M1 is designed from the ground up to be powerful and rather compromise-free competition for traditional PC architecture.

Geekbench 5.3.0

It’s very frustrating trying to get a direct performance comparison between the M1 and its x86-64 competition—in our device reviews, we normally lean pretty heavily on general-purpose, synthetic benchmark suites that run a wide array of tests against a platform and come up with a simple numeric score. Unfortunately, not all benchmark suites run on macOS, very few run on Apple Silicon, and very little along those lines runs on macOS 11 on Apple Silicon.

Geekbench 5.3.0 is one happy exception to that rule, with a brand-new version running natively on Apple Silicon macOS and in the App Store already. Geekbench is not the entire picture, of course. It can flatten most differences in CPUs, while occasionally and unpredictably magnifying others. And because Metal is the API Apple’s devices and software are optimized for, we don’t normally use its OpenCL-based GPU test at all.

But since we’re looking at a brand-new architecture on a minority platform, before its retail launch, we’re very limited on shiny, pre-packaged benchmark suites. Within Geekbench’s limited world, it’s clear that the M1 is a winner—it beats all comers, whether we’re looking at multithreaded CPU, single-threaded CPU, or OpenCL GPU testing.

Cinebench R23 [Updated]

Updated 4:05pm EST: Cinebench’s new R23 release offers native ARM on macOS support, and we generally prefer it to Geekbench. Although there are criticisms that Cinebench—which uses Maxon’s graphics rendering software—is narrowly focused, we find that it both accentuates differences between CPUs and hews more closely to both real-world expectations and the Passmark general-purpose benchmark than Geekbench does.

Apple’s M1 comes out firmly on top of both the quad-core/octa-thread i7-1185G7 and octa-core/octa-thread Ryzen 7 4700U in unlimited multicore testing. Just for fun, we limited the Ryzen 9 5950X to eight threads and added it to the mix; even so, the 5950X easily dominates here. But it’s important to remember that only four of the M1’s eight cores are the Firestorm high-performance version… and that the 5950X has a TDP more than three times the whole-system power draw of the Mini!

Moving on to single-threaded testing, the M1 runs neck and neck with Intel’s i7-1185G7, but it’s important to look at power consumption again. The i7-1185G7 was tested at its 28W cTDP “up”, not the 15W cTDP we expect most production laptop systems will ship with. Meanwhile, the Mac mini’s entire at-the-wall power draw, even during multithreaded Cinebench R23, is only 23.5-24.1W. Single-threaded performance doesn’t decrease much on the i7-1185G7 when its cTDP is lowered, but it does decrease, and we suspect the i7 will need to be power-limited far more sharply than the M1 in a MacBook Air will.
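To put the power comparisons above into numbers, here is a minimal Python sketch of the ratios involved. The 23.5-24.1W wall draw and the 28W cTDP come from the text; the 105W figure for the Ryzen 9 5950X is AMD's published TDP and is an assumption brought in from outside this article. The function name `ratio_range` is purely illustrative. Keep in mind that a TDP rating and an at-the-wall measurement are not the same kind of number, so this is a rough sanity check and not a measurement.

```python
# Whole-system, at-the-wall draw of the M1 Mac mini under multithreaded
# Cinebench R23, as reported in the article (low and high readings).
MINI_WALL_WATTS = (23.5, 24.1)

# Configured TDP "up" used for the i7-1185G7 in the article's tests.
I7_1185G7_CTDP_UP_WATTS = 28.0

# ASSUMPTION: AMD's published TDP for the Ryzen 9 5950X; not stated in the article.
RYZEN_5950X_TDP_WATTS = 105.0


def ratio_range(watts: float, baseline_range: tuple[float, float]) -> tuple[float, float]:
    """Return (lowest, highest) ratio of `watts` to a baseline given as (low, high)."""
    low, high = baseline_range
    # Dividing by the larger baseline gives the smaller ratio, and vice versa.
    return watts / high, watts / low


if __name__ == "__main__":
    for name, watts in [
        ("Ryzen 9 5950X TDP", RYZEN_5950X_TDP_WATTS),
        ("i7-1185G7 cTDP up", I7_1185G7_CTDP_UP_WATTS),
    ]:
        lo, hi = ratio_range(watts, MINI_WALL_WATTS)
        print(f"{name}: {lo:.2f}x to {hi:.2f}x the Mini's wall draw")
```

Under that 105W assumption, the 5950X's rated TDP alone works out to more than four times what the whole Mini pulls from the wall, which is consistent with the "more than three times" claim.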

Next up, I limited both the 5950X (the world-leading single-threaded x86-64 CPU) and the M1 to four threads and ran the Cinebench R23 test again. With the M1’s four Firestorm high-performance cores running head-to-head against four of the 5950X’s 16 cores, the 5950X wins by 8.3%.

Browser and mobile gaming benchmarks

Original story resumes: Frankly, I wasn’t content with Geekbench. In order to make sure the easy conclusion—that the M1 SoC is a barn-burner, capable of going toe to toe with any and all mobile competitors—was valid, I needed to branch out a little.

In-browser benchmarking is one test that translates well across radically different architectures, since it measures a relatively real task—how well complex operations render within a Web browser. Although benchmarks like Jetstream 2.0 and Speedometer are still synthetic, they model real-world operations that every user expects to work, no matter what the details under the hood are that get them done.

Since the Mac mini’s M1 processor shares its ARM architecture with the A12Z and A14 Bionic found in the latest iPads and iPhones—and Apple, wisely, made the majority of those devices’ apps available in the App Store—that opened up another avenue for comparison. 3DMark’s Slingshot Extreme Unlimited wouldn’t allow me to test the M1 against x86-64 PCs, but it would allow me to gauge the M1’s prowess against the fastest mobile hardware.

Examining the browser benchmarks, the M1-powered mini passed with flying colors. When using Safari on Apple Silicon, the mini absolutely blew the doors off the Ryzen 4700U-powered Acer Swift 3—and even when running x86-64 Google Chrome via Rosetta, it did quite well.

I’d caution readers against trying to draw direct comparisons between these test results and actual browsing experiences—in practice, these are both very fast machines that feel butter-smooth on the Web and elsewhere. The more important point is that the mini and its M1 ARM architecture certainly are not slow.

We can get a further sense of the M1’s prowess by comparing it to the well-lauded iPad Pro 2020, using 3DMark’s Slingshot Extreme mobile gaming test suite. If you want to play your favorite mobile games on the mini, it should clearly be a first-class experience provided the apps translate well—we see a nearly perfect stair-step progression upward from the iPhone 12 Pro to ASUS’ flagship Android gaming phone (yes, that’s a thing), from there to the iPad Pro 2020, and finally to the M1-powered mini firmly on top of the heap.

The only fly in the ointment about mobile gaming on the mini (or its more portable siblings, the M1-powered MacBook Air and MacBook Pro) is that not all iOS apps are available in Big Sur’s App Store yet. Apple uses an automated system to filter out apps that are unsuitable and human curation to confirm some of those filters. Further, developers can choose to opt out of including their apps. I was particularly disappointed to see that Wild Life, 3DMark’s newest benchmark app, was missing; that app does allow cross-platform comparisons between PC and mobile and would have been very useful indeed.

Getting down and nerdy—pigz parallel compression

Let’s get this out of the way up front—no, data compression isn’t really a single be-all CPU performance benchmark. With that said, it’s a very direct real-world task that bottlenecks on CPU, and every user experiences it fairly frequently. In order to test data compression speed, I did the following:

  • download the source code for pigz, and compile it on the Mac mini in ARM native mode
  • download the Linux kernel source tree, version 5.10-rc3
  • tar cf the kernel tree, producing a single 972MiB uncompressed file
  • concatenate the tarball four times, producing a single 3.8GiB uncompressed file
  • cat the resulting fourlinux.tar several times, ensuring it is fully cached in RAM
  • time pigz < fourlinux.tar > /dev/null
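The steps above can be approximated in a few lines of Python for readers who want to see the shape of the test. This is a sketch under stated assumptions and not pigz itself: it splits an in-memory buffer into 128 KiB blocks (pigz's default block size) and compresses them on a thread pool with the standard `zlib` module. Unlike pigz, it does not carry a dictionary across block boundaries or emit a single valid gzip stream, and the function names are hypothetical.

```python
import time
import zlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 128 * 1024  # 128 KiB, matching pigz's default block size


def build_input(tarball: bytes, copies: int = 4) -> bytes:
    """Concatenate the tarball several times, as the article does with fourlinux.tar."""
    return tarball * copies


def compress_parallel(data: bytes, workers: int) -> int:
    """Compress independent blocks on `workers` threads; return total compressed bytes."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # CPython's zlib releases the GIL while compressing, so the
        # threads can genuinely run on separate cores.
        return sum(len(chunk) for chunk in pool.map(zlib.compress, blocks))


def time_compression(data: bytes, workers: int) -> float:
    """Return wall-clock seconds to compress `data` (already in RAM) with `workers` threads."""
    start = time.perf_counter()
    compress_parallel(data, workers)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in payload; the article uses a tar of the Linux 5.10-rc3 source tree.
    payload = build_input(b"placeholder kernel source text\n" * 100_000)
    for workers in (1, 4, 8):
        print(f"{workers} workers: {time_compression(payload, workers):.3f}s")
```

Because the input is held in memory and the output is discarded, the timing reflects CPU work and not disk speed, which is the same reason the original procedure caches the file in RAM and writes to /dev/null.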

When run without additional arguments, pigz spawns one compression thread for each CPU thread it finds online, which means eight threads for both the 4big/4little M1 and the octa-core/octa-thread Ryzen 7 4700U. The macOS app Activity Monitor showed CPU utilization of well over 700 percent, confirming that the M1’s four efficiency cores really were in play.
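The reasoning behind that utilization figure is simple arithmetic: Activity Monitor counts one fully busy core as 100 percent, so four performance cores alone can account for at most 400 percent. A small Python sketch of the check follows; the function names are illustrative only.

```python
def busy_core_equivalents(utilization_percent: float) -> float:
    """Convert a per-process CPU percentage into the number of fully busy cores."""
    return utilization_percent / 100.0


def requires_efficiency_cores(utilization_percent: float,
                              performance_cores: int = 4) -> bool:
    """True if the load exceeds what the performance cores alone could supply."""
    return busy_core_equivalents(utilization_percent) > performance_cores


if __name__ == "__main__":
    # "Well over 700 percent" means more than seven cores' worth of work.
    print(busy_core_equivalents(700))
    print(requires_efficiency_cores(700))
```

Anything above 400 percent on the M1 has to be coming from the efficiency cores, so a reading past 700 percent means most of them were busy too.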

Although this test produced very different results from Geekbench, it does confirm that the M1 is a world-leading processor design. Even when stacked up against AMD’s Ryzen 7 4700U, with its eight full, high-performance cores, the M1 eked out an extremely narrow victory. That victory is well within the margin of error… but it also demonstrates that even on the 4700U’s best day, it can’t beat Apple’s ARM processor in this power configuration.

My desktop workstation, which has an eight-core, 16-thread Ryzen 7 3700X, handily beat both the M1 and the 4700U in an unlimited pigz run, but it did so by leveraging a severe power-consumption discrepancy. The TDP of the Ryzen 7 3700X is 65W, and its actual consumption is significantly higher when running for long periods at maximum performance on a workload like this.

Firestorm cores only

Having seen how powerful the entire CPU was on a massively parallel workload, the next thing I wanted to know was how fast the four performance cores were by themselves. Running the test again, this time using the -p4 argument to limit pigz to four compression threads, the M1 came out on top, and not just on top of the Ryzen 7 4700U but on top of everything. It beat my Ryzen 7 3700X desktop workstation, and it ran neck-and-neck with the Ryzen 9 5950X on my open-air test rig.

Finally, I ran pigz -p1 to get a second opinion on the M1’s single-core prowess. I didn’t really expect any surprises here, and I didn’t get any—the M1 outpaced every system I had on hand, including the Ryzen 9 5950X test rig.

I unfortunately don’t still have one of Intel’s i7-1185G7 Tiger Lake systems to run new tests against, but I doubt that matters much—the i7-1185G7 should be about neck and neck with the Ryzen 9 5950X for single-threaded performance, and the 5950X is slightly slower on a single thread than the M1.

If Apple’s M1 isn’t the fastest single-thread—and quad-thread—consumer-available processor on the planet, it certainly isn’t missing it by much.

https://arstechnica.com/?p=1723564