We put Western Digital’s dreaded SMR Red drive to the test

Western Digital's EFAX Red—an SMR disk—squares off against a Seagate Ironwolf in today's testing.
Jim Salter

Western Digital has been receiving a storm of bad press—and even lawsuits—concerning their attempt to sneak SMR disk technology into their “Red” line of NAS disks. To get a better handle on the situation, Ars purchased a Western Digital 4TB Red EFAX model SMR drive and put it to the test ourselves.

Although Western Digital's 4TB SMR disk performed adequately in Servethehome's light duty tests, it performed miserably when they used it to replace a disk in a degraded four-disk RAIDz1 vdev.

Recently, the well-known tech enthusiast site Servethehome tested one of the SMR-based 4TB Red disks with ZFS and found it sorely lacking. The disk performed adequately—if underwhelmingly—in generic performance tests. But when Servethehome used it to replace a disk in a degraded RAIDz1 vdev, it required more than nine days to complete the operation—when all competing NAS drives performed the same task in around sixteen hours.

This has rightfully raised questions as to what Western Digital was thinking when it tried to use SMR technology in NAS drives at all, let alone trying to sneak it into the market. Had Western Digital even tested the disks at all? But as valuable as Servethehome’s ZFS tests were, they ignored the most common use case of this class of drive—consumer and small business NAS devices, such as Synology’s DS1819+ or Netgear’s ReadyNAS RN628X00. Those all use Linux kernel RAID (mdraid) to manage their arrays.

Rebuilding a 75% full eight disk RAID6 array

After purchasing a WD 4TB Red EFAX drive like the one that Servethehome tested, we used our existing test rig with eight Seagate Ironwolf drives in the Ars Storage Hot Rod to create a RAID6 array. Our eight Ironwolf disks are 12TB apiece, so we partitioned them down to 3500GiB apiece. This made the array small enough that our new WD Red disk could “fit” as a replacement when we failed an Ironwolf out.

When we created the RAID6 array, we used the argument -b none so that mdraid would not keep a write-intent bitmap, which it can otherwise use to speed up rebuilds when re-adding a disk that had previously been in the array. And we formatted it using the ext4 filesystem, with arguments -E lazy_itable_init=0,lazy_journal_init=0 so that background processes wouldn’t contaminate our tests with drive activity that normal users wouldn’t usually contend with.
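
For readers who want to reproduce this setup, the sketch below assembles the two commands described above as argument lists and prints them without running anything. The -b none argument (written here in its long form, --bitmap=none) and the -E options come from the text; the array name /dev/md0 and the member partitions /dev/sda1 through /dev/sdh1 are placeholder assumptions, not the names used on the Ars test rig.

```python
import shlex

# Hypothetical device names; substitute the ones on your own system.
ARRAY = "/dev/md0"
MEMBERS = [f"/dev/sd{letter}1" for letter in "abcdefgh"]  # eight partitions


def build_create_command(array, members):
    """Return the mdadm argv for a RAID6 array with no write-intent bitmap."""
    return [
        "mdadm", "--create", array,
        "--level=6",
        f"--raid-devices={len(members)}",
        "--bitmap=none",  # long form of the -b none argument from the text
        *members,
    ]


def build_mkfs_command(array):
    """Return the mkfs.ext4 argv with lazy initialization switched off."""
    return [
        "mkfs.ext4",
        "-E", "lazy_itable_init=0,lazy_journal_init=0",
        array,
    ]


if __name__ == "__main__":
    # Print only: running these for real is destructive and needs root.
    print(shlex.join(build_create_command(ARRAY, MEMBERS)))
    print(shlex.join(build_mkfs_command(ARRAY)))
```

Keeping the commands as lists rather than strings makes it easy to inspect them before deciding to pass them to something like subprocess.run.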

After formatting the new eight-disk, 19TiB array, we dumped 14TiB of data onto it in fourteen subdirectories, each containing 1,024 1GiB files filled with pseudo-random data. This brought the array to a little more than 75 percent used. At this point, we failed one Ironwolf disk out of the array, did a wipefs -a /dev/sdl1 on it to remove the existing RAID headers, then added it back into the now-degraded array. This was our baseline.
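
The fill step can be approximated with a short script. This is a minimal sketch, not the tool Ars used (the article does not say how the files were generated): it takes random bytes from os.urandom, and the directory and file naming scheme is invented for illustration. The demo at the bottom runs a heavily scaled-down version in a temporary directory.

```python
import os
import tempfile

CHUNK = 1024 * 1024  # write in 1MiB chunks to keep memory use flat


def fill_with_random_files(root, subdirs, files_per_dir, file_size):
    """Create subdirs x files_per_dir files, each file_size bytes of random data.

    Returns the total number of bytes written.
    """
    total = 0
    for d in range(subdirs):
        dir_path = os.path.join(root, f"{d:02d}")  # hypothetical naming scheme
        os.makedirs(dir_path, exist_ok=True)
        for f in range(files_per_dir):
            path = os.path.join(dir_path, f"{f:04d}.bin")
            remaining = file_size
            with open(path, "wb") as out:
                while remaining > 0:
                    n = min(CHUNK, remaining)
                    out.write(os.urandom(n))
                    remaining -= n
            total += file_size
    return total


if __name__ == "__main__":
    # Scaled-down demo: 2 directories x 3 files x 64KiB in a temp directory.
    # The layout described in the text would be subdirs=14,
    # files_per_dir=1024, file_size=1024**3, pointed at the mounted array.
    with tempfile.TemporaryDirectory() as tmp:
        written = fill_with_random_files(tmp, 2, 3, 64 * 1024)
        print(f"wrote {written} bytes")
```

Random data matters here because it cannot be compressed or deduplicated anywhere along the path, so every byte actually has to reach the platters.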

Once the Ironwolf had successfully rebuilt into the array, we failed it out again, and this time we removed it from the system entirely and replaced it with our 4TB Red SMR guinea pig. First, we fed the entire 4TB Red to the degraded array as a replacement for the missing, partitioned Ironwolf. Then, once it had finished rebuilding, we failed it out again, ran wipefs -a on it to remove the RAID header, and added it back in to rebuild a second time.
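
One fail, wipe, and re-add cycle like the ones described above might look roughly like the following. The sketch only builds and prints the commands. The wipefs -a call and the /dev/sdl1 partition come from the text; the array name /dev/md0 is a placeholder, and the explicit --remove step is an assumption on our part, included because mdadm needs a failed member detached before it can be added again.

```python
import shlex


def rebuild_cycle(array, member):
    """Return the commands for one fail, wipe, re-add cycle on a member device."""
    return [
        ["mdadm", array, "--fail", member],    # mark the member as faulty
        ["mdadm", array, "--remove", member],  # detach it from the array
        ["wipefs", "-a", member],              # erase the old RAID signature
        ["mdadm", array, "--add", member],     # re-add it, which starts a rebuild
    ]


if __name__ == "__main__":
    # /dev/md0 is a placeholder; /dev/sdl1 is the partition named in the text.
    for argv in rebuild_cycle("/dev/md0", "/dev/sdl1"):
        print(shlex.join(argv))
```

While a rebuild is running, its progress can be followed in /proc/mdstat.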

This gave us our two test cases—a factory-new Red SMR disk being rebuilt into an array, and a used Red SMR disk with a lot of data on it already being rebuilt into an array. We felt it was important to test both ways, since each case is a common use of NAS disks in the real world. It also seemed likely that an SMR disk full of data might perform worse than a brand-new one, which wouldn’t need to read-modify-write as it dealt with already-used zones.

The SMR EFAX rebuilt into our conventional RAID6 array just fine—even when 75% of its capacity was already filled.

We weren’t surprised that the SMR disk performed adequately in the first test. Consumer ire aside, it seemed unlikely that Western Digital had sent these disks out the door with no testing whatsoever. We were more surprised that it performed the same way in a used condition as it had when new: the drive’s firmware shuffled data around well enough that the rebuild from a “used” condition didn’t take a single minute longer than it had when new.

Simple 1MiB random write test

Clearly, the WD Red’s firmware was up to the challenge of handling a conventional RAID rebuild, which amounts to an enormous large-block sequential write test. The next thing to check was whether the EFAX would cope equally well with a heavy version of the typical day-to-day use case of a consumer NAS, which is storing large files.

Once again, at first glance, the WD Red passes muster. In terms of throughput, the Red is only 16.7 percent slower than its non-SMR Ironwolf competition. Even retesting it a second time, when the firmware has a harder job dealing with already-full zones, doesn’t change the picture significantly.

When we drill down a little further and look at fio’s latency numbers, things look noticeably worse. The EFAX Red is 68.8 percent slower on average to return from an operation than the Ironwolf, but again, this is “not winning the race” territory, not “you’re going to get sued for fraud” territory. It’s only when we look at peak latency from the 1MiB random write test that we begin to see how bad things can get when you push the Red in unplanned-for directions. Its worst-case return is a whopping 1.3 seconds, roughly twelve times worse than the Ironwolf’s slowest return of 108 milliseconds.

We can extrapolate from this peak latency result that when the Red’s firmware is floundering badly, its throughput may fall below 1MiB/sec for a little while—and this correlates with the ever-changing throughput numbers we saw as we watched the throughput tests running. It also tells us that for a desktop user, someone who wants things to happen when they click buttons and drag things around, the Red can occasionally provide a truly frustrating experience during what should be a very, very easy workload, even for a conventional drive.
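
The arithmetic behind that extrapolation is simple enough to check. The sketch below uses the two peak latency figures quoted above and assumes a single 1MiB write in flight at a time (the article does not state the queue depth), so that throughput during the stall is just block size divided by latency.

```python
MIB = 1024 * 1024


def latency_ratio(slow_seconds, fast_seconds):
    """How many times longer the slow operation took than the fast one."""
    return slow_seconds / fast_seconds


def throughput_mib_per_sec(block_bytes, latency_seconds):
    """Effective throughput if one block of this size completes per latency period."""
    return (block_bytes / MIB) / latency_seconds


if __name__ == "__main__":
    red_peak = 1.3         # seconds, worst-case return quoted for the Red
    ironwolf_peak = 0.108  # seconds, worst-case return quoted for the Ironwolf

    ratio = latency_ratio(red_peak, ironwolf_peak)
    stalled = throughput_mib_per_sec(1 * MIB, red_peak)

    print(f"peak latency ratio: {ratio:.1f}x")
    print(f"throughput during worst stall: {stalled:.2f} MiB/sec")
```

With the quoted figures, this works out to a ratio of about 12 and a momentary throughput of about 0.77MiB/sec, which is consistent with the "below 1MiB/sec" estimate.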

https://arstechnica.com/?p=1681322