Q. Why should I invest more money in a Pro, Business or Enterprise line of Solid State Drives instead of the cheaper consumer SSDs?
A. In order to answer the question, we need to dive into how SSD drives are built, and how performance and reliability are measured.
TL;DR: Professional and Enterprise level SSD drives are more reliable and have better sustained performance than consumer drives. Whether the price difference is worth it to you depends on how you use your computer. SSDs have different performance degradation and failure modes compared to hard drives.
All SSD drives are made with NAND Flash memory chips, but the parts they are built from and their design have a major impact on the performance of the drive and its long term reliability, and the final performance in your machine is limited by the interface.
All drives, SSD drives included, will fail in time. An SSD with a longer expected lifespan is said to have higher endurance. The main point of failure is degeneration of the NAND flash cells, which are worn out a bit every time the cell is written to. As cells die, the SSD controller attempts to work around them with spare cells, but at some point it is unrecoverable and the drive dies.
NOTE: This is why you should turn off automatic and scheduled De-fragmenting, “Optimization” or “Cleanup” for SSDs in your operating system. These needlessly subject the drive to many writes, and reduce its lifespan. Your SSD’s controller has built-in TRIM and Garbage Collection routines; let it take care of itself.
How strong are the chips?: One determinant of the reliability of a drive is what NAND flash chips it is made from.
- Single Level Cell (SLC): 1 bit per cell
- Multi Level Cell (MLC): 2 bits per cell
- Triple Level Cell (TLC): 3 bits per cell, sometimes called 3-bit MLC
- Quad Level Cell (QLC): 4 bits per cell
Increasing the number of bits per cell increases storage by allowing each cell to hold two, three or four bits of information. But it also means having to distinguish between 4, 8 or 16 different stored voltage levels in a single cell, rather than just the two levels needed for single-bit ‘on’ and ‘off’ states. At the same time, manufacturers are making NAND with smaller process sizes, and stacking cells vertically in the chip. There’s a big difference in reliability of NAND chips, because the denser you make the cells and the tighter and higher you stack them on the chip, the faster they wear out. It also gets harder to keep one cell’s information from leaking into the surrounding cells, and to keep cells from gradually losing their internal charge and corrupting data with the wrong voltage readings. Further reading: “Understanding TLC NAND”.
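As a quick illustration of why more bits per cell is harder to read reliably, here is a minimal Python sketch of the relationship between bits per cell and the number of voltage levels the controller has to tell apart. The function name is made up for this example; the only fact used is that n bits need 2 to the power n distinct states.

```python
# Bits stored per cell for each NAND type described above.
CELL_TYPES = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}


def voltage_levels(bits_per_cell: int) -> int:
    """Number of distinct charge levels needed to encode the given bits."""
    return 2 ** bits_per_cell


for name, bits in CELL_TYPES.items():
    print(f"{name}: {bits} bit(s) per cell -> {voltage_levels(bits)} voltage levels")
```

Each extra bit doubles the number of levels that must fit into the same voltage range, which is why the margins between levels shrink so quickly.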
Increasing density is bad for charge leakage. Here’s an analogy: you are a rock and roll drummer, and you have a mansion on 100 acres of property (you are Roger Taylor). You can rock out as loud as you like, and nobody will complain – or even know. Then you move to a detached house in the ‘burbs, and when you practice, you’re going to get occasional complaints from your neighbor. Next you move into an attached row-house, and you get frequent complaints from the neighbors left and right who can’t escape the racket. And finally you move into a multi-story apartment building, and you get eviction notices from all the neighbors above, below, left, right and across the hall.
Types of NAND and their typical write endurance:
- SLC – 90,000 to 100,000 writes per cell typical lifespan
- 2 bit MLC – 10,000 writes per cell typical lifespan
- 3 bit TLC – 3,000 – 5,000 writes per cell typical lifespan
- 4 bit QLC – 1,000 writes per cell typical lifespan
That’s a pretty dramatic drop in component reliability as the density goes up and the price goes down. But keep in mind that even the lowest quality cells will last years in typical desktop or notebook use (30 – 60 hours per week, if you turn off that nasty Defrag utility, that is). SSD controllers have very robust error correction code and the ability to map spare cells seamlessly to replace failed ones. The endurance differences are more important for servers and storage devices that are under an enterprise or industrial workload 24/7.
How many spares are there?: Second is the amount of NAND flash on the drive that is set aside as spare cells for swapping in as replacements for burnt out cells. This is called Over-provisioning. Higher endurance SSDs will provide more spare cells. Inexpensive drives will have less over-provisioning.
How good is the management?: Third, the quality of the controller on the SSD comes into play, as the controller is in charge of spreading the reads and writes across the available memory cells, cleaning up fragmented memory, reducing overuse of cells while at the same time managing DRAM or SLC caching to maintain high performance. Enterprise drives will have their controllers programmed to optimize for read intensive use, or write intensive, or balanced performance.
How big is the drive?: Finally you need to consider drive size vs. your workload. A larger capacity SSD will always last longer than a smaller one, given the same workload, and will often be faster as well. The reason is that the controller can spread the reads and writes across more cells, leveling the wear on cells, and exploiting more parallelism in memory read and write operations. To maximize lifespan, buy an SSD twice as large as you think you need.
DWPD vs TBW vs Warranty
Time for some math. SSD manufacturers use several different (but related) methods to estimate lifespan of the SSD. First there is the warranty that is offered, typically 3 years or 5 years. Then we have two different measures:
Disk Writes Per Day (DWPD) is the manufacturer’s guaranteed number of times that the whole drive can be written to each day of its life, and
TeraBytes Written (TBW) is the total number of Terabytes of data the manufacturer guarantees can be written to the drive over its lifetime.
These two numbers intersect at the warranty date.
DWPD x drive size in GB x 365 days per year x warranty length in years / 1000 will come out the same as the TBW.
So a 500 GB drive with a 0.3 DWPD rating is rated for 150 GB per day of writing. If that drive has a 5 year warranty, the TBW is 0.3 x 500 GB x 365 days x 5 years = 273,750 GB, which divided by 1000 is about 274 TBW.
From this you can also see that a 0.3 DWPD 1 TB drive would have twice the TBW of the 500 GB drive over the same 5 years. In the SSD world, larger lasts longer, especially if you can limit how full the drive gets, keeping 25% or more of the drive’s capacity as free space.
You can do the math backwards as well to derive DWPD from TBW (TBW / Drive size in TB / 365 / warranty in years), so you can compare drives that use the competing methods. DWPD is easier to work with because it already incorporates the drive capacity into the figure.
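The two conversions described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in this section; the function names are invented for the example, and it uses the same decimal units (1 TB = 1000 GB) as the worked figures.

```python
def tbw_from_dwpd(dwpd: float, capacity_gb: float, warranty_years: float) -> float:
    """TBW = DWPD x drive size in GB x 365 x warranty years / 1000."""
    return dwpd * capacity_gb * 365 * warranty_years / 1000


def dwpd_from_tbw(tbw: float, capacity_tb: float, warranty_years: float) -> float:
    """DWPD = TBW / drive size in TB / 365 / warranty years."""
    return tbw / capacity_tb / 365 / warranty_years


# The 500 GB, 0.3 DWPD, 5 year example from the text.
print(tbw_from_dwpd(0.3, 500, 5))      # 273.75, i.e. about 274 TBW
# The same rating on a 1 TB drive gives twice the TBW.
print(tbw_from_dwpd(0.3, 1000, 5))     # 547.5
# Going backwards from a TBW figure to DWPD.
print(round(dwpd_from_tbw(273.75, 0.5, 5), 2))  # 0.3
```

Converting every drive you are comparing to the same measure (either one) is what makes the spec sheets comparable.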
What does it Mean? Drive manufacturers may provide Mean Time To Failure MTTF or MTBF ratings. All else being equal, a drive with a higher MTTF would be better. Almost all drives that quote MTTF or MTBF state between 1 million and 2 million hours. But… nobody in this world has ever run SSD drives for 1.5 million hours to see what the actual failure rate is. These are estimated statistical figures with no direct relation to a typical computer user. If you are a data center administrator, feel free to study the figures.
So what is there to choose from? SSD drives come roughly grouped into typical ranges of DWPD ratings:
The type of flash that the manufacturer chooses has a direct relationship to price. High endurance drives with SLC or 2 bit MLC can be up to four times as expensive per TB compared to a consumer TLC or QLC drive.
Will you still be there tomorrow? One overlooked property of SSDs is the persistence of the data when the drive is not being used. Because SSDs store information as electrical charges in their cells (rather than as the magnetic orientation of tiny particles on a platter, as with hard drives) these charges have a habit of leaking away over time. When the charge diminishes, the cell can represent a different (but wrong) value. An SSD needs to be plugged in and turned on periodically to keep the charges refreshed. Therefore SSDs are not a good candidate for cold storage of data – your backup data on an SSD on a shelf may be corrupted when you return to it. The error rate from charge loss increases with the bit density of the cells, so an SLC NAND cell is most reliable, and QLC NAND based drives are most at risk of data loss in an extended powered-down state because they have much narrower thresholds between the voltages that represent different data values. Further reading: “Why SSDs die a sudden death”.
Interfaces: Before we can start to talk about performance, we have to talk about the interface that connects the computer to the SSD. The performance you will get out of the SSD is largely limited by the interface.
- SATA connected drives are limited to a maximum of 600 MB/s transfer rate by the SATA III 6 Gbps protocol, typical maximum transfer 550 MB/s.
- SATA SSD drives in an M.2 socket have the same 600 MB/s limitation as SATA drives connected to a SATA port. Early M.2 equipped machines only supported SATA SSDs; later models mostly support PCI-e NVMe, and may support SATA interchangeably on one socket as well.
- SAS (Serial Attached SCSI) is mostly found in servers and enterprise computers, capable of either 600 MB/s or 1200 MB/s. Usually used in the context of RAID arrays of drives that can improve performance and reliability by spreading the data across multiple physical drives.
- PCI-e Gen 3 NVMe on M.2 socket is capable of up to 3500 MB/s, depending on how many lanes are implemented. 4 lane PCI-e 3.0 (x4) designs are most common, but there are some economical 2 lane PCI-e 3.0 (x2) models with lower bandwidth.
- PCI-e Gen 4 NVMe on M.2 socket is capable of up to 5000 MB/s. Currently only supported on the latest model AMD chipset motherboards.
The key point here is that no matter how fast the SSD’s chips are, it cannot deliver data faster than the transfer rate of the interface. If you are looking at attaching the SSD externally, you have the additional limitation of the external interface. Check the footnote below about external interface speeds.
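Put another way, the speed you actually see is set by the slowest link in the chain. A minimal Python sketch of that idea follows; the function name is made up for this example, and the figures are the rough interface limits quoted in this article rather than measurements.

```python
def effective_throughput_mb_s(drive_mb_s: float, *interface_limits_mb_s: float) -> float:
    """Return the slowest link in the chain, in MB/s."""
    return min((drive_mb_s, *interface_limits_mb_s))


# A SATA SSD rated at 550 MB/s on a SATA III port (600 MB/s limit).
print(effective_throughput_mb_s(550, 600))        # 550
# The same drive in an external USB 3.0 enclosure (roughly 400 MB/s at best).
print(effective_throughput_mb_s(550, 600, 400))   # 400
# A 3500 MB/s NVMe drive gains nothing if the path to it tops out at 600 MB/s.
print(effective_throughput_mb_s(3500, 600))       # 600
```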
NAND performance: As discussed earlier, the difference between SLC, MLC, TLC and QLC cells affects lifespan. It also affects performance. The more data you pack per cell, the harder it is to read and write it, and the more error correction you have to do, so the longer it takes, which degrades performance. The quality of the NAND chips and the internal design of the SSD also affect performance. As a consumer, you don’t often know the brand or design of the chips in the product, and they may vary within the same drive model, so beyond choosing MLC, TLC or QLC there is little to go on other than taking a deep dive into online review sites.
DRAM or SLC cache design: So when NAND chips get denser, they also get slower to write data. To maintain performance, SSD manufacturers use a caching scheme, where faster memory is used for buffering incoming write requests immediately, and the controller fits these into NAND storage as best as it can between requests. The cache can be RAM Memory (DRAM) which is much faster than NAND (but is volatile so can create issues with losing unwritten cache data if the power is suddenly cut), and/or the cache can be made of SLC NAND or a portion of MLC/TLC NAND that has been programmed to work as SLC (gaining speed at the expense of capacity). Inexpensive SSDs may not have any RAM (DRAM-less) and may have a small amount of NAND allocated to cache.
Falling off a cliff: The catch is that as long as the data you are writing is smaller than the available cache memory, speed will be great. As soon as your write request is larger than the cache can handle, the controller has to slow everything down to the writing speed of the TLC or QLC NAND, and performance drops dramatically. The least expensive SSD drives can be slower to write than a spinning hard drive, once the cache is exceeded (not what you expected from the hype around SSDs!). You can see this when you copy a large number of files from one drive to another. The first 32 GB or so may go by very quickly, and then it will slow down and the rest of the transfer will happen at a relative snail’s pace.
The larger the cache, the less often you will hit the wall of decreased performance in real life. But the larger your file workload, the sooner you will hit the wall. You need to know your workload to make an informed choice.
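To get a feel for how much the cliff matters for your workload, here is a rough Python sketch that estimates the time for one large write. It is a simplified model and not how any particular controller behaves: it assumes the cache starts empty, does not drain during the transfer, and that the drive writes at one fixed speed inside the cache and another fixed speed after it. The function name and the example speeds are illustrative only.

```python
def transfer_time_seconds(total_gb: float, cache_gb: float,
                          cached_mb_s: float, sustained_mb_s: float) -> float:
    """Estimate write time: fast until the cache is full, slow afterwards."""
    fast_gb = min(total_gb, cache_gb)   # portion absorbed by the cache
    slow_gb = total_gb - fast_gb        # portion written at native NAND speed
    return fast_gb * 1000 / cached_mb_s + slow_gb * 1000 / sustained_mb_s


# 10 GB copy fits inside a 32 GB cache: full speed the whole way.
print(transfer_time_seconds(10, 32, 500, 100))    # 20.0 seconds
# 100 GB copy: 32 GB at 500 MB/s, then 68 GB at 100 MB/s.
seconds = transfer_time_seconds(100, 32, 500, 100)
print(seconds)                                    # 744.0 seconds
print(round(100 * 1000 / seconds))                # about 134 MB/s average
```

In this example the drive advertised at 500 MB/s averages closer to 134 MB/s on the large copy, which is the gap the advertised figures do not show.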
Note: This is why you have to dig deeper into reviews for SSDs for sustained large-file write performance if you are concerned about performance. Manufacturers will advertise read and write speeds measured with tests that are below the cache limit only, so every SSD looks fast. They almost never advertise the large file write speed.
In general, professional and enterprise SSDs are engineered for consistent performance under heavy workloads. This may mean that the advertised write speeds are actually lower than consumer drives – say 450 MB/s vs. 520 MB/s – but this hides the fact that under heavy loads, the enterprise drive may maintain 400+ MB/s writing while the consumer drive may have dropped to 100 MB/s or less.
Capacity: As mentioned, a larger drive will last longer than a smaller drive. In addition, larger drives usually have larger caches, and the controller can exploit more parallelism between the banks of NAND chips in the drive, which results in faster performance. If you keep more empty space on the drive, that increases the chance of the controller being able to use empty blocks for writing (which is faster than copying, modifying, erasing and writing a block that has data in already). The more free blocks, the less time the drive has to spend on internal maintenance (garbage collection).
Show me the cache: Some modern SSDs also use dynamic emulated SLC caching, where TLC cells are programmed to run in SLC mode – which is much faster. If you have lots of empty space, these drives can allocate more cells to cache, for example 200 GB of cache instead of 24 GB.
Q. Is a faster SSD worth the money?
A. It depends on you. There’s no doubt that more expensive drives can be faster and more reliable. But do you need it? And if you do, what is the extra performance worth?
What are you using the drive for? If your use is fairly light, and you are not working with huge files, media production or databases, then a slower SSD may well be enough for your needs. Keep in mind that the Read performance difference between brands and models is quite small, and the small transfer Write performance is also quite close between models. M.2 PCI-e NVMe drives have 3 – 4 times the bandwidth of SATA drives and are similar in price, so if your hardware gives you the choice, opt for the PCI-e NVMe option.
If you are working in a professional environment where each second of waiting time costs money or impedes your productivity, then a faster SSD that sustains speed in large volume writes can be well worth it, as may be an array of SSDs on Thunderbolt or on a PCI-e card if you have high volume work like 4K and higher video editing.
As an opinion, if you are not in the professional / critical use category, then instead of spending top dollar on a pro level drive, get a good quality consumer or business level drive, but spend the extra money to double the capacity above what you were planning.
Q. Pro SSD drives seem more reliable, sure, but they are more expensive. Is the price worth it?
A. The value of the increased reliability comes down to the value of your data and the cost of downtime.
What’s it worth to you? If an SSD fails, it often fails without prior warning. This means that the data you have at risk is all of the data that has been modified since your last backup. You can estimate the cost by looking at the time period between backups, and the cost to recreate the amount of data that would be lost in that period of work (manually re-entering from paper documentation or email trails, and redoing work), plus the cost of losing irretrievable data (for which there is no other record).
Obligatory warning: Whatever drive(s) you use, you must have a robust backup plan for your data, which includes both live backup and off-line archival backup storage. You should also have recovery disks and disk images saved of the boot volume(s) of each of your machines, so you can quickly restore to a base OS and Application configuration.
Down and out: Added to the cost of lost data is the cost of downtime (being out of business on that machine until it is repaired) and the cost of restoration (the labour cost and time to reinstall and restore from backups).
Once you have an estimate of the cost of failure, then you can look at the comparative price of the drives and the estimated difference in lifespan and reliability given your usage.
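If it helps to put numbers on it, the pieces described above can be added up in a small Python sketch. Everything here is a placeholder: the function, its parameter names and the dollar figures are made up for illustration, and it assumes that redoing lost work takes about as long as the original work did. Substitute your own estimates.

```python
def cost_of_failure(hours_since_backup: float, rework_cost_per_hour: float,
                    irretrievable_data_cost: float, downtime_hours: float,
                    downtime_cost_per_hour: float, restore_labour_cost: float) -> float:
    """Sum the cost of lost data, downtime and restoration for one drive failure."""
    data_loss = hours_since_backup * rework_cost_per_hour + irretrievable_data_cost
    downtime = downtime_hours * downtime_cost_per_hour
    return data_loss + downtime + restore_labour_cost


# Hypothetical example: 8 hours of work since the last backup at $50/hour to redo,
# nothing irretrievable, 4 hours of downtime at $75/hour, $200 of restore labour.
print(cost_of_failure(8, 50, 0, 4, 75, 200))  # 900
```

Shortening the interval between backups reduces the first term directly, which is often cheaper than buying a more expensive drive.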
Price/Performance: As mentioned, under light loads there isn’t much performance difference between inexpensive drives and professional drives. Almost all SATA SSDs claim Read speeds (best case) over 500 MB/s and Write speeds (best case) of 450 – 520 MB/s. Because of their much faster interface, PCI-e NVMe drives range from 1800 – 2800 MB/s; they are so much faster that for casual use, the differences between them are less significant. So for light loads, there isn’t much use in comparing performance.
The money is on the barrel head however when it comes to heavy load performance and endurance. The tricky part is, the correct choice depends on the type of load that you are putting on it. Check websites like https://www.storagereview.com/ for deeper review of drive performance under load.
Here is a comparison of pricing and endurance of 1 TB class drives within some brands. (Pricing is in CAD and will change, so these are relative measurements only.) You can see the spread of pricing can be more than double. The high relative price of the 2-bit MLC Samsung Pro drive illustrates how much cheaper TLC and QLC components are.
Samsung 860 QVO, SATA, 1TB, QLC, $190, 3 year warr., 0.33 DWPD
Samsung 860 EVO, SATA, 1TB, TLC, $290, 5 year warr., 0.33 DWPD
Samsung 860 PRO, SATA, 1TB, 2-bit MLC, $499, 5 year warr., 0.66 DWPD
Samsung 883 DCT, SATA, 960GB, 3-bit MLC (TLC), $368, 5 year warr., 0.8 DWPD
Kingston A400, SATA, 960GB, TLC, $210, 3 year warr., 0.28 DWPD
Kingston UV500, SATA, 960GB, TLC, $250, 5 year warr., 0.27 DWPD
Kingston A2000, PCI-e NVMe, 960GB, TLC, $280, 5 year warr., 0.33 DWPD
Kingston KC600, SATA, 1024GB, TLC, $250, 5 year warr., 0.33 DWPD
Kingston KC2000, PCI-e NVMe, 1TB, TLC, $370, 5 year warr., 0.33 DWPD
Kingston DC500M, SATA, 960GB, TLC, $430, 5 year warr., 1.30 DWPD
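One way to compare these on endurance is to convert each drive's DWPD rating to TBW and divide the price by it. The Python sketch below does that with the figures from the list above. It is only a rough comparison: it looks at rated endurance alone, ignores performance and interface, and the prices are the CAD snapshot quoted here. The function names are made up for this example.

```python
DRIVES = [
    # (model, price in CAD, capacity in GB, warranty in years, DWPD)
    ("Samsung 860 QVO", 190, 1000, 3, 0.33),
    ("Samsung 860 EVO", 290, 1000, 5, 0.33),
    ("Samsung 860 PRO", 499, 1000, 5, 0.66),
    ("Samsung 883 DCT", 368, 960, 5, 0.80),
    ("Kingston A400", 210, 960, 3, 0.28),
    ("Kingston UV500", 250, 960, 5, 0.27),
    ("Kingston A2000", 280, 960, 5, 0.33),
    ("Kingston KC600", 250, 1024, 5, 0.33),
    ("Kingston KC2000", 370, 1000, 5, 0.33),
    ("Kingston DC500M", 430, 960, 5, 1.30),
]


def rated_tbw(dwpd: float, capacity_gb: float, warranty_years: float) -> float:
    """TBW = DWPD x capacity in GB x 365 x warranty years / 1000."""
    return dwpd * capacity_gb * 365 * warranty_years / 1000


def price_per_tbw(price: float, dwpd: float, capacity_gb: float,
                  warranty_years: float) -> float:
    """Dollars paid per terabyte of rated write endurance."""
    return price / rated_tbw(dwpd, capacity_gb, warranty_years)


# Sort from cheapest to most expensive per TB of rated endurance.
ranked = sorted(DRIVES, key=lambda d: price_per_tbw(d[1], d[4], d[2], d[3]))
for model, price, cap, years, dwpd in ranked:
    tbw = rated_tbw(dwpd, cap, years)
    print(f"{model:<16} {tbw:7.0f} TBW  ${price / tbw:.2f} per TBW")
```

On these numbers the data center drives cost more up front but come out cheapest per terabyte of rated writes, which is the trade-off this section is describing.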
Q. What would you do?
A. We recommend identifying which drives (SSD or HD) in your systems are mission critical – that is, the ones that have the highest cost of failure and the highest cost of downtime. Spend your money on better quality drives for those purposes. Buy higher capacity drives if you can budget for it. Then, put those drives on a regular replacement schedule to replace them with new drives BEFORE they fail. This may be a 2 year or 3 year schedule, where you budget for proactive replacement of the drives with new ones. This has the side effect of always making sure you have larger and faster drives with current technology and under warranty.
The used drives that come out of critical-use service on the replacement schedule can then be re-purposed to less critical uses.
Footnote: External Interfaces: External drives have to go through a USB, Firewire or Thunderbolt port, and therefore the protocol of the interface imposes a limit on performance. Thunderbolt is the only interface that can rival a motherboard-direct connection.
- Firewire, which is no longer used on modern machines, was the fastest option last decade, but Firewire 400 can do 40 MB/s and Firewire 800 tops out at about 80 MB/s. These can’t keep up with an SSD or a fast spinning hard drive.
- USB has several varieties. The main problem with USB is that it has built in latency and processing overhead, which limits performance.
- USB 2.0 is quite slow for drives: 480 Mbps bandwidth (60 MB/s theoretical), but the effective transfer rate is closer to 25 – 40 MB/s.
- USB 3.0 SuperSpeed is adequate for drive storage but a bit slow for direct use, 5 Gbps bandwidth delivers up to 300-400 MB/sec.
- USB 3.1 Gen 1 SuperSpeed 5 Gbps bandwidth delivers 300-400 MB/sec
- USB 3.1 Gen 2 SuperSpeed+ 10 Gbps bandwidth delivers up to 1,250 MB/sec, 700-800 MB/s typical.
- USB 3.2 Gen 1 SuperSpeed is 5 Gbps bandwidth delivers 300-400 MB/sec
- USB 3.2 Gen 2 SuperSpeed+ is 10 Gbps bandwidth, up to 1,250 MB/sec, 700-800 MB/s typical.
- USB 3.2 Gen 2x2 SuperSpeed+ is 20 Gbps bandwidth delivering up to 2,500 MB/sec, 1600 MB/s typical
- Thunderbolt (1) 10 Gbps bandwidth, up to 1,250 MB/sec, 700-800 MB/s typical
- Thunderbolt 2 20 Gbps bandwidth delivering up to 2,500 MB/s, 1,750 MB/s typical
- Thunderbolt 3 40 Gbps bandwidth delivering up to 5,000 MB/s, 2,800-3,500 MB/s typical
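The "up to" figures in this list are simply the raw line rate in gigabits per second divided by 8 bits per byte. A minimal Python sketch of that conversion follows; the function name is made up for this example, and the result is a ceiling only, since protocol overhead is why the typical figures above are lower.

```python
def line_rate_mb_s(gbps: float) -> float:
    """Convert a raw line rate in Gbps to MB/s (8 bits per byte, decimal units)."""
    return gbps * 1000 / 8


for name, gbps in [("USB 3.2 Gen 2", 10), ("USB 3.2 Gen 2x2", 20), ("Thunderbolt 3", 40)]:
    print(f"{name}: {gbps} Gbps -> up to {line_rate_mb_s(gbps):,.0f} MB/s")
```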