SHIELD Error Correction Technology

Introduction

The relentless pursuit of cost reduction by vendors of NAND flash memory chips is accelerating the adoption of solid state storage, but smaller fabrication geometries bring lower reliability and shorter endurance. Each new generation of higher-density flash memory contains smaller cells that hold fewer electrons, while the transition from SLC (single-level cell) to MLC (multi-level cell) and now to TLC (triple-level cell) requires every cell to store more information.

The problem: previous error correction coding (ECC) techniques, such as Bose-Chaudhuri-Hocquenghem (BCH) and Reed-Solomon (RS), are unable to deliver acceptable results on these next-generation high-density flash memory chips when the raw error rates become too high and the ECC code space is limited.

Low-density parity-check (LDPC) technology has long been used in the telecommunications industry to correct transmission errors in various media. LDPC is used, for example, in 10GBASE-T Ethernet (10Gb/s over twisted-pair cabling), as well as in the 802.11n and 802.11ac Wi-Fi standards as part of the High Throughput PHY specification.

LDPC is also used to correct errors in the magnetic media of hard disk drives (HDDs). Although the algorithms and signal processing used for magnetic media and NAND flash memory are different, Seagate development techniques and tools have been instrumental in optimizing LDPC for solid state storage to provide high-performance error correction for high-density flash memory chips.

SHIELD Technology

SHIELD error correction technology is an advanced implementation of low-density parity-check error correction for flash memory. SHIELD technology includes innovations on several fronts, and Seagate has applied for many patents to protect these advances. Together, these innovations enable vendors of solid state storage solutions to deliver enterprise-class dependability and data integrity, even when using low-cost, high-density NAND flash memory chips with high error rates.

Three of these innovations (adaptive code rates, a multi-level error correction schema and intelligent handling of noise) combine to make SHIELD technology the state of the art in flash memory error correction.

Adaptive Code Rates

The use of adaptive code rates on a per-block basis enables SHIELD technology to maximize capacity at the beginning of life and to ensure reliability at the end of life.

When NAND flash memory is new, errors are rare. As the memory cells get used, program/erase cycles cause the oxide layer between the floating gate and the substrate to degrade, reducing its ability to hold a charge. Depending on a number of factors (particularly the techniques used for managing garbage collection, write amplification and wear leveling), different memory blocks begin to experience substantially different error rates. Eventually, over thousands of these cycles, a cell can experience such a high error rate that it becomes unusable.

To deliver a combination of high reliability and peak performance, SHIELD technology implements a variable code rate per block that provides stronger ECC for weak blocks. For strong blocks that require less ECC space, the unnecessary space is repurposed to improve performance. This allocation of ECC space is dynamic, so that the amount of memory overhead consumed begins small and increases over time only as blocks age and generate more errors.
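The per-block choice described above can be pictured as a lookup from a block's measured raw bit error rate to a code rate. The following is a minimal sketch only: the tier table, its thresholds and the function name select_code_rate are hypothetical values chosen for illustration, not SHIELD parameters.

```python
from typing import Optional

# Hypothetical tiers: (highest raw bit error rate covered, code rate).
# Code rate = user data bits / total codeword bits, so a lower rate
# means more parity and stronger correction. All numbers are assumptions.
RATE_TIERS = [
    (1e-4, 0.95),
    (1e-3, 0.92),
    (5e-3, 0.88),
    (1e-2, 0.83),
]


def select_code_rate(raw_ber: float) -> Optional[float]:
    """Return the highest code rate whose tier covers raw_ber.

    Returns None when the block is too weak for every tier, in which
    case a controller would presumably retire the block.
    """
    for max_ber, rate in RATE_TIERS:
        if raw_ber <= max_ber:
            return rate
    return None
```

In this sketch a strong block with a raw bit error rate of 5e-5 keeps the lightest code (0.95), while a worn block at 2e-3 drops to 0.88 and gives up more of its space to parity.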

During the beginning of life (BOL) of the flash memory, the unused ECC space is automatically converted to additional overprovisioning (OP) space. Then as the flash memory approaches its end of life (EOL), part of the OP is gradually consumed to enable a much stronger ECC. The effect is to strike a prudent balance between capacity and reliability.
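The trade between ECC space and overprovisioning can be made concrete with simple arithmetic. The sketch below assumes a simplified model in which physical capacity times code rate gives the space left for data, and anything beyond the advertised user capacity counts as OP. The function name and the capacities in the usage note are hypothetical.

```python
def overprovisioning_fraction(physical_bytes: int, user_bytes: int,
                              code_rate: float) -> float:
    """Return OP as a fraction of user capacity for a given code rate.

    Simplified model (an assumption, not the SHIELD accounting):
    parity consumes (1 - code_rate) of the physical space, and whatever
    remains beyond the advertised user capacity is overprovisioning.
    """
    usable = physical_bytes * code_rate  # space left after parity
    if usable < user_bytes:
        raise ValueError("code rate too low to hold the user capacity")
    return (usable - user_bytes) / user_bytes
```

With 1000 physical units and 800 advertised units, a beginning-of-life code rate of 0.95 leaves about 18.75% OP, while an end-of-life rate of 0.85 leaves about 6.25%: the stronger ECC is paid for out of OP rather than out of user capacity.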

These innovations increase flash memory endurance without requiring an early commitment to a large ECC overhead, which would have the effect of reducing the useful capacity and performance. For this reason, manufacturers can optimize for high-capacity solutions with confidence that SHIELD technology will ensure data integrity throughout the life of the product.

Intelligent Handling of Noise

Flash memory is subject to several sources of noise, including program/erase cycling, retention loss and read disturb. Such noise can cause a hard-decision LDPC (HLDPC) decode failure, triggering soft-decision LDPC (SLDPC) decoding and its associated performance penalty. The SHIELD error recovery policy includes a suite of techniques designed to identify and correct the various noise sources to which NAND flash is subjected in order to prevent additional failures within the same page, block or area of the chip.

To minimize this potential adverse impact on performance, SHIELD technology identifies the source of common types of noise and then initiates the most expeditious action needed to correct the error. SHIELD technology also identifies page-level variations as a means to achieve greater accuracy in determining error sources. Depending on the source of the noise, SHIELD technology might resolve the problem with a simple program/erase cycle, or by relocating data to prevent additional failures within the same page, block or area of the flash memory chip.

Multi-level Error Correction Schema

LDPC performs two types of error correction decoding: hard-decision and soft-decision. Hard-decision decoding (HLDPC) employs a single quantization level between two adjacent storage states. As a binary technique, HLDPC can be implemented in simple hardware and, therefore, it performs quickly and provides error correction on par with BCH at the same code rate.

Soft-decision decoding (SLDPC) utilizes analog voltage levels between two adjacent storage states, which are converted or quantized into integers for analysis. SLDPC has remarkably strong error-correcting capabilities, but it introduces two sources of latency owing to the need to collect additional channel information and to perform more complex decoding.
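The difference between the two kinds of read can be illustrated with a toy quantizer. This is a sketch under stated assumptions: the voltages and thresholds are made up, the convention that the lower-voltage state stores a 1 is an assumption, and the signed-integer confidence scale is a stand-in for whatever soft values a real decoder consumes.

```python
from bisect import bisect_right
from typing import Sequence


def hard_read(voltage: float, threshold: float) -> int:
    """Single quantization level: yields a bit and nothing else.

    Assumed convention: the lower-voltage state stores a 1.
    """
    return 1 if voltage < threshold else 0


def soft_read(voltage: float, thresholds: Sequence[float]) -> int:
    """Quantize against an odd number of sorted thresholds.

    Returns a signed integer: the sign is the bit estimate (positive
    leans toward 1) and the magnitude grows with distance from the
    middle threshold, i.e. with confidence.
    """
    if len(thresholds) % 2 == 0:
        raise ValueError("expected an odd number of thresholds")
    region = bisect_right(thresholds, voltage)  # 0 .. len(thresholds)
    half = (len(thresholds) + 1) // 2
    if region < half:
        return half - region
    return -(region - half + 1)
```

With hypothetical thresholds [1.8, 2.0, 2.2], voltages of 1.9 and 1.7 both hard-read as 1, but the soft read distinguishes them as +1 (weak) and +2 (strong). That extra information is what the soft decoder exploits, and collecting it requires the additional reads that the text identifies as a source of latency.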

To minimize this latency, SHIELD technology employs a multi-level retry schema that applies progressively stronger levels of SLDPC decoding only as needed to correct errors. Performance is further enhanced through the use of advanced digital signal processing (DSP) techniques and multi-processor parallelism. Overall, this minimizes the time-to-data that is so critical in today’s time-sensitive datacenter applications.
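The retry schema amounts to an escalation ladder: try the cheapest decoder first and move to a stronger one only on failure. The sketch below shows that control flow only. The decoders are stand-ins that succeed when a page's error count is within a hypothetical correction capability; they do not implement LDPC, and the capability numbers are invented for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple

# A decoder takes a page's error count (a stand-in for the raw read)
# and returns the recovered data, or None if decoding fails.
Decoder = Callable[[int], Optional[str]]


def make_stub_decoder(max_errors: int) -> Decoder:
    """Build a stand-in decoder that corrects up to max_errors errors."""
    def decode(error_count: int) -> Optional[str]:
        return "data" if error_count <= max_errors else None
    return decode


def decode_with_retries(error_count: int,
                        levels: Sequence[Decoder]) -> Tuple[Optional[str], int]:
    """Try each level in order and stop at the first success.

    Returns (data, level_used). On total failure, data is None and
    level_used equals len(levels). A real controller would also
    gather extra soft information before each stronger level.
    """
    for level, decoder in enumerate(levels):
        data = decoder(error_count)
        if data is not None:
            return data, level
    return None, len(levels)


# Hypothetical ladder: level 0 plays the role of HLDPC, levels 1 and 2
# play progressively stronger SLDPC steps.
LEVELS = [make_stub_decoder(40), make_stub_decoder(80), make_stub_decoder(120)]
```

In this model a page with 30 errors is recovered at level 0 and never pays the soft-decode cost, a page with 100 errors escalates to level 2, and a page with 500 errors exhausts the ladder.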

Conclusion

Initial analysis on 20nm-class flash memory indicates that SHIELD technology, when integrated with other DuraClass technologies on SandForce SF3700 family flash controllers, can improve endurance over the manufacturer’s useful life rating by more than 4x for typical flash memory. For this reason, SHIELD technology will make it possible for system designers to take advantage of future advances in NAND flash memory density without compromising capacity, performance, reliability or endurance.

No storage medium is perfect, making the need to detect and correct the inevitable errors an essential function in both magnetic and solid state storage solutions. Seagate's flash controller engineering team has a long history of advancing the state of the art in error correction. SHIELD technology is the next advance in this tradition, one that will maintain Seagate as the industry leader in smart storage management silicon.
