An example of software that shows the health of the drive and its S.M.A.R.T. attributes. This 8 TB Toshiba hard drive appears to be in perfect condition, unformatted, and unused before this demonstration.

Self-Monitoring, Analysis, and Reporting Technology (contrived acronym S.M.A.R.T. or SMART) is a monitoring system included in computer hard disk drives (HDDs) and solid-state drives (SSDs). Its primary function is to detect and report various indicators of drive reliability, with the aim of anticipating imminent hardware failures.

When S.M.A.R.T. data indicates a possible imminent drive failure, software running on the host system may notify the user so that the failing drive can be replaced before any data is lost.

Background

Hard disk and other storage drives are subject to failures (see hard disk drive failure), which fall into two basic classes:

  • Predictable failures which result from slow processes such as mechanical wear and gradual degradation of storage surfaces. Monitoring can determine when such failures are becoming more likely.
  • Unpredictable failures which occur without warning due to anything from electronic components becoming defective to a sudden mechanical failure, including failures related to improper handling.

Mechanical failures account for about 60% of all drive failures.

While the eventual failure may be catastrophic, most mechanical failures result from gradual wear and there are usually certain indications that failure is imminent. These may include increased heat output, increased noise level, problems with reading and writing of data, or an increase in the number of damaged disk sectors.

PCTechGuide's page on S.M.A.R.T. (2003) comments that the technology has gone through three phases:

Accuracy

A field study at Google covering over 100,000 consumer-grade drives from December 2005 to August 2006 found correlations between certain S.M.A.R.T. information and annualized failure rates:

  • In the 60 days following the first uncorrectable error on a drive (S.M.A.R.T. attribute 0xC6 or 198) detected as a result of an offline scan, the drive was, on average, 39 times more likely to fail than a similar drive for which no such error occurred.
  • First errors in reallocations, offline reallocations (S.M.A.R.T. attributes 0xC4 and 0x05 or 196 and 5) and probational counts (S.M.A.R.T. attribute 0xC5 or 197) were also strongly correlated with higher probabilities of failure.
  • Conversely, little correlation was found for increased temperature and no correlation for usage level. However, the research showed that a large proportion (56%) of the failed drives failed without recording any count in the "four strong S.M.A.R.T. warnings" identified as scan errors, reallocation count, offline reallocation, and probational count.
  • Further, 36% of failed drives did so without recording any S.M.A.R.T. error at all, except the temperature, meaning that S.M.A.R.T. data alone was of limited usefulness in anticipating failures.

History

An early hard disk monitoring technology from IBM was later named predictive failure analysis (PFA). It measured several key device health parameters and evaluated them within the drive firmware. Communications between the physical unit and the monitoring software were limited to a binary result: namely, either "device is OK" or "drive is likely to fail soon".

Later, another variant, which was named IntelliSafe, was created by computer manufacturer Compaq and disk drive manufacturers Seagate, Quantum, and Conner. The disk drives would measure the disk's "health parameters", and the values would be transferred to the operating system and user-space monitoring software. Each disk drive vendor was free to decide which parameters were to be included for monitoring, and what their thresholds should be. The unification was at the protocol level with the host.

Compaq submitted IntelliSafe to the Small Form Factor (SFF) committee for standardization in early 1995. It was supported by IBM, by Compaq's development partners Seagate, Quantum, and Conner, and by Western Digital, which did not have a failure prediction system at the time. The Committee chose IntelliSafe's approach, as it provided more flexibility. Compaq placed IntelliSafe into the public domain on 12 May 1995. The resulting jointly developed standard was named S.M.A.R.T.

That SFF standard described a communication protocol for an ATA host to use and control monitoring and analysis in a hard disk drive, but did not specify any particular metrics or analysis methods. Later, "S.M.A.R.T." came to be understood (though without any formal specification) to refer to a variety of specific metrics and methods and to apply to protocols unrelated to ATA for communicating the same kinds of things.

Provided information

The technical documentation for S.M.A.R.T. is in the AT Attachment (ATA) standard. First introduced in 1994, the ATA standard has undergone regular revisions, the latest being in 2011. Some parts of the original S.M.A.R.T. specification by the Small Form Factor (SFF) Committee were added to ATA-3, published in 1997. In 1998, ATA-4 dropped the requirement for drives to maintain an internal attribute table and instead required only that an "OK" or "NOT OK" value be returned. Standardization of similar features on SCSI is scarcer, and the SCSI standards do not use the S.M.A.R.T. name, although vendors and consumers alike refer to these similar features as S.M.A.R.T. too.

The most basic information that S.M.A.R.T. provides is the S.M.A.R.T. status. It provides only two values: "threshold not exceeded" and "threshold exceeded". Often, these are represented as "drive OK" or "drive fail" respectively. A "threshold exceeded" value is intended to indicate that there is a relatively high probability that the drive will not be able to honor its specification in the future: that is, the drive is "about to fail". The predicted failure may be catastrophic or may be something as subtle as the inability to write to certain sectors, or perhaps slower performance than the manufacturer's declared minimum.

The S.M.A.R.T. status does not necessarily indicate the drive's past or present reliability. If a drive has already failed catastrophically, the S.M.A.R.T. status may be inaccessible. Alternatively, if a drive has experienced problems in the past, but the sensors no longer detect such problems, the S.M.A.R.T. status may, depending on the manufacturer's programming, suggest that the drive is now healthy.

The inability to read some sectors is not always an indication that a drive is about to fail. One way that unreadable sectors may be created, even when the drive is functioning within specification, is through a sudden power failure while the drive is writing. Also, even if the physical disk is damaged at one location, such that a certain sector is unreadable, the disk may be able to use spare space to replace the bad area, so that the sector can be overwritten.

More detail on the health of the drive may be obtained by examining the S.M.A.R.T. Attributes. S.M.A.R.T. Attributes were included in some drafts of the ATA standard, but were removed before the standard became final. The meaning and interpretation of the attributes vary between manufacturers, and are sometimes considered a trade secret for one manufacturer or another. Attributes are further discussed below.

A drive that implements S.M.A.R.T. may optionally implement a number of self-test or maintenance routines, and the results of the tests are kept in the self-test log. The self-test routines may be used to detect any unreadable sectors on the disk, so that they may be restored from back-up sources (for example, from other disks in a RAID). This helps to reduce the risk of incurring permanent loss of data.

Standards and implementation

Lack of common interpretation

Many motherboards display a warning message upon boot when a disk drive is approaching failure. Although an industry standard exists among most major hard drive manufacturers,

From a legal perspective, the term "S.M.A.R.T." refers only to a signaling method between internal disk drive electromechanical sensors and the host computer. Because of this, the specifics of a S.M.A.R.T. implementation are largely left to the vendor: while many attributes have been standardized between drive vendors, others remain vendor-specific. Implementations still differ, and in some cases may lack "common" or expected features such as a temperature sensor, or include only a few select attributes, while still allowing the manufacturer to advertise the product as "S.M.A.R.T. compatible."

In ATA

ATA S.M.A.R.T. attributes

Each drive manufacturer defines a set of attributes, and sets threshold values beyond which attributes should not pass under normal operation.

Each attribute has:

  • 1 byte for the ID (1 through 254).
  • 1 byte for status flags.
  • 1 byte of threshold value, which ranges from 0 to 254.
  • 1 byte of normalized value, also called the current value, which ranges from 0 to 254 (higher is usually better, but vendors are allowed to vary; the threshold entry stored elsewhere describes which direction is better). The initial normalized value of an attribute is typically 100, but this can vary between manufacturers.
  • 8 bytes "vendor-specific".

However, the full "vendor-specific" attribute is not used as-is. Instead, one of the following occurs:

  • In the 7-byte setup, the first byte of "vendor-specific" is used to store a "worst" normalized value, leaving 7 bytes for vendor data.
  • In the 6-byte setup, the first byte of "vendor-specific" is used to store a "worst" normalized value and the last byte "reserved", leaving 6 bytes.
  • In the 8-byte setup, the normalized byte is added to the attribute while the last byte is reserved.

The vendor attribute, also commonly called a "raw value", may be displayed as a decimal or hexadecimal number; its meaning is entirely up to the drive manufacturer (but often corresponds to counts or a physical unit, such as degrees Celsius or seconds).
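As an illustration of the 6-byte setup described above, the following Python sketch splits the 8 "vendor-specific" bytes into a "worst" value, a 6-byte raw value, and a reserved byte. The names `VendorField` and `split_vendor_bytes` are hypothetical, and reading the raw bytes as a little-endian integer is an assumption made for the example; an actual drive may pack its raw value differently.

```python
from dataclasses import dataclass


@dataclass
class VendorField:
    """Hypothetical container for the 6-byte setup of the vendor-specific field."""
    worst: int     # first byte: "worst" normalized value
    raw: int       # middle 6 bytes: vendor data ("raw value")
    reserved: int  # last byte: reserved


def split_vendor_bytes(vendor: bytes) -> VendorField:
    """Split the 8 vendor-specific bytes according to the 6-byte setup."""
    if len(vendor) != 8:
        raise ValueError("expected exactly 8 vendor-specific bytes")
    worst = vendor[0]
    # Assumption: the raw value is a little-endian unsigned integer.
    raw = int.from_bytes(vendor[1:7], "little")
    reserved = vendor[7]
    return VendorField(worst, raw, reserved)


# Made-up example bytes, not taken from a real drive.
example = bytes([0x64, 0x2A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00])
field = split_vendor_bytes(example)
print(field.worst, field.raw, field.reserved)  # prints "100 42 0"
```

Because the meaning of the raw value is up to the manufacturer, a tool built this way could only display the integer; interpreting it (as a count, a temperature, or a packed structure) would need per-vendor knowledge.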

If one or more attributes have the "prefailure" flag, and the "current value" of such a prefailure attribute is smaller than or equal to its "threshold value" (unless the "threshold value" is 0), that will be reported as a "drive failure". In addition, utility software can send the SMART RETURN STATUS command to the ATA drive, which may report one of three statuses: "drive OK", "drive warning" or "drive failure".
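The threshold rule in the preceding paragraph can be sketched in a few lines of Python. The `Attribute` class and the function names are hypothetical, introduced only for illustration; the sketch covers the prefailure-versus-threshold comparison and does not model the SMART RETURN STATUS command or the "drive warning" status.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Attribute:
    """Hypothetical, simplified view of one S.M.A.R.T. attribute."""
    attr_id: int
    prefailure: bool  # True if the "prefailure" flag is set
    current: int      # normalized ("current") value
    threshold: int    # threshold value; 0 means the check is disabled


def attribute_failed(attr: Attribute) -> bool:
    """Apply the rule: prefailure flag set, threshold non-zero, current <= threshold."""
    if not attr.prefailure:
        return False
    if attr.threshold == 0:
        return False
    return attr.current <= attr.threshold


def drive_failing(attrs: Iterable[Attribute]) -> bool:
    """A drive is reported as failing if any attribute trips the rule."""
    return any(attribute_failed(a) for a in attrs)


# Made-up values for illustration only.
attrs = [
    Attribute(attr_id=5, prefailure=True, current=36, threshold=36),
    Attribute(attr_id=194, prefailure=False, current=20, threshold=50),
]
print(drive_failing(attrs))  # prints "True": attribute 5 is at its threshold
```

Note that an attribute without the prefailure flag never trips the rule here, even when its current value is below the threshold, which mirrors the wording of the paragraph above.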

Manufacturers that have implemented at least one S.M.A.R.T. attribute in various products include Samsung, Seagate, IBM (Hitachi), Fujitsu, Maxtor, Toshiba, Intel, sTec, Inc., Western Digital and ExcelStor Technology.

Known ATA S.M.A.R.T. attributes

The following chart lists some S.M.A.R.T. attributes and the typical meaning of their raw values. Normalized values are usually mapped so that higher values are better (exceptions include drive temperature, number of head load/unload cycles), but higher raw attribute values may be better or worse depending on the attribute and manufacturer. For example, the "Reallocated Sectors Count" attribute's normalized value decreases as the count of reallocated sectors increases. In this case, the attribute's raw value will often indicate the actual count of sectors that were reallocated, although vendors are in no way required to adhere to this convention.

As manufacturers do not necessarily agree on precise attribute definitions and measurement units, the following list of attributes is a general guide only.

Not every drive supports every attribute code (sometimes abbreviated as "ID", for "identifier", in tables). Some codes are specific to particular drive types (magnetic platter, flash, SSD). Drives may use different codes for the same parameter, e.g., see codes 193 and 225.

{| class="wikitable" style="margin:auto; text-align:center;"

|+ Legend

|-

! scope="row" | ID

|193<br />0xC1||Attribute code in decimal and<br />hexadecimal notations

|-

! scope="row" rowspan="2" | Ideal

| Higher || Higher raw value is better

|-

| Lower || Lower raw value is better

|- style="background:#FED;"

! scope="row" | !<br />(Critical)

| Critical

|Denotes a Critical attribute.<br />Specific values may predict drive failure

|}

{| class="wikitable sortable" summary="Overview of known S.M.A.R.T. attributes and their description"

|-

! ID || Attribute name || Ideal || ! || Description

|- style="background:#FED;"

| 01<br />0x01 || style="white-space: nowrap;" | Read Error Rate || Lower

| Critical || (Vendor specific raw value.) Stores data related to the rate of hardware read errors that occurred when reading data from a disk surface. The raw value has different structure for different vendors and is often not meaningful as a decimal number. For some drives, this number may increase during normal operation without necessarily signifying errors.

|-

| 02<br />0x02 || Throughput Performance || Higher

||| Overall (general) throughput performance of a hard disk drive. If the value of this attribute is decreasing there is a high probability that there is a problem with the disk.

|-

| 03<br />0x03 || Spin-Up Time || Lower

||| Average time of spindle spin up (from zero RPM to fully operational [milliseconds]).

|-

| 04<br />0x04 || Start/Stop Count ||

||| A tally of spindle start/stop cycles. The spindle turns on, and hence the count is increased, both when the hard disk is turned on after having been turned entirely off (disconnected from its power source) and when the hard disk returns from sleep mode.

|- style="background:#FED;"

| 05<br />0x05 || Reallocated Sectors Count || Lower

| Critical

| Count of reallocated sectors. The raw value represents a count of the bad sectors that have been found and remapped. Thus, the higher the attribute value, the more sectors the drive has had to reallocate. This value is primarily used as a metric of the life expectancy of the drive; a drive which has had any reallocations at all is significantly more likely to fail in the following months.

|-

| 06<br />0x06 || Read Channel Margin ||

||| Margin of a channel while reading data. The function of this attribute is not specified.

|-

| 07<br />0x07 || Seek Error Rate ||

||| (Vendor specific raw value.) Rate of seek errors of the magnetic heads. If there is a partial failure in the mechanical positioning system, then seek errors will arise. Such a failure may be due to numerous factors, such as damage to a servo, or thermal widening of the hard disk. The raw value has different structure for different vendors and is often not meaningful as a decimal number. For some drives, this number may increase during normal operation without necessarily signifying errors.

"By default, the total expected lifetime of a hard disk in perfect condition is defined as 5 years (running every day and night on all days). This is equal to 1825 days in 24/7 mode or 43800 hours."

On some pre-2005 drives, this raw value may advance erratically and/or "wrap around" (reset to zero periodically). For some HDDs it might be stored as an unsigned 16-bit integer, which would cause it to wrap around after 65535.

|- style="background:#FED;"

| 10<br />0x0A || Spin Retry Count || Lower

| Critical

| Count of retried spin-start attempts. This attribute stores a total count of the spin start attempts to reach the fully operational speed (under the condition that the first attempt was unsuccessful). An increase of this attribute value is a sign of problems in the hard disk mechanical subsystem.

|-

| 11<br />0x0B || Recalibration Retries or Calibration Retry Count || Lower

||| This attribute indicates the number of times recalibration was requested (under the condition that the first attempt was unsuccessful). An increase of this attribute value is a sign of problems in the hard disk mechanical subsystem.

|-

| 12<br />0x0C || Power Cycle Count ||

||| This attribute indicates the count of full hard disk power on/off cycles.

|-

| 13<br />0x0D || style="white-space: nowrap;" | Soft Read Error Rate || Lower

||| Uncorrected read errors reported to the operating system.

|-

| 22<br />0x16 || Current Helium Level || Higher

||| Specific to He8 drives from HGST. This value measures the helium inside of the drive specific to this manufacturer. It is a pre-fail attribute that trips once the drive detects that the internal environment is out of specification.

|-

| 23<br />0x17 || Helium Condition Lower ||<!-- We do not know which way is better: we only know new drives start at 0 -->

||| rowspan=2 | Specific to MG07+ drives from Toshiba. These values measure the level of helium inside of the drive specific to this manufacturer. It is a pre-fail attribute that trips once the drive detects that the internal environment is out of specification.

|-

| 24<br />0x18 || Helium Condition Upper ||

||

|-

| 170<br />0xAA || Available Reserved Space ||

||| See attribute E8.

|-

| 175<br />0xAF || Power Loss Protection Failure ||

||| Last test result as microseconds to discharge cap, saturated at its maximum value. Also logs minutes since last test and lifetime number of tests. Raw value contains the following data:

  • Bytes 0-1: Last test result as microseconds to discharge cap, saturates at max value. Test result expected in range 25 <= result <= 5000000, lower indicates specific error code.
  • Bytes 2-3: Minutes since last test, saturates at max value.
  • Bytes 4-5: Lifetime number of tests, not incremented on power cycle, saturates at max value.

Normalized value is set to one on test failure or 11 if the capacitor has been tested in an excessive temperature condition, otherwise 100.

|-

| 177<br />0xB1 || Wear Range Delta ||

||| Delta between most-worn and least-worn Flash blocks. In more technical terms, it indicates how well the wear leveling of the SSD is working.

|-

| 178<br />0xB2 || Used Reserved Block Count<!-- (Used_Rsvd_Blk_Cnt) --> ||

||| "Pre-Fail" attribute used at least in Samsung devices.

|-

| 179<br />0xB3 || Used Reserved Block Count Total<!-- (Used_Rsvd_Blk_Cnt_Tot) --> ||

||| "Pre-Fail" attribute used at least in Samsung devices.

|-

| 180<br />0xB4 || Unused Reserved Block Count Total<!-- (Unused_Rsvd_Blk_Cnt_Tot) --> ||

||| "Pre-Fail" attribute used at least in HP devices.

If the value drops to 0 the device may become read-only to allow the user to retrieve stored data.

|-

| 181<br />0xB5 || Program Fail Count Total or Non-4K Aligned Access Count || Lower

||| (Flash Memory) Total number of Flash program operation failures since the drive was deployed (indicating old age).<p><!--

--> (HDD, Advanced Format) Number of user data accesses (both reads and writes) where LBAs are not 4&nbsp;KiB aligned (LBA % 8 != 0) or where the size is not a multiple of 4&nbsp;KiB (block count != 8), assuming logical block size (LBS)=512 B (indicating bad software configuration).</p>

|-

| 182<br />0xB6 || Erase Fail Count ||

||| "Pre-Fail" Attribute used at least in Samsung devices.

|-

| 183<br />0xB7 || SATA Downshift Error Count or Runtime Bad Block || Lower

||| Western Digital, Samsung or Seagate attribute: Either the number of downshifts of link speed (e.g. from 6Gbit/s to 3Gbit/s) or the total number of data blocks with detected, uncorrectable errors encountered during normal operation. Although degradation of this parameter can be an indicator of drive aging and/or potential electromechanical problems, it does not directly indicate imminent drive failure.

|- style="background:#FED;"

| 184<br />0xB8 || End-to-End error / IOEDC || Lower

| Critical

| This attribute is a part of Hewlett-Packard's SMART IV technology, as well as part of other vendors' IO Error Detection and Correction schemes, and it contains a count of parity errors which occur in the data path to the media via the drive's cache RAM.

|-

| 185<br />0xB9 || Head Stability ||

||| Western Digital attribute.

|-

| 186<br />0xBA || Induced Op-Vibration Detection ||

||| Western Digital attribute.

|- style="background:#FED;"

| 187<br />0xBB || Reported Uncorrectable Errors || Lower

| Critical