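Hierarchical storage management (HSM), also known as tiered storage, is a data storage and data management technique that automatically moves data between high-cost and low-cost storage media.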

History

Hierarchical Storage Manager (HSM, then DFHSM and finally DFSMShsm) was first implemented by IBM on March 31, 1978 for MVS to reduce the cost of data storage and to simplify the retrieval of data from slower media. The user did not need to know where the data was stored or how to get it back; the computer retrieved the data automatically. The only difference to the user was the speed at which data was returned. HSM could originally migrate datasets only to disk volumes and virtual volumes on an IBM 3850 Mass Storage Facility, but a later release supported magnetic tape volumes for migration level 2 (ML2).

Later, IBM ported HSM to its AIX operating system, and then to other Unix-like operating systems such as Solaris, HP-UX and Linux.

CSIRO Australia's Division of Computing Research implemented an HSM in its DAD (Drums and Display) operating system with its Document Region in the 1960s, with copies of documents being written to 7-track tape and automatic retrieval upon access to the documents.

HSM was also implemented on the DEC VAX/VMS systems and the Alpha/VMS systems. The date of the first implementation should be readily determinable from the VMS System Implementation Manuals or the VMS Product Description Brochures.

More recently, the development of Serial ATA (SATA) disks has created a significant market for three-stage HSM: files are migrated from high-performance Fibre Channel storage area network devices to somewhat slower but much cheaper SATA disk arrays totaling several terabytes or more, and then eventually from the SATA disks to tape.

Use cases

HSM is often used for deep archival storage of data to be held long term at low cost. Automated tape robots can silo large quantities of data efficiently with low power consumption.

Some HSM software products allow the user to place portions of data files on high-speed disk cache and the rest on tape. This is used in applications that stream video over the internet—the initial portion of a video is delivered immediately from disk while a robot finds, mounts and streams the rest of the file to the end user. Such a system greatly reduces disk cost for large content provision systems.
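The split-file arrangement described above can be sketched as a generator that serves the disk-resident head of a file first and only then reads the remainder from tape. This is a minimal illustration, not the interface of any HSM product: `stream_file`, `disk_heads` and `tape_files` are hypothetical names, and plain dictionaries stand in for the disk cache and the tape library.

```python
def stream_file(name, disk_heads, tape_files, chunk_size=4):
    """Yield a file in chunks: head from disk first, remainder from tape.

    disk_heads maps a file name to the initial bytes kept on fast disk.
    tape_files maps a file name to the complete file as stored on tape.
    Both mappings are assumptions made for this sketch.
    """
    head = disk_heads[name]
    # The head is available immediately, so playback can start at once.
    for start in range(0, len(head), chunk_size):
        yield head[start:start + chunk_size]

    # In a real system this lookup would block while a robot finds and
    # mounts the tape; here it is an ordinary dictionary access.
    full = tape_files[name]
    for start in range(len(head), len(full), chunk_size):
        yield full[start:start + chunk_size]
```

Because the generator is lazy, the tape lookup is not attempted until the consumer has drained the head, which mirrors how the delay of a tape mount can be hidden behind the initial portion of the stream.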

Today, HSM software is also used for tiering between hard disk drives and flash memory: flash memory is over 30 times faster than magnetic disks, but disks are considerably cheaper.

Algorithms

The key factor behind HSM is a data migration policy that controls the file transfers in the system. More precisely, the policy decides which tier a file should be stored in, so that the entire storage system stays well organized and achieves the shortest possible response time to requests. Several algorithms implement this process, such as least recently used replacement (LRU), Size-Temperature Replacement (STP) and Heuristic Threshold (STEP). Recent research has also proposed intelligent policies based on machine learning.
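As an illustration of the simplest of these policies, the sketch below applies least recently used replacement to a toy two-tier store. It is a minimal example under stated assumptions rather than the policy of any particular product: the class name `LruTiering`, the capacity counted in files rather than bytes, and the in-memory dictionaries standing in for storage tiers are all hypothetical.

```python
from collections import OrderedDict


class LruTiering:
    """Toy two-tier store: a small fast tier backed by an unbounded slow tier."""

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity  # max number of files in the fast tier
        self.fast = OrderedDict()           # name -> data, least recently used first
        self.slow = {}                      # name -> data

    def write(self, name, data):
        self.slow.pop(name, None)           # a file lives in exactly one tier
        self.fast[name] = data
        self.fast.move_to_end(name)         # mark as most recently used
        self._demote_if_full()

    def read(self, name):
        if name in self.fast:
            self.fast.move_to_end(name)     # refresh recency on every access
            return self.fast[name]
        data = self.slow.pop(name)          # recall; KeyError if the file is unknown
        self.fast[name] = data
        self._demote_if_full()
        return data

    def tier_of(self, name):
        if name in self.fast:
            return "fast"
        if name in self.slow:
            return "slow"
        return None

    def _demote_if_full(self):
        # Migrate least recently used files down until the fast tier fits.
        while len(self.fast) > self.fast_capacity:
            victim, data = self.fast.popitem(last=False)
            self.slow[victim] = data
```

Note that a file is moved, not copied: after demotion it exists only in the slow tier until a later read recalls it.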

Tiering vs. caching

While tiering solutions and caching may look the same on the surface, the fundamental differences lie in the way the faster storage is utilized and the algorithms used to detect and accelerate frequently accessed data.

Caching operates by making a copy of frequently accessed blocks of data, storing the copy on the faster storage device, and using this copy instead of the original data on the slower, high-capacity backend storage. Every time a storage read occurs, the caching software checks whether a copy of the data already exists in the cache and uses that copy if available. Otherwise, the data is read from the slower, high-capacity storage.
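The read path just described can be sketched as a small read-through cache. This is an illustrative example only: the class name `ReadThroughCache` is hypothetical, a dictionary stands in for the backend storage, every block is cached on its first read (a simplification of "frequently accessed"), and the least recently used eviction rule is an assumption not stated in the text.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Keeps copies of recently read blocks; the backend always holds the original."""

    def __init__(self, backend, capacity):
        self.backend = backend        # dict-like slow store (assumed for this sketch)
        self.capacity = capacity      # max number of cached blocks
        self.cache = OrderedDict()    # block id -> copy, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.cache:
            self.hits += 1
            self.cache.move_to_end(block_id)   # refresh recency
            return self.cache[block_id]
        # Cache miss: fall back to the slower, high-capacity storage.
        self.misses += 1
        data = self.backend[block_id]
        self.cache[block_id] = data            # keep a copy for later reads
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict a copy; the original is untouched
        return data
```

Evicting an entry discards only the copy, so the data remains available on the backend at all times.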

Implementations

  • IBM DFSMShsm, originally Hierarchical Storage Manager (HSM), 5740-XRB, and later Data Facility Hierarchical Storage Manager Version 2 (DFHSM), 5665-329
  • IBM Tivoli Storage Manager for Space Management (HSM available on UNIX (IBM AIX, HP-UX, Solaris) and Linux)
  • IBM Tivoli Storage Manager HSM for Windows, formerly OpenStore for File Servers (OS4FS) (HSM available on Microsoft Windows Server)
  • HPSS by HPSS collaboration
  • Infinite Disk, an early PC system (defunct)
  • EMC DiskXtender, formerly Legato DiskXtender, formerly OTG DiskXtender
  • Moonwalk for Windows, NetApp, OES Linux
  • Oracle SAM-QFS (Open source under Opensolaris, then proprietary)
  • Oracle HSM (Proprietary, renamed from SAM-QFS)
  • Versity Storage Manager for Linux, open-core model license
  • Dell Compellent Data Progression
  • Zarafa Archiver (component of ZCP, application specific archiving solution marketed as a 'HSM' solution)
  • HPE Data Management Framework (DMF, formerly SGI Data Migration Facility) for SLES and RHEL
  • Quantum's StorNext
  • Apple Fusion Drive for macOS
  • Microsoft Storage Spaces, since the version shipped with Windows Server 2012 R2. An older Microsoft product was Remote Storage, included with Windows 2000 and Windows 2003.
  • CloudTier-HSM-SDK: A Windows HSM filter driver, a Windows storage tiering SDK.

Requirements for HSM Implementation

Since distributed computing is highly heterogeneous, organizations need to ensure the presence of certain fundamental storage management components before deploying HSM. These include:

A reliable backup and archiving strategy is extremely important; HSM does not replace backup or archiving. Prior to HSM deployment, organizations are strongly advised to develop a reliable backup and archiving strategy to protect data in a heterogeneous network.

A thorough analysis of network size and data age is required. Large networks holding redundant volumes of obsolete data are usually ideal candidates for HSM deployment. Although the volume of obsolete data is difficult to quantify, organizations are recommended to deploy HSM if their data is older than one year, as this will free up a significant amount of network storage resources.
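A rough way to carry out such an analysis is to scan a directory tree for files that have not been used for more than a year and total their size. The sketch below is one possible approach, not a procedure from any HSM product: the function names and the 365-day default are illustrative assumptions, and the later of access time and modification time is used as "last used" because access times may not be updated on some file systems.

```python
import os
import time

SECONDS_PER_DAY = 86400


def migration_candidates(root, max_age_days=365, now=None):
    """Return (path, size) pairs for files under root not used within max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            info = os.stat(path)
            # Treat the later of access and modification time as "last used".
            last_used = max(info.st_atime, info.st_mtime)
            if last_used < cutoff:
                candidates.append((path, info.st_size))
    return candidates


def reclaimable_bytes(candidates):
    """Total size of the candidate files, i.e. space a migration could free."""
    return sum(size for _path, size in candidates)
```

Comparing the reclaimable total with the capacity of the primary storage gives a first estimate of how much an HSM deployment could free up.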

Interaction with end users is crucial; IT specialists must secure the maximum possible support from the end-user community before rolling out HSM across the organization. Although the HSM file migration component is designed to be transparent to the end user, users often become frustrated when they discover that their data has been moved elsewhere without their knowledge. This frustration is compounded when they attempt to access a migrated file, which takes significantly longer than usual because of the file recall process. Users should also actively participate in managing and archiving their data using tools provided by the system.

Data Deletion

Data wiping may not fully work on flash media such as solid-state drives and USB flash drives, because these devices can store residual data that is inaccessible to wiping methods, and data can be recovered from the individual flash memory chips inside the device. For most auditors, the NIST 800-88 standard still defines data irrecoverability, while the DoD 5220.22-M method it replaced is now considered obsolete. The latest standard, IEEE 2883, goes further and covers modern solid-state drives and flash memory. Encrypting the disk before use prevents this issue. Software-based data wiping may also be compromised by malware.

Moving data from a higher-level device (e.g., hard disk drive) to a lower-level device (e.g., optical disc), followed by deletion of the file on the higher-level device, is often called file grooming.

See also

  • Active Archive Alliance
  • Archive
  • Backup
  • Hybrid cloud storage
  • Data proliferation
  • Disk storage
  • Information lifecycle management
  • Information repository
  • Magnetic tape data storage
  • Memory hierarchy
  • Storage virtualization
  • Cloud storage gateway
