thumb|[[Internet Archive book scanner]]

thumb|Digital camera imaging setup at the [[New Zealand Arthropod Collection]] used for extended depth of focus photography of specimens.

Digitization is the process of converting information into a digital format, i.e., a format that can be read by computers. The result is the representation of an object, image, sound, document, or signal (usually an analog signal) obtained by generating a series of numbers that describe a discrete set of points or samples. The result of this conversion is called digital representation or, more specifically, a digital image for the object and digital form for the signal. In contemporary practice, the digitized data is expressed as binary numbers, thereby enabling processing by digital computers and other operations. However, the fundamental process of digitizing entails "the conversion of analog source material into a numerical format"; the decimal or any other number system can be used instead.

Digitization is of crucial importance to data processing, storage, and transmission because it "allows information of all kinds in all formats to be carried with the same efficiency and also intermingled." Though analog data is typically more stable, digital data has the potential to be more easily shared and accessed and, in theory, can be propagated indefinitely without generation loss, provided it is migrated to new, stable formats as needed. This potential has led to institutional digitization projects designed to improve access and the rapid growth of the digital preservation field.

Digitization and digital preservation are sometimes mistaken for the same thing. They are different, but digitization is often a vital first step in digital preservation. Libraries, archives, museums, and other memory institutions digitize items to preserve fragile materials and create more access points for patrons. Doing this creates challenges for information professionals, and solutions can be as varied as the institutions that implement them. Some analog materials, such as audio and video tapes, are nearing the end of their life cycle, and it is important to digitize them before equipment obsolescence and media deterioration make the data irretrievable.

There are challenges and implications surrounding digitization including time, cost, cultural history concerns, and creating an equitable platform for historically marginalized voices. Many digitizing institutions develop their own solutions to these challenges. Although e-books have undermined the sales of their printed counterparts, a study from 2017 indicated that the two cater to different audiences and use-cases. In a study of over 1400 university students it was found that physical literature is more apt for intense studies while e-books provide a superior experience for leisurely reading.

Process

The term digitization is often used when diverse forms of information, such as an object, text, sound, image, or voice, are converted into a single binary code. The core of the process is the compromise between the capturing device and the player device so that the rendered result represents the original source with the greatest possible fidelity. The advantage of digitization is the speed and accuracy with which this form of information can be transmitted, with no degradation compared with analog information.

Digital information exists as one of two digits, either 0 or 1. These are known as bits (a contraction of binary digits), and the sequences of 0s and 1s that constitute information are grouped into bytes, typically of eight bits each.
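As a minimal sketch of this idea, the following Python snippet shows how a short piece of text can be expressed as bits. The helper name `to_bits` is hypothetical, and the choice of UTF-8 and eight-bit bytes is an assumption made for illustration; other encodings and groupings are possible.

```python
def to_bits(text: str) -> list[str]:
    """Return each byte of the UTF-8 encoding of `text` as an 8-character bit string."""
    # Assumption: UTF-8 encoding and 8-bit bytes, chosen only for illustration.
    return [format(byte, "08b") for byte in text.encode("utf-8")]


bits = to_bits("Hi")
print(bits)  # ['01001000', '01101001']
```

Each string in the result is one byte; joining them gives the full sequence of 0s and 1s that represents the text.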

Analog signals are continuously variable, both in the number of possible values of the signal at a given time and in the number of points in the signal in a given period of time. Digital signals, however, are discrete in both of those respects – generally a finite sequence of integers – so a digitization can, in practical terms, only ever be an approximation of the signal it represents.

Digitization occurs in two parts:

;Discretization: An analog signal A is read and, at regular time intervals (the sampling frequency), its value at that point is sampled. Each such reading is called a sample and may be considered to have infinite precision at this stage.

;Quantization: Samples are rounded to a fixed set of numbers (such as integers), a process known as quantization.

In general, these can occur at the same time, though they are conceptually distinct.

A series of digital integers can be transformed into an analog output that approximates the original analog signal. Such a transformation is called a digital-to-analog conversion. The sampling rate and the number of bits used to represent the integers combine to determine how close such an approximation to the analog signal a digitization will be.
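The two steps, and the effect of bit depth on the quality of the approximation, can be sketched in Python. This is an illustrative sketch only: the function names, the 5 Hz test tone, the 100 Hz sampling rate, and the assumption that the signal lies in the range [-1, 1] are all choices made here, not part of any standard.

```python
import math


def sample(signal, duration_s, sample_rate_hz):
    """Discretization: read the signal at regular time intervals."""
    n = int(duration_s * sample_rate_hz)
    return [signal(i / sample_rate_hz) for i in range(n)]


def quantize(samples, bits):
    """Quantization: round each sample in [-1, 1] to a signed integer code."""
    max_code = 2 ** (bits - 1) - 1
    return [round(s * max_code) for s in samples]


def reconstruct(codes, bits):
    """Map integer codes back to values in [-1, 1] (the numeric part of D/A conversion)."""
    max_code = 2 ** (bits - 1) - 1
    return [c / max_code for c in codes]


def max_error(samples, bits):
    """Largest difference between the original samples and their quantized form."""
    approx = reconstruct(quantize(samples, bits), bits)
    return max(abs(a - s) for a, s in zip(approx, samples))


def tone(t):
    """Hypothetical analog source: a 5 Hz sine wave."""
    return math.sin(2 * math.pi * 5 * t)


samples = sample(tone, duration_s=1.0, sample_rate_hz=100)
for bits in (4, 8, 16):
    print(bits, "bits -> worst-case error", max_error(samples, bits))
```

In this sketch the worst-case rounding error is bounded by half of one quantization step, so each additional bit roughly halves it. The sampling rate, in turn, limits how quickly the signal may vary and still be captured.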

Examples

thumb|Digitization of the first number of Estonian popular science magazine [[Horisont]] published in January 1967

The term is used to describe, for example, the scanning of analog sources (such as printed photos or taped videos) into computers for editing, 3D scanning that creates 3D modeling of an object's surface, and audio (where sampling rate is often measured in kilohertz) and texture map transformations. In this last case, as in normal photos, the sampling rate refers to the resolution of the image, often measured in pixels per inch.

Digitizing is the primary way of storing images in a form suitable for transmission and computer processing, whether they are scanned from two-dimensional analog originals, captured using an image sensor-equipped device such as a digital camera or a tomographical instrument such as a CAT scanner, or acquired as precise dimensions from a real-world object, such as a car, using a 3D scanning device.

Digitizing is central to making digital representations of geographical features, using raster or vector images, in a geographic information system, i.e., the creation of electronic maps, either from various geographical and satellite imaging (raster) or by digitizing traditional paper maps or graphs (vector).

"Digitization" is also used to describe the process of populating databases with files or data. While this usage is technically inaccurate, it originates with the previously proper use of the term to describe that part of the process involving digitization of analog sources, such as printed pictures and brochures, before uploading to target databases.

History

  • 1957: The Standards Electronic Automatic Computer (SEAC) was invented. That same year, Russell Kirsch used a rotating drum scanner and photomultiplier connected to SEAC to create the first digital image (176x176 pixels) from a photo of his infant son. This image was stored in SEAC memory via a staticizer and viewed via a cathode ray oscilloscope.

The process of converting analog to digital consists of two parts: sampling and quantizing. Sampling measures wave amplitudes at regular intervals, splits them along the vertical axis, and assigns them a numerical value, while quantizing looks for measurements that are between binary values and rounds them up or down.

Nearly all recorded music has been digitized, and about 12 percent of the 500,000+ movies listed on the Internet Movie Database are digitized and were released on DVD.

Digitization of home movies, slides, and photographs is a popular method of preserving and sharing personal multimedia. Slides and photographs may be scanned quickly using an image scanner, but analog video requires a video tape player to be connected to a computer while the item plays in real time. Slides can be digitized more quickly with a slide scanner such as the Nikon Coolscan 5000ED.

Another example of digitization is the VisualAudio process developed by the Swiss Fonoteca Nazionale in Lugano. By scanning a high-resolution photograph of a record, they are able to extract and reconstruct the sound from the processed image.

Digitization of analog tapes before they degrade, or after damage has already occurred, can rescue the only copies of local and traditional cultural music for future generations to study and enjoy.

Analog texts to digital

alt=Image of a rare book in a book scanner where it will be digitized.|thumb|Book scanner in the digitization lab at the University of Liège, Belgium

Academic and public libraries, foundations, and private companies like Google are scanning older print books and applying optical character recognition (OCR) technologies so they can be keyword searched, but as of 2006, only about 1 in 20 texts had been digitized. Librarians and archivists are working to increase this statistic and in 2019 began digitizing 480,000 books published between 1923 and 1964 that had entered the public domain.

Unpublished manuscripts and other rare papers and documents housed in special collections are being digitized by libraries and archives, but backlogs often slow this process and keep materials with enduring historical and research value hidden from most users (see digital libraries). Digitization has not completely replaced other archival imaging options, such as microfilming, which is still used by institutions such as the National Archives and Records Administration (NARA) to provide preservation and access to these resources.

While digital versions of analog texts can potentially be accessed from anywhere in the world, they are not as stable as most print materials or manuscripts and are unlikely to be accessible decades from now without further preservation efforts, whereas many books, manuscripts, and scrolls have already been around for centuries. Digitization can provide a means of preserving the content of the materials by creating an accessible facsimile of the object in order to put less strain on already fragile originals. For sounds, digitization of legacy analog recordings is essential insurance against technological obsolescence. A fundamental aspect of planning digitization projects is to ensure that the digital files themselves are preserved and remain accessible; the term "digital preservation," in its most basic sense, refers to an array of activities undertaken to maintain access to digital materials over time.

The prevalent Brittle Books issue facing libraries across the world is being addressed with a digital solution for long term book preservation. Since the mid-1800s, books were printed on wood-pulp paper, which turns acidic as it decays. Deterioration may advance to a point where a book is completely unusable. In theory, if these widely circulated titles are not treated with de-acidification processes, the materials upon those acid pages will be lost. As digital technology evolves, it is increasingly preferred as a method of preserving these materials, mainly because it can provide easier access points and significantly reduce the need for physical storage space.

Cambridge University Library is working on the Cambridge Digital Library, which will initially contain digitised versions of many of its most important works relating to science and religion. These include examples such as Isaac Newton's personally annotated first edition of his Philosophiæ Naturalis Principia Mathematica as well as college notebooks and other papers, and some Islamic manuscripts such as a Quran from Tipu Sahib's library.

Google, Inc. has taken steps towards attempting to digitize every title with "Google Book Search". While some academic libraries have been contracted by the service, issues of copyright law violations threaten to derail the project. However, it does provide – at the very least – an online consortium for libraries to exchange information and for researchers to search for titles as well as review the materials.

Digitization versus digital preservation

Digitizing something is not the same as digitally preserving it. For example, scanning a photograph leaves the original in a photo album and a digital copy saved to a computer. This is essentially only the first step in digital preservation, which means maintaining the digital copy over a long period of time and making sure it remains authentic and accessible. Digitization, by contrast, applies exclusively to analog materials. Born-digital materials present a unique challenge to digital preservation not only due to technological obsolescence but also because of the inherently unstable nature of digital storage and maintenance.

The Library of Congress provides numerous resources and tips for individuals looking to practice digitization and digital preservation for their personal collections.

Digital reformatting

Digital reformatting is the process of converting analog materials into a digital format as a surrogate of the original. The digital surrogates perform a preservation function by reducing or eliminating the use of the original. Digital reformatting is guided by established best practices to ensure that materials are being converted at the highest quality.

Digital reformatting at the Library of Congress

The Library of Congress has been actively reformatting materials for its American Memory project and developed best standards and practices pertaining to book handling during the digitization process, scanning resolutions, and preferred file formats. Some of these standards are:

  • The use of ISO 16067-1 and ISO 16067-2 standards for resolution requirements.
  • Recommended 400 ppi resolution for OCR'ed printed text.
  • The use of 24-bit color when color is an important attribute of a document.
  • The use of the scanning device's maximum resolution for digitally reproducing photographs.
  • TIFF as the standard file format.
  • Attachment of descriptive, structural, and technical metadata to all digitized documents.
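To give a sense of what the resolution and color-depth recommendations above imply for storage, the following Python sketch estimates the pixel dimensions and uncompressed size of a single scan. The function name `scan_size` and the 8.5 × 11 inch page are assumptions made for illustration, and the figure ignores file headers and embedded metadata.

```python
def scan_size(width_in, height_in, ppi, bits_per_pixel=24):
    """Estimate pixel dimensions and uncompressed size in bytes of a scan.

    Assumes an uncompressed raster image; file headers and metadata
    are not included in the estimate.
    """
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    size_bytes = width_px * height_px * bits_per_pixel // 8
    return width_px, height_px, size_bytes


# Hypothetical example: a letter-size page scanned at 400 ppi in 24-bit color.
w, h, size = scan_size(8.5, 11, ppi=400)
print(w, h, size)  # 3400 4400 44880000
```

At these settings a single page occupies roughly 45 MB before any compression, which helps explain why storage is a recurring consideration in large projects.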

A list of archival standards for digital preservation can be found on the ARL website.

The Library of Congress has constituted a Preservation Digital Reformatting Program. The three main components of the program are:

  • Selection Criteria for digital reformatting
  • Digital reformatting principles and specifications
  • Life cycle management of LC digital data

Audio digitization and reformatting

Audio media offers a rich source of historic ethnographic information, with the earliest forms of recorded sound dating back to 1890. According to the International Association of Sound and Audiovisual Archives (IASA), these sources of audio data, as well as the aging technologies used to play them back, are in imminent danger of permanent loss due to degradation and obsolescence. These primary sources are called "carriers" and exist in a variety of formats, including wax cylinders, magnetic tape, and flat discs of grooved media, among others. Some formats are susceptible to more severe, or quicker, degradation than others. For instance, lacquer discs suffer from delamination. Analog tape may deteriorate due to sticky shed syndrome.

thumb|1/4" analog tape being played back on a Studer A810 tape machine for digitization at Smithsonian Folkways Recordings

Archival workflow and file standardization have been developed to minimize loss of information from the original carrier to the resulting digital file as digitization is underway. For most at-risk formats (magnetic tape, grooved cylinders, etc.), a similar workflow can be observed. Examination of the source carrier will help determine what, if any, steps need to be taken to repair material prior to transfer. A similar inspection must be undertaken for the playback machines. If satisfactory conditions are met for both carrier and playback machine, the transfer can take place, moderated by an analog-to-digital converter. The digital signal is then represented visually for the transfer engineer by a digital audio workstation, like Audacity, WaveLab, or Pro Tools. Reference access copies can be made at smaller sample rates. For archival purposes, it is standard to transfer at a sample rate of 96 kHz and a bit depth of 24 bits per channel.

The time spent planning, doing the work, and processing the digital files, along with the expense and fragility of some materials, are some of the most common challenges.
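For the archival transfer settings mentioned above (96 kHz, 24 bits per channel), the amount of raw audio data can be estimated with simple arithmetic. The Python sketch below assumes uncompressed PCM and a two-channel (stereo) recording; the function name `pcm_bytes` and the stereo default are illustrative assumptions, and container overhead is ignored.

```python
def pcm_bytes(duration_s, sample_rate_hz=96_000, bit_depth=24, channels=2):
    """Estimate the size in bytes of uncompressed PCM audio.

    Assumes a whole number of bytes per sample and ignores any
    file-format headers or metadata.
    """
    bytes_per_sample = bit_depth // 8
    return duration_s * sample_rate_hz * bytes_per_sample * channels


print(pcm_bytes(1))     # 576000 bytes per second of stereo audio
print(pcm_bytes(3600))  # 2073600000 bytes, about 2 GB per hour
```

Lower sample rates for reference access copies reduce this figure proportionally, which is one reason they are used for everyday listening.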

Time spent

Digitization is a time-consuming process, even more so when the condition or format of the analog resources requires special handling. Deciding what part of a collection to digitize can sometimes take longer than digitizing it in its entirety. Each digitization project is unique and workflows for one will be different from every other project that goes through the process, so time must be spent thoroughly studying and planning each one to create the best plan for the materials and the intended audience.

Expense

Cost of equipment, staff time, metadata creation, and digital storage media make large scale digitization of collections expensive for all types of cultural institutions.

Ideally, all institutions want their digital copies to have the best image quality so a high-quality copy can be maintained over time. However, smaller institutions may not be able to afford such equipment or manpower, which limits how much material can be digitized, so archivists and librarians must know what their patrons need and prioritize digitization of those items. To help information institutions decide which archives are worth digitizing, Casablancas and other researchers used a proposed model to investigate the impact of different digitization strategies on the decrease in access requests in archival and library reading rooms.

MPLP

One way to save time and resources is by using the More Product, Less Process (MPLP) method to digitize materials while they are being processed.

Digitizing marginalized voices

Digitization can be used to highlight voices of historically marginalized peoples and add them to the greater body of knowledge. Many projects, some of them community archives created by members of those groups, are doing this in a way that supports the people, values their input and collaboration, and gives them a sense of ownership of the collection. One such project combines new audio and video oral histories with digitized flyers, posters, and newsletters from Grand Valley State University's analog collections. Another archive, started by Michelle Caswell and Samip Mallick, collects a broad variety of materials "created by or about people residing in the United States who trace their heritage to Bangladesh, Bhutan, India, Maldives, Nepal, Pakistan, Sri Lanka, and the many South Asian diaspora communities across the globe."

The price of specialized equipment, storage costs, website maintenance, quality control, and retrieval system limitations all add to the problems of working on a large scale. As organizations increasingly depend on electronic databases and information systems, their vulnerability to security threats also rises. The risk of data loss rises, and cyberattacks can result in significant financial losses and damage the organization's reputation.

As digitization is often the first step in digital preservation, questions about how to handle digital files should be addressed in institutional standards.

{| class="wikitable mw-collapsible"

|+

! colspan="8" |Still Image Digitization Standards

|-

|Filename format

|Analog Material Type

|Color or B&W

|Resolution of Scan

|RGB Setting for Scan

|Digital File Format

|File Compression

|Metadata

|-

|YYYYMMDD_CollectionID#_Image#

|35 mm print

|Color

|600 ppi

|24 bit; 8 bits per color channel

|TIFF

|None

|Follow Local Controlled Vocabularies and LC SH and NAF

|-

|YYYYMMDD_CollectionID#_Image#

|35 mm slide

|Color

|1400 ppi

|24 bit; 8 bits per color channel

|TIFF

|None

|Follow Local Controlled Vocabularies and LC SH and NAF

|-

|YYYYMMDD_CollectionID#_Image#

|microform

|B&W

|300 ppi

|24 bit

|TIFF

|None

|Follow Local Controlled Vocabularies and LC SH and NAF

|}
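A naming convention like the one in the first column of the table can be applied programmatically. The following Python sketch is one possible reading of the pattern YYYYMMDD_CollectionID#_Image#; the function name `scan_filename`, the four-digit zero padding of the image number, the example collection identifier, and the `.tif` extension are all assumptions made for illustration, not part of the table.

```python
from datetime import date


def scan_filename(scan_date: date, collection_id: str, image_number: int,
                  ext: str = "tif") -> str:
    """Build a filename following the pattern YYYYMMDD_CollectionID#_Image#.

    Assumptions: the image number is zero-padded to four digits and the
    file is saved as TIFF; a local standard may specify otherwise.
    """
    return f"{scan_date:%Y%m%d}_{collection_id}_{image_number:04d}.{ext}"


# Hypothetical collection identifier "MS042", seventh image scanned on 5 March 2024.
print(scan_filename(date(2024, 3, 5), "MS042", 7))  # 20240305_MS042_0007.tif
```

Generating names in code rather than by hand keeps them consistent across a project and makes files sort chronologically.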

Resources to create local standards are available from the Society of American Archivists, the Smithsonian, and the Northeast Document Conservation Center.

Mind uploading

Mind uploading is the speculative process of copying a human mind into a digital computer so it can be emulated there. This would require some form of advanced brain scan far more detailed than what is currently possible.

See also

  • Book scanning
  • Digital audio
  • Digital library
  • Economics of digitization
  • ENUMERATE
  • Fourth Industrial Revolution
  • Frame grabber
  • Newspaper digitization
  • Optical character recognition
  • Raster to vector
  • Scannebago

Further reading

  • Anderson, Cokie G.; Maxwell, David C., Starting a Digitization Center, Chandos Publishing, 2004.
  • Bulow, Anna; Ahmon, Jess, Preparing Collections for Digitization, Facet Publishing, 2010.
  • Perrin, Joy, Digitization of Flat Media: Principles and Practices, Rowman & Littlefield Publishers, 2015.
  • Piepenburg, Scott, Digitizing Audiovisual and Nonprint Materials: the Innovative Librarian's Guide, Libraries Unlimited, 2015.
  • Robinson, Peter, Digitization of Primary Textual Sources, Office for Humanities Communication, 1993.
  • S Ross; I Anderson; C Duffy; M Economou; A Gow; P McKinney; R Sharp; The NINCH Working Group on Best Practices, Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials, Washington DC: NINCH, 2002.
  • Speranski, V., Challenges in AV Digitization and Digital Preservation.
  • The Library of Congress National Recording Preservation Plan.