The SAO/NASA Astrophysics Data System (ADS) is a digital library portal for researchers in astronomy and physics, operated for NASA by the Smithsonian Astrophysical Observatory. ADS maintains three bibliographic collections containing over 15 million records, including all arXiv e-prints. Abstracts and full text of major astronomy and physics publications are indexed and searchable through the portal.

Historical context

Johann Friedrich Weidler published the first comprehensive history of astronomy in 1741 and the first astronomical bibliography in 1755. This was an effort to archive and classify earlier astronomical knowledge and works.

This effort was continued by Jérôme de La Lande who published his Bibliographie astronomique in 1803, a work that covered the period from 480 BCE to the year of publication.

The Bibliographie générale de l’astronomie, Volume I and Volume II, published by J.C. Houzeau and A. Lancaster, followed between 1882 and 1889.

As the number of astronomers and astronomical publications grew, bibliographical efforts became institutional tasks, first at the Observatoire Royal de Belgique, where the Bibliography of Astronomy was published from 1881 to 1898, and then at the Astronomischer Rechen-Institut in Heidelberg, where the yearly Astronomischer Jahresbericht was published from 1899 to 1968. After 1968, this was replaced by the yearly Astronomy and Astrophysics Abstracts book series, which continued until the end of the 20th century.

History

The first suggestion of a digital database of journal paper abstracts was made at a conference on Astronomy from Large Data-Bases held in Garching bei München in 1987.

An initial version of ADS, with a database consisting of 40 papers, was created as a proof of concept in 1988. The ADS Abstract Service became available for general use via proprietary network software in April 1993, and it was connected to SIMBAD a few months later. In early 1994 the ADS web-based service was launched, which effectively quadrupled the number of active users in the five weeks following its introduction.

In 2011, ADS launched ADS Labs Streamlined Search, which introduced facets for query refinement and selection. In 2013, ADS Labs 2.0 introduced a new search engine, full-text search functionality, scalable facets, and an API. In 2015, the new ADS, code-named Bumblebee, was released as ADS-beta. The ADS-beta system features a micro-services API and client-side dynamic page loading served on a cloud platform. In May 2018 the beta label was dropped and Bumblebee became the default ADS interface, with some legacy features (ADS Classic) remaining available. Development continues to the present day, and an extensible API enables users to build their own utilities on top of the ADS bibliographic record.

The ADS service is distributed worldwide, with twelve mirror sites in twelve countries. The database is synchronized by weekly updates using rsync, a mirroring utility that transfers only the portions of the database that have changed. All updates are triggered centrally, but they initiate scripts at the mirror sites which "pull" updated data from the main ADS servers.

Data in the system

At first, the journal articles available via ADS were exclusively scanned bitmaps created from the paper journals, with the abstracts created using optical character recognition software. Some of these scanned articles, up to around 1995, are available for free by agreement with the journal publishers, with some dating from as far back as the early 19th century. Later, as online editions of journals became more widespread, abstracts began to be loaded into ADS directly.

Papers are indexed within the database by their bibliographic record, which contains the details of the journal they were published in and various associated metadata, such as author lists, references and citations. Originally this data was stored in ASCII format, but the limitations of that format eventually led the database maintainers to migrate all records to an XML (Extensible Markup Language) format in 2000. Bibliographic records are now stored as an XML element with sub-elements for the various metadata.
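As a rough illustration of a record stored as one XML element with sub-elements, the sketch below parses a made-up record using Python's standard library. The element names, the `id` attribute and the record contents are all hypothetical and are not taken from the actual ADS schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: tag names and values are invented for illustration
# and do not reflect the real ADS record layout.
RECORD_XML = """
<record id="example-0001">
  <title>Example paper title</title>
  <author>Doe, J.</author>
  <author>Roe, R.</author>
  <journal>Example Journal of Astronomy</journal>
  <year>2000</year>
</record>
"""

def parse_record(xml_text):
    """Turn one XML record into a plain dictionary of metadata."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "title": root.findtext("title"),
        "authors": [author.text for author in root.findall("author")],
        "journal": root.findtext("journal"),
        "year": int(root.findtext("year")),
    }

record = parse_record(RECORD_XML)
```

Repeated sub-elements, such as the author entries here, map naturally onto lists, which is one convenience of a structured format over flat text.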

Software and hardware

The software runs on a system that was written specifically for the ADS, allowing for extensive customization for astronomical needs that would not have been possible with general purpose database software. The scripts are designed to be as platform independent as possible, given the need to facilitate mirroring on different systems around the world, although the growing use of Linux as the operating system of choice within astronomy has led to increasing optimization of the scripts for installation on that platform.

Indexing

ADS currently (2005) receives abstracts or tables of contents from almost two hundred journal sources. The service may receive data referring to the same article from multiple sources, and creates one bibliographic reference based on the most accurate data from each source. The common use of TeX and LaTeX by almost all scientific journals greatly facilitates the incorporation of bibliographic data into the system in a standardized format, and importing HTML-coded web-based articles is also simple. ADS utilizes Python and Perl scripts for importing, processing and standardizing bibliographic data.
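The merging step can be pictured with a small sketch. The text does not say how ADS judges which source is most accurate, so the fixed priority ranking below is purely an assumption, as are the source labels and field names.

```python
# Hypothetical reliability ranking (lower number = more trusted).
# How ADS actually ranks its sources is not described in the text.
SOURCE_PRIORITY = {"publisher": 0, "preprint": 1, "ocr": 2}

def merge_records(candidates):
    """Build one record, taking each field from the most trusted
    source that supplies a non-empty value for it."""
    merged = {}
    ranked = sorted(candidates, key=lambda c: SOURCE_PRIORITY[c["source"]])
    for candidate in ranked:
        for field, value in candidate["fields"].items():
            if value and field not in merged:
                merged[field] = value
    return merged

candidates = [
    {"source": "ocr",
     "fields": {"title": "Examp1e tit1e", "abstract": "Scanned abstract text."}},
    {"source": "publisher",
     "fields": {"title": "Example title", "abstract": ""}},
]
merged = merge_records(candidates)
```

In this example the title comes from the publisher entry, while the abstract falls back to the scanned entry because the publisher supplied none.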

Search engine

Figure: An example of a complex search combining object, title and abstract queries with a date filter

Since its inception, the ADS has developed a highly complex search engine to query the abstract and object databases. The search engine is tailor-made for searching astronomical abstracts, and the engine and its user interface assume that the user is well-versed in astronomy and able to interpret search results which are designed to return more than just the most relevant papers. The database can be queried for author names, astronomical object names, title words, and words in the abstract text, and results can be filtered according to a number of criteria. It works by first gathering synonyms and simplifying search terms as described in the sections below, and then generating an "inverted file", which is a list of all the documents matching each search term. The user-selected logic and filters are then applied to this inverted list to generate the final search results.
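The three stages just described (synonym gathering, an inverted file, then user-selected logic) can be sketched in a few lines. This is a minimal illustration, not the ADS implementation: the synonym groups, the function names and the sample documents are assumptions, the index is built once up front for simplicity, and result filters such as dates are omitted.

```python
from collections import defaultdict

# Hypothetical synonym groups; the real ADS list is far larger.
SYNONYM_GROUPS = [
    {"spectrograph", "spectroscope"},
    {"metallicity", "abundance"},
]

def expand(term):
    """Return the term together with any synonyms it is grouped with."""
    term = term.lower()
    for group in SYNONYM_GROUPS:
        if term in group:
            return group
    return {term}

def build_inverted_file(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, terms, logic="AND"):
    """Combine the per-term document sets with AND or OR logic."""
    per_term = []
    for term in terms:
        matches = set()
        for variant in expand(term):
            matches |= index.get(variant, set())
        per_term.append(matches)
    if not per_term:
        return set()
    if logic == "AND":
        return set.intersection(*per_term)
    return set.union(*per_term)

documents = {
    1: "metallicity of open clusters",
    2: "abundance gradients in galaxies",
    3: "a new spectrograph for open clusters",
}
index = build_inverted_file(documents)
and_hits = search(index, ["abundance", "clusters"], logic="AND")
or_hits = search(index, ["abundance", "clusters"], logic="OR")
```

Here the AND query matches only document 1, which is found through the synonym "metallicity", while the OR query matches all three documents.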

Author name queries

The system indexes author names by surname and initials, and accounts for possible variations in the spelling of names using a list of variations. Such variations are common for names that include diacritics such as umlauts and for names transliterated from Arabic or Cyrillic script. An example of an entry in the author synonym list is:

  • AFANASJEV, V
  • AFANAS’EV, V
  • AFANAS’IEV, V
  • AFANASEV, V
  • AFANASYEV, V
  • AFANS’IEV, V
  • AFANSEV, V
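A lookup over such a synonym list might work along the following lines. The data structure and function name are assumptions made for illustration. Only the name variants themselves come from the entry above.

```python
# One synonym group, copied from the example entry; a real list
# would hold many such groups.
AUTHOR_SYNONYMS = [
    ["AFANASJEV, V", "AFANAS’EV, V", "AFANAS’IEV, V", "AFANASEV, V",
     "AFANASYEV, V", "AFANS’IEV, V", "AFANSEV, V"],
]

# Map every variant to its full group so any spelling finds all the others.
VARIANT_TO_GROUP = {
    name: group for group in AUTHOR_SYNONYMS for name in group
}

def author_variants(surname, initial):
    """Return every spelling to search for, given one surname and initial."""
    key = f"{surname.upper()}, {initial.upper()}"
    return VARIANT_TO_GROUP.get(key, [key])

variants = author_variants("Afanasyev", "v")
unknown = author_variants("Doe", "J")
```

A name with no entry in the list is simply searched as typed, which is why `unknown` contains a single normalized spelling.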

Object name searches

The capability to search for papers on specific astronomical objects is one of ADS's most powerful tools. The system uses data from SIMBAD, the NASA/IPAC Extragalactic Database, the International Astronomical Union Circulars and the Lunar and Planetary Institute to identify papers referring to a given object, and can also search by object position, listing papers which concern objects within a 10-arcminute radius of a given Right Ascension and Declination. These databases combine the many catalogue designations an object might have, so that a search for the Pleiades will also find papers which list the famous open cluster in Taurus under any of its other catalogue designations or popular names, such as M45, the Seven Sisters or Melotte 22.
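A positional search of this kind comes down to an angular-separation test. The sketch below uses the haversine formula, which is one standard way to compute the separation and is not necessarily what ADS or its source databases use. The object names and coordinates are invented.

```python
import math

def angular_separation_arcmin(ra1_deg, dec1_deg, ra2_deg, dec2_deg):
    """Great-circle separation between two sky positions, in arcminutes,
    computed with the haversine formula."""
    ra1, dec1, ra2, dec2 = map(
        math.radians, (ra1_deg, dec1_deg, ra2_deg, dec2_deg)
    )
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 60

def objects_within(objects, ra_deg, dec_deg, radius_arcmin=10.0):
    """Names of catalogued objects inside the search radius."""
    return [
        name for name, (ra, dec) in objects.items()
        if angular_separation_arcmin(ra_deg, dec_deg, ra, dec) <= radius_arcmin
    ]

# Hypothetical objects: (Right Ascension, Declination) in degrees.
objects = {
    "obj_a": (56.75, 24.12),
    "obj_b": (56.75, 24.22),  # 0.1 degrees (6 arcminutes) from obj_a
    "obj_c": (56.75, 24.42),  # 0.3 degrees (18 arcminutes) from obj_a
}
nearby = objects_within(objects, 56.75, 24.12)
```

With the default 10-arcminute radius, `nearby` contains `obj_a` and `obj_b` but not `obj_c`.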

Title and abstract searches

The search engine first filters search terms in several ways. An M followed by a space or hyphen has the space or hyphen removed, so that searching for Messier catalogue objects is simplified and a user input of M45, M 45 or M-45 results in the same query being executed; similarly, NGC designations and common search terms such as Shoemaker Levy and T Tauri are stripped of spaces. Unimportant words such as AT, OR and TO are stripped out, although in some cases case sensitivity is maintained, so that while "and" is ignored, "And" is converted to "Andromeda", and "Her" is converted to "Hercules", but "her" is ignored.
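These filtering rules can be approximated with a regular expression and two small lookup tables. This is a sketch under assumptions: the stop-word set and the constellation table contain only the examples mentioned above, the same pattern is applied to hyphens after NGC, and any capitalization other than the exact abbreviation (for example "AND") is treated as a stop word.

```python
import re

# Only the examples from the text; the real lists are assumed to be longer.
STOP_WORDS = {"at", "or", "to", "and", "her"}
CONSTELLATIONS = {"And": "Andromeda", "Her": "Hercules"}

# "M 45", "M-45" -> "M45"; "NGC 1234" -> "NGC1234"
CATALOG_RE = re.compile(r"\b(M|NGC)[ -](\d+)\b")

def preprocess(query):
    """Normalize catalogue designations, expand case-sensitive
    constellation abbreviations, and drop stop words."""
    query = CATALOG_RE.sub(r"\1\2", query)
    terms = []
    for word in query.split():
        if word in CONSTELLATIONS:          # exact-case match comes first
            terms.append(CONSTELLATIONS[word])
        elif word.lower() in STOP_WORDS:    # otherwise ignore stop words
            continue
        else:
            terms.append(word)
    return terms

messier_forms = [preprocess(q) for q in ("M45", "M 45", "M-45")]
mixed = preprocess("NGC 1234 to Her and her clusters")
```

All three Messier spellings reduce to the same single term, and in the mixed query only "Her" survives as "Hercules" while "to", "and" and "her" are dropped.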

Synonym replacement

Once search terms have been preprocessed, the database is queried with the revised search term as well as its synonyms. Beyond simple synonym replacement, such as searching for both plural and singular forms, ADS also searches for a large number of specifically astronomical synonyms. For example, spectrograph and spectroscope have essentially the same meaning, and in an astronomical context metallicity and abundance are also synonymous. ADS's synonym list was created manually, by grouping the list of words in the database according to similar meanings.

ADS has allowed literature searches that would previously have taken days or weeks to be completed in seconds, and it is estimated that ADS has increased the readership and use of the astronomical literature by a factor of about three since its inception. The value of ADS to astronomy has been estimated at about 200–250 million USD annually; its operating budget is a small fraction of this amount.

Sociological studies using ADS

Because it is used almost universally by astronomers, ADS can reveal much about how astronomical research is distributed around the world. Most users access the system from institutes of higher education, whose IP addresses can easily be used to determine the user's geographical location. Studies reveal that the highest per-capita users of ADS are astronomers based in France and the Netherlands, and that while more developed countries (measured by GDP per capita) use the system more than less developed countries, the relationship between GDP per capita and ADS use is not linear. The range of ADS usage per capita far exceeds the range of GDP per capita, and basic research carried out in a country, as measured by ADS usage, has been found to be proportional to the square of the country's GDP divided by its population.
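The scaling in the last sentence can be written out explicitly. Here U is a country's total ADS usage, G its GDP and N its population. These symbols, and the reading of the sentence as G squared divided by N, are assumptions made for illustration rather than notation from the studies themselves.

```latex
U \propto \frac{G^{2}}{N}
\quad\Longrightarrow\quad
\frac{U}{N} \propto \left(\frac{G}{N}\right)^{2}
```

Under this reading, per-capita usage grows as the square of GDP per capita, which is consistent with the range of per-capita usage far exceeding the range of GDP per capita.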

See also

  • List of academic databases and search engines
  • Bibcode
  • INSPIRE-HEP
  • NASA's Planetary Data System (PDS)
  • PubMed
  • Michael J. Kurtz

References