FamilySearch GEDCOM, or simply GEDCOM (an acronym of Genealogical Data Communication), is an open file format and the de facto standard specification for storing genealogical data. It is commonly used to back up family tree data and to transfer it between genealogy software and websites, most of which support importing from and exporting to the GEDCOM format.
GEDCOM is defined as a plain text file, using UTF-8 encoding as of version 7.0. This file contains genealogical information about individuals such as names, events, and relationships; metadata links these records together.
GEDCOM 7.0, released in 2021, is the most recent version of the GEDCOM specification. However, its predecessor, GEDCOM 5.5.1, remains the industry standard format for the exchange of genealogical data. First released as a draft standard in 1999, GEDCOM 5.5.1 received only minor updates in the subsequent 20 years leading up to the release of 5.5.1 final in 2019. To address its shortcomings, some genealogy programs introduced proprietary extensions to GEDCOM, such as GEDCOM 5.5 EL (Extended Locations), which are not always recognized by other programs. Efforts have been made to have 7.0 more widely adopted since its release. FamilySearch intends to be GEDCOM 7.0 compatible in the third quarter of 2022, and Ancestry.com is planning for 7.0 compatibility but has not yet specified an implementation date.
Data model
GEDCOM uses a lineage-linked data model based on the conceptual model of the nuclear family. The family (<code>FAM</code>) record type is therefore the only source of links between the individuals (<code>INDI</code>) in the file, assigning parents (as <code>HUSB</code> and <code>WIFE</code>) and children (as <code>CHIL</code>) by referring to individuals' unique ID numbers. These historical origins are described in the 7.0 specification document: "The <code>FAM</code> record was originally structured to represent families where a male <code>HUSB</code> (husband or father) and female <code>WIFE</code> (wife or mother) produce <code>CHIL</code> (children)."
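The lineage-linked model can be illustrated with a minimal sketch. The `Family` class and `parents_of` helper below are hypothetical names chosen for illustration and are not part of the GEDCOM specification; the sketch only shows that a parent-child relationship is never stored directly on an individual but is derived by going through a family record.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Family:
    """Hypothetical in-memory view of a FAM record (IDs only)."""
    xref: str
    husb: Optional[str] = None                      # ID of the HUSB individual
    wife: Optional[str] = None                      # ID of the WIFE individual
    chil: List[str] = field(default_factory=list)   # IDs of CHIL individuals


def parents_of(person: str, families: Dict[str, Family]) -> List[str]:
    """Return the HUSB/WIFE IDs of every family that lists `person` as CHIL."""
    parents: List[str] = []
    for fam in families.values():
        if person in fam.chil:
            parents.extend(p for p in (fam.husb, fam.wife) if p is not None)
    return parents


families = {"@F1@": Family("@F1@", husb="@I1@", wife="@I2@", chil=["@I3@"])}
print(parents_of("@I3@", families))  # prints ['@I1@', '@I2@']
```

Because individuals are connected only through families, finding a person's parents means searching the family records rather than reading a field on the person.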
File structure
A GEDCOM file consists of a header section, records, and a trailer section. The records represent people (INDI records), families (FAM records), sources of information (SOUR records), and other miscellaneous data, including notes. Every line of a GEDCOM file begins with a level number. All top-level records (HEAD, TRLR, SUBN, and each INDI, FAM, OBJE, NOTE, REPO, SOUR, and SUBM) begin with a line at level 0; the lines nested beneath them carry positive integer level numbers.
Although it is possible to write a GEDCOM file by hand, the format was designed to be used with software and thus is not especially human-friendly. A GEDCOM validator that checks the structure of a GEDCOM file is included in the PhpGedView project, though it is not meant to be a standalone validator. For standalone validation, "The Windows GEDCOM Validator" or the older, unmaintained Gedcheck from the LDS Church can be used.
In 2001, the GEDCOM TestBook Project evaluated how well four popular genealogy programs conformed to the GEDCOM 5.5 standard using the Gedcheck program. Findings showed that a number of problems existed and that "The most commonly found fault leading to data loss was the failure to read the NOTE tag at all the possible levels at which it may appear." In 2005, the Genealogical Software Report Card evaluation, carried out by Bill Mumford (who had participated in the original GEDCOM TestBook Project), also included testing against the GEDCOM 5.5 standard using the Gedcheck program.
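The line layout described above (a level number, an optional record ID, a tag, and an optional value) can be sketched as a small parser. The regular expression and the names `GedcomLine` and `parse_line` are assumptions made for this illustration; the sketch does not cover every rule of the specification, such as continuation lines or character set handling.

```python
import re
from typing import NamedTuple, Optional

# Assumed simplified grammar: level, optional @ID@, tag, optional value.
LINE_RE = re.compile(r"^(\d+) (?:(@[^@]+@) )?([A-Za-z0-9_]+)(?: (.*))?$")


class GedcomLine(NamedTuple):
    level: int            # 0 for top-level records, positive for nested lines
    xref: Optional[str]   # record ID such as "@I1@", if the line defines one
    tag: str              # e.g. HEAD, INDI, NAME
    value: Optional[str]  # remainder of the line, if any


def parse_line(line: str) -> GedcomLine:
    """Split one GEDCOM line into its parts, or raise ValueError."""
    match = LINE_RE.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a GEDCOM line: {line!r}")
    level, xref, tag, value = match.groups()
    return GedcomLine(int(level), xref, tag, value)


print(parse_line("0 @I1@ INDI"))
print(parse_line("1 NAME John /Smith/"))
```

A line such as `0 @I1@ INDI` yields level 0 with the ID `@I1@`, while `1 NAME John /Smith/` yields level 1 with no ID and the name as its value.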
To assist with adoption of GEDCOM 7.0, validation tools now exist for that standard as well.
Example
The following is a sample GEDCOM file.
{| class="infobox vcard" style="text-align: center; font-size:90%;"
|class="adr" style="text-align: center;"|sample.ged
|-
|class="adr" style="text-align: left;"|
0 HEAD
1 SOUR PAF
2 NAME Personal Ancestral File
2 VERS 5.0
1 DATE 30 NOV 2000
1 GEDC
2 VERS 5.5
2 FORM LINEAGE-LINKED
1 CHAR ANSEL
1 SUBM @U1@
0 @I1@ INDI
1 NAME John /Smith/
1 SEX M
1 FAMS @F1@
0 @I2@ INDI
1 NAME Elizabeth /Stansfield/
1 SEX F
1 FAMS @F1@
0 @I3@ INDI
1 NAME James /Smith/
1 SEX M
1 FAMC @F1@
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 MARR
1 CHIL @I3@
0 @U1@ SUBM
1 NAME Submitter
0 TRLR
|}
The header (HEAD) includes the source program and version (Personal Ancestral File, 5.0), the GEDCOM version (5.5), the character encoding (ANSEL), and a link to information about the submitter of the file.
The individual records (INDI) define John Smith (ID I1), Elizabeth Stansfield (ID I2), and James Smith (ID I3).
The family record (FAM) links the husband (HUSB), wife (WIFE), and child (CHIL) by their ID numbers.
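As a rough illustration of how software might follow these links, the sketch below reads an abridged copy of the sample file and resolves the family's IDs to names. The helper names `read_records` and `display_name` are hypothetical, and the parser deliberately ignores everything below level 1; it is a simplification under those assumptions, not a complete GEDCOM reader.

```python
from collections import defaultdict

# Abridged copy of sample.ged (header details omitted).
SAMPLE = """\
0 HEAD
1 SOUR PAF
0 @I1@ INDI
1 NAME John /Smith/
1 FAMS @F1@
0 @I2@ INDI
1 NAME Elizabeth /Stansfield/
1 FAMS @F1@
0 @I3@ INDI
1 NAME James /Smith/
1 FAMC @F1@
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 MARR
1 CHIL @I3@
0 TRLR
"""


def read_records(text):
    """Map each record ID to its type and its level-1 tags and values."""
    records = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        parts = raw.strip().split(" ", 2)
        level = int(parts[0])
        if level == 0:
            # "0 @I1@ INDI" carries an ID; "0 HEAD" and "0 TRLR" do not.
            if len(parts) == 3 and parts[1].startswith("@"):
                current = {"type": parts[2], "tags": defaultdict(list)}
                records[parts[1]] = current
            else:
                current = None
        elif level == 1 and current is not None:
            value = parts[2] if len(parts) == 3 else ""
            current["tags"][parts[1]].append(value)
    return records


def display_name(records, xref):
    """Return the first NAME value with the surname slashes removed."""
    return records[xref]["tags"]["NAME"][0].replace("/", "")


records = read_records(SAMPLE)
family = records["@F1@"]["tags"]
husband = display_name(records, family["HUSB"][0])
wife = display_name(records, family["WIFE"][0])
children = [display_name(records, child) for child in family["CHIL"]]
print(husband, wife, children)
```

The names are reached only by looking up the IDs stored in the FAM record, which mirrors how the file itself expresses relationships.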
Versions
The version of the specification currently in wide use is GEDCOM 5.5.1 final, which was released on 15 November 2019. Its predecessor, the GEDCOM 5.5.1 draft, was issued in 1999, introducing nine new tags and adding UTF-8 as an approved character encoding. The draft was not formally approved, but its provisions were adopted in part by a number of genealogy programs, including FamilySearch.org. PAF 5.2 is an example of software that uses UTF-8 as its internal character set and can output a UTF-8 GEDCOM.
GEDCOM 7.0 requires UTF-8 encoding throughout, and resolves other long-standing issues with GEDCOM 5.5.1. Multimedia support in the form of an associated .zip file, called a GEDZip, is another inclusion. Efforts are underway to see 7.0 embraced as the new exchange standard. GEDCOM 7.0 allows explicitly identifying what standards other than GEDCOM may apply to a particular file. GEDCOM has always been extensible, but prior to 7.0 there was no standard way to identify such extensions. Also, GEDCOM 7.0 allows explicitly marking an event as nonexistent. This allows, for example, documenting that a particular individual never married.
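The nonexistent-event marker can be sketched as follows. GEDCOM 7.0 expresses it with a `NO` structure whose value names the event type; the fragment below, which places `NO MARR` in a family record, and the helper name `nonexistent_events` are assumptions made for illustration, so the specification should be consulted for the exact rules on where the structure may appear.

```python
# Illustrative GEDCOM 7.0 fragment (an assumption, not taken from a real file).
SAMPLE_V7 = """\
0 HEAD
1 GEDC
2 VERS 7.0
0 @I1@ INDI
0 @I2@ INDI
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 NO MARR
0 TRLR
"""


def nonexistent_events(text):
    """Return (record_id, event_tag) pairs for each level-1 NO line."""
    found = []
    record_id = None
    for raw in text.splitlines():
        parts = raw.split(" ", 2)
        if parts[0] == "0":
            # Remember the ID of the record we are inside, if it has one.
            has_id = len(parts) == 3 and parts[1].startswith("@")
            record_id = parts[1] if has_id else None
        elif parts[0] == "1" and len(parts) == 3 and parts[1] == "NO":
            if record_id is not None:
                found.append((record_id, parts[2]))
    return found


print(nonexistent_events(SAMPLE_V7))  # prints [('@F1@', 'MARR')]
```

A reader that understands the marker can distinguish "no marriage took place" from "no marriage was recorded", which a missing `MARR` line alone cannot convey.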
Release history
{| class="wikitable"
|-
! GEDCOM version
! Release date
! Notes
|-
!
| 1984
| –
|-
!
