Shape the Future of EAD: A Call to Action – Part II

By Kerstin Arnold, Cory Nimer, and Mary Samouelian

In our first blog post, we introduced you to the major revision of the Encoded Archival Description (EAD) standard. We gave an insider’s look into the revision process, including why the standard is being revised, how it is being revised, and what actions you can take to participate. This post, the second in a series of five, examines the work that has been done to align EAD 4.0 with its sibling standard, Encoded Archival Context – Corporate Bodies, Persons, and Families (EAC-CPF). We’ll touch on the importance of standards alignment, provide a broad overview of where alignment between the two standards is currently lacking and how this impacts our work, and finally highlight a couple of key changes proposed for EAD 4.0 that will bring the two standards closer together, thereby enabling a smoother side-by-side use of both.

Why align standards?

Standards form the basic building blocks for managing, publishing, and sharing information. They not only ensure that information can be universally understood, but also facilitate the adoption of similar approaches by cultural heritage professionals across different domains regardless of national contexts. Technical standards help fuel compatibility and interoperability, making it easier for systems to communicate with one another and to exchange and process data. EAD and EAC-CPF, both maintained by the Technical Subcommittee on Encoded Archival Standards (TS-EAS), are XML standards for encoding information – EAD for descriptive information regarding archival materials, and EAC-CPF for descriptive information about the creators of archival materials. With archival materials on the one hand and the creators of these materials on the other hand, each standard covers a distinct aspect of archival description. EAD and EAC-CPF are therefore designed to complement each other; in practice, however, they also overlap quite a bit. This overlap includes not only some broader shared concepts such as the principle of provenance, but is also found in more detailed aspects like the encoding of dates or places.

When such overlaps are not handled in the same way, however, misalignment occurs. The most extreme form of misalignment happens when two standards use entirely different XML elements and attributes to encode the same concept. More subtle forms of misalignment – which will be the focus of this blog post – can be found when two standards:

  • use the same (or similar) element and attribute names for a shared concept but with slightly different terminology,
  • define slightly differently how an element can be broken down into sub-elements or whether a specific piece of information is required or not, or
  • provide slightly different guidance on the element’s or attribute’s correct use, for example, by defining different sets of predefined values.

An example of such misalignment between the current versions of EAD and EAC-CPF is the element pair <geogname>, in EAD3, and <placeName>, in EAC-CPF 2.0. Both are meant to provide the name of a place, natural feature, or political jurisdiction, and both recommend using terms from controlled vocabularies to do so. Conceptually, these two elements are the same, but their design does not reflect this. The first and more obvious misalignment lies in the different names (<geogname> vs <placeName>), which might cause confusion about how to interpret the two elements, not only for cultural heritage professionals inserting place names as part of archival descriptions, but also for researchers using them. Furthermore, querying a dataset for place names (e.g. for front-end discovery) will always require a search index to be aware of two separate elements instead of one. The less obvious misalignment results from <geogname> requiring a <part> sub-element and allowing the name to be distributed across several <part> elements, while <placeName> contains the name directly and does not have any sub-elements. For example, to distinguish Paris, the county seat of Lamar County, Texas, United States of America, from Paris, the capital of France, <placeName> could say “Paris, Lamar County, Texas, United States of America”. <geogname> might do the same by putting the complete text string into a single <part> element, but it would also allow the first <part> to say “Paris”, the second “Lamar County”, the third “Texas”, and the fourth “United States of America”. Because the place name can be broken down in EAD3 but is held as one string in EAC-CPF 2.0, moving data between the two existing standards becomes more complex.

Encoding the name of a location with <geogname> in EAD3 compared to encoding with <placeName> in EAC-CPF 2.0; Technical Subcommittee on Encoded Archival Standards (TS-EAS); CC BY-S
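To make the difference concrete, here is a minimal Python sketch of what a discovery index has to do today in order to treat both encodings as the same kind of data. The two XML snippets follow the Paris, Texas example described above. The function name place_name_text is made up for this illustration, and namespaces are left out for brevity, so this should be read as a sketch rather than as production code.

```python
import xml.etree.ElementTree as ET

# Namespaces are omitted to keep the sketch short; real EAD3 and
# EAC-CPF 2.0 documents declare them.
EAD3_SNIPPET = """<geogname>
  <part>Paris</part>
  <part>Lamar County</part>
  <part>Texas</part>
  <part>United States of America</part>
</geogname>"""

EAC_SNIPPET = (
    "<placeName>Paris, Lamar County, Texas, "
    "United States of America</placeName>"
)


def place_name_text(element):
    """Return one display string for either style of place name.

    Hypothetical helper: it has to know both element names and both
    content models, which is exactly the overhead described above.
    """
    if element.tag == "geogname":
        # EAD3: the name may be spread over several <part> children.
        parts = [(p.text or "").strip() for p in element.findall("part")]
        return ", ".join(p for p in parts if p)
    if element.tag == "placeName":
        # EAC-CPF 2.0: the name is held directly as text.
        return (element.text or "").strip()
    raise ValueError(f"not a place name element: {element.tag}")


ead_name = place_name_text(ET.fromstring(EAD3_SNIPPET))
eac_name = place_name_text(ET.fromstring(EAC_SNIPPET))
print(ead_name == eac_name)
```

Going from several <part> elements to one string is straightforward. The reverse direction is not, because a single string gives no reliable indication of where one part ends and the next begins.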

Alignment with EAC-CPF

With this in mind, the members of TS-EAS took on the task of better aligning the EAD and EAC-CPF standards, with an eye towards improved interoperability and more generally applicable and accessible language. This process already started during the major revision of EAC-CPF (between 2017 and 2022), as it became evident that some of the overlapping elements and attributes in EAC-CPF and EAD were not consistently named, used, or defined. Knowing that a major revision of EAD was on the horizon, the members of TS-EAS agreed that any discrepancies between the two standards should be addressed as part of both revision processes.

A substantial amount of time and effort of the EAD revision process therefore focused on reviewing how elements and attributes are used in each standard. We reviewed both EAD and EAC-CPF to:

  • determine whether elements and attributes with the same name in both standards were indeed intended to be used in the same way;
  • ensure that they were defined in the same way in that case;
  • confirm whether elements and attributes with similar names were meant to be used in the same way;
  • decide on an aligned definition in both standards in that case (i.e., applying the same name and the same content model); and
  • find a more distinctive name and definition in cases where elements and attributes with the same name in both standards were actually intended to be used differently.

This labor-intensive aspect of the team’s work was well worth the effort. Below are some of the changes that these efforts resulted in.

Introduction of CamelCasing

Over the years, the diverse and inconsistent naming conventions for elements and attributes in EAD and in the earlier version of EAC-CPF have bewildered cultural heritage professionals, leaving them wondering what a particular element or attribute means. In the case of EAD, names range from combinations of two or more abbreviated or full words (e.g. <acqinfo> or <unitdatestructured>), to shorthand acronyms (e.g. <odd>, which stands for Other Descriptive Data), to, conversely, names that are clearly stated or obvious (e.g. <title>). To add to the confusion, the terminology used to name elements and attributes is based on Anglo-American description traditions, which means that not every element and attribute resonates with professionals around the globe who are also using these standards.

And so, while EAD and EAC-CPF were both promised to be readable by machines and humans alike, in practice they were not.

camelCase was introduced to the previous version of EAC-CPF in 2010. camelCase is a consistent typographic convention for separating the words in a phrase: the first character of each word except the first is capitalized, and no spaces are used. This seemingly minor change not only improves the readability of elements and attributes, but also adheres to best practices in XML definitions. The XML tagging is easier for the human eye to read, which is especially important considering the broad international community of users, many of whom do not speak English as their first language. Therefore, EAD 4.0 will follow EAC-CPF in applying camelCase spelling to any element and attribute names consisting of a combination of terms. Users should immediately see the benefits of this work: better readability makes it easier to recognize where an element or attribute name comes from, thereby making EAD easier to learn and to interpret.
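For readers who like to see a rule spelled out, the convention can be sketched in a few lines of Python. The helper to_camel_case is invented for this illustration and is not part of any TS-EAS tooling; the sample inputs are taken from element names mentioned in this post.

```python
def to_camel_case(words):
    """Join words in camelCase.

    The first word stays lowercase, every later word gets a capital
    first letter, and the words are joined without spaces.
    """
    if not words:
        return ""
    first, rest = words[0], words[1:]
    return first.lower() + "".join(w[:1].upper() + w[1:].lower() for w in rest)


for phrase in ("place name", "source of acquisition", "other descriptive info"):
    print(to_camel_case(phrase.split()))
```

The word boundaries that disappear in an all-lowercase name such as <altformavail> stay visible in the camelCase form, which is the readability gain described above.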

In addition to camelCase, the team also took this opportunity to review the names of elements and attributes more generally for better alignment with the terminology used in related standards like ISAD(G), Records in Contexts (RiC), and Describing Archives: A Content Standard (DACS). In particular, abbreviated names of elements and attributes were reviewed with a view to replacing them with more concrete and more easily understandable names. Some of these changes are a slight variation on a theme. For example, <acqinfo> will be renamed <sourceOfAcquisition> and <altformavail> will be renamed <formAvailable>. Others have undergone a complete overhaul, such as <odd>, which has been renamed <otherDescriptiveInfo>, and <bibliography>, which has been renamed <publicationNote>. A complete list of the changes can be found in the EAD3.0 to EAD 4.0 Changes spreadsheet.
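As an illustration of what such renames could mean for existing data, the following Python sketch applies the four renames named above to a small EAD3-style fragment. It assumes a plain one-to-one rename and ignores namespaces; the <sample> wrapper, the note texts, and the function rename_elements are made up for this example. A real migration would need the full changes spreadsheet and would also have to account for any changes to content models.

```python
import xml.etree.ElementTree as ET

# Only the four renames mentioned in this post; not the complete list.
RENAMES = {
    "acqinfo": "sourceOfAcquisition",
    "altformavail": "formAvailable",
    "odd": "otherDescriptiveInfo",
    "bibliography": "publicationNote",
}


def rename_elements(root, renames=RENAMES):
    """Rename every element whose tag appears in the mapping, in place."""
    for element in root.iter():
        # Tags that are not in the mapping are simply left untouched.
        element.tag = renames.get(element.tag, element.tag)
    return root


# <sample> is a hypothetical wrapper used only to hold the fragment.
fragment = ET.fromstring(
    "<sample>"
    "<acqinfo><p>Illustrative acquisition note.</p></acqinfo>"
    "<odd><p>Illustrative general note.</p></odd>"
    "</sample>"
)
renamed = rename_elements(fragment)
tags = [element.tag for element in renamed.iter()]
print(tags)
```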

Shared concepts, shared definition

The EAD revision has also resulted in shared definitions for concepts that both standards have in common, such as the encoding of places and place names briefly mentioned above. EAC-CPF includes a wrapper element called <place>, which contains the name of the place, i.e. the <placeName> element mentioned above, along with encoding options for other aspects such as address information, the definition of a place role, and information about geographic coordinates. All these aspects were also represented in EAD3, but were grouped differently or not at all. Geographic coordinates, for example, were encoded as a sub-element of the place name, and the role of a place was only captured in an attribute. Address information, on the other hand, was used completely independently of other place information in EAD3. EAD 4.0 brings these together in the same <place> element that is used in EAC-CPF 2.0, allowing for a direct exchange between both standards without the requirement of a mapping or transformation.

Full encoding example of a place with its name, role, geographic coordinates, address, and contact details in EAC-CPF 2.0 and in the draft for EAD 4.0; Technical Subcommittee on Encoded Archival Standards (TS-EAS); CC BY-SA
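The regrouping can also be sketched in code. The Python example below takes an EAD3-style <geogname> and collects its name, role, and coordinates under a single <place> wrapper. Only <place> and <placeName> are named in this post. The child element names placeRole and geographicCoordinates, the use of a relator attribute for the role, the sample role value, the rounded coordinates, and the function to_place are all assumptions made for illustration and should be checked against the published schemas. Address information is left out because it lived elsewhere in EAD3.

```python
import xml.etree.ElementTree as ET

# Illustrative EAD3-style input. Attribute names, the role value, and the
# rounded coordinates are assumptions; namespaces are omitted for brevity.
EAD3_PLACE = """<geogname relator="place of residence">
  <part>Paris</part>
  <part>Texas</part>
  <geographiccoordinates coordinatesystem="WGS84">33.66, -95.55</geographiccoordinates>
</geogname>"""


def to_place(geogname):
    """Regroup an EAD3-style <geogname> under one <place> wrapper."""
    place = ET.Element("place")

    # The name: several <part> elements become one <placeName> string.
    name = ET.SubElement(place, "placeName")
    name.text = ", ".join(
        (part.text or "").strip() for part in geogname.findall("part")
    )

    # The role moves from an attribute to its own element (name assumed).
    role = geogname.get("relator")
    if role:
        ET.SubElement(place, "placeRole").text = role

    # Coordinates move from inside the name to a sibling of it (name assumed).
    coords = geogname.find("geographiccoordinates")
    if coords is not None:
        target = ET.SubElement(place, "geographicCoordinates")
        target.set("coordinateSystem", coords.get("coordinatesystem", ""))
        target.text = (coords.text or "").strip()

    return place


place = to_place(ET.fromstring(EAD3_PLACE))
print(ET.tostring(place, encoding="unicode"))
```

With a shared <place> element in both standards, a conversion step of this kind would no longer be needed when moving place information between EAD 4.0 and EAC-CPF 2.0.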

The two examples above provide a glimpse into the alignment work our team has done. In the next post of this series on Descriptive Notes we will take a closer look at changes that began as an alignment task but evolved into a broader, conceptual change in EAD – so stay tuned!

What Can I Do to Participate?

The first draft of EAD 4.0 was published last week and will remain open for community feedback until the end of July. To make sure that the updated version of EAD fits our community’s requirements and that the benefits of a new version of EAD outweigh the challenges of transitioning from one version to another, the members of the EAD sub-team want to hear your comments, questions, suggestions, and concerns about EAD 4.0. You can provide feedback via GitHub or the TS-EAS email, ts-eas@archivists.org. We will also offer several informal drop-in sessions online between April and June for discussion and questions, starting with an introductory session on how to contribute to the call for comments this Wednesday, April 24, at 4pm UTC (open for registration). We are not doing this work alone!


Top image citation: All about Archives – The Major Revision of the Encoded Archival Description standard, 2021-2024. [sources: (1) Public archives of the International Committee of the Red Cross (ICRC), Geneva, Switzerland. By Roman Deckert. https://commons.wikimedia.org/wiki/File:CICR-ICRC-PublicArchives_WWI-files_RomanDeckert09062020.jpg; (2) The Research Data Management (RDM) lifecycle at the University of Cape Town (UCT). 1 December 2018. Design by Gaelen Pinnock. https://commons.wikimedia.org/wiki/File:UCT_RDM_lifecycle_%28all_icons%29.svg; (3) Encoding examples. EADiva Tag Library by Ruth Kitchin Tillman. https://eadiva.com/dsc/ and https://eadiva.com/archdesc/; (1) and (2) CC BY-SA (https://creativecommons.org/licenses/by-sa/4.0), (3) CC BY (https://creativecommons.org/licenses/by/4.0/)]


Kerstin Arnold is the Chief Operating Officer for Archives Portal Europe, an aggregator gathering more than 650,000 archival collection descriptions from over 1,200 institutions in 30+ European countries. She leads the EAD sub-team of the Technical Subcommittee on Encoded Archival Standards (TS-EAS).

Cory Nimer is the University Archivist at Brigham Young University. He leads the Outreach sub-team of TS-EAS.

Mary Samouelian is the Manager, Archival Processing for Baker Library Special Collections and Archives at Harvard Business School. She is the co-chair of TS-EAS and a member of the EAD and Outreach sub-teams.
