By Sharon Mizota
In the summer of 2020, I had the privilege of working as a metadata consultant for Curationist.org, an online database of public domain and Creative Commons-licensed cultural materials sourced from museums, archives, libraries, and other institutions. Curationist aggregates this content, but will also offer a platform for Wikipedia-style contributions to associated metadata. Curationist staff and community members will be encouraged to bring their own knowledge and research to the site, enhancing or counterbalancing traditional modes of description and adding a greater diversity of voices to the cultural record. In this, Curationist aligns itself with larger anti-colonial, anti-racist, feminist, queer, and anti-ableist movements. The site will surface content and metadata supplied by contributing institutions, and will encourage users to layer their own metadata on top of these records, creating an enhanced dataset that will eventually be available as linked open data. I designed a metadata schema, content standard, and taxonomy guidelines for a relaunch of the site, which at the time of this writing is still in development.
One of the project’s more challenging aspects was creating a taxonomy for categorizing and describing the assets on the site, which may come from anywhere in the world, in any language, and range from paintings and sculptures to eBooks and 3D models. When I began the project, I planned to develop a custom taxonomy. But during the discovery phase, it became clear that the Curationist team wanted to use Wikidata as the controlled vocabulary for the entire site.
As you may know, Wikidata is the central repository for the structured data—names, titles, topics, etc.—associated with all Wikimedia projects. It’s basically a giant vocabulary for naming and describing anything you might find on the Internet, and it can be updated and edited by anyone.
The decision to use it was challenging in at least three ways. First, I had to quickly learn how to use Wikidata and become familiar with its quirks. Second, I had to shift my mindset from creating a custom taxonomy to providing guidance on how to use a crowd-sourced dataset that is always changing. Third, I had to figure out how Wikidata could support Curationist’s reparative social justice goals.
Learning to use Wikidata was the easy part. Like most Wikimedia projects, it has extensive tutorials, tours, and help pages. It’s structured much like other controlled vocabularies, except of course it is much less controlled. I found two main ways in which it differs from professionally managed vocabularies: formatting and hierarchy.
The formatting of terms, even those within the same classification, such as “ethnic group,” is often inconsistent: “African Americans” (Q49085) is plural, whereas “Asian American” (Q727928) is not. My inner perfectionist was unhappy with such inconsistencies, but I soon came to understand: it’s not that big of a deal. Each Wikidata term has a persistent unique identifier (the “Q” numbers included above), so the concept remains stable even if the term itself changes over time.
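To make the point about identifiers concrete, here is a minimal Python sketch of a record that stores the Q number alongside the label. The `Term` class and the relabeling shown are hypothetical illustrations, not part of Curationist's schema or of any Wikidata tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A descriptive term as it might be stored on a record (hypothetical shape)."""
    qid: str    # persistent Wikidata identifier, e.g. "Q727928"
    label: str  # human-readable label at the time of capture; may change upstream

    def same_concept(self, other: "Term") -> bool:
        # Identity is decided by the Q number alone, never by the label text.
        return self.qid == other.qid


# The label as quoted in this article.
captured = Term("Q727928", "Asian American")

# A hypothetical later relabeling (assumed for illustration only).
relabeled = Term("Q727928", "Asian Americans")

print(captured.same_concept(relabeled))  # the concept is unchanged
print(captured == relabeled)             # the stored records differ only in label
```

Comparing on `qid` rather than `label` is what lets a description survive a change in wording.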
Classification hierarchies also vary widely. In my initial searches, the terms “dog” (Q144) and “wolf” (Q18498) existed in two totally separate hierarchies, even though the mammals they refer to are related species. Whereas in traditional vocabularies you can usually look up or down the hierarchy to figure out if you have the right term, in Wikidata, you can’t rely on relationships between terms to aid in selection. Although irksome, these inconsistencies are tolerable; they are simply the cost of using a crowd-sourced vocabulary with more current terminology than the older, slower-to-change authorities I was used to.
Yet Wikidata’s mutability also comes with a cost. It’s difficult to embark on a descriptive project knowing that your terms might shift under your feet at any moment. A taxonomy composed of Wikidata terms would be time-consuming to create and maintain. Instead, I had to follow Curationist’s collaborative ethos, and let the staff make their own decisions about how to classify things. (The guidelines will be primarily consulted by staff contributors, not necessarily by community members.) In much the same way as a content standard like DACS or RDA is designed to guide the sourcing and input of data, Curationist’s Taxonomy Guidelines are designed to guide, not dictate, the selection of descriptive terms. For example, the term for “wolf” has since been changed to “Canis lupus” and there is now another entry “wolf” (Q3711329) that is more similar to the entry for “dog”. The Curationist guidelines recommend the use of colloquial terms over specialist or scientific ones, so staff members will still be able to determine which “wolf” to use.
In accordance with Curationist’s social justice goals, the guidelines also needed to indicate the most appropriate, respectful terms to use. Although Wikidata shifts more quickly than other vocabularies to reflect contemporary language, it does not include everything. Decisions and compromises necessarily have to be made, but the guidelines do suggest preferred terms and outline the factors to consider in term selection.
One of the most difficult areas is terms that describe people. For example, Wikidata includes “Latinx” (Q30324002) as a gender-neutral term to describe people of Latin American descent, but not “Latine.” It also includes “Latino” (Q1464994), “Hispanic” (Q1211934), and “Hispanic and Latino Americans” (Q58669). While there is no consensus, even in Latinx communities, on which term is best, the Curationist guidelines indicate “Latinx” as the preferred term—for now.
Wikidata is also strangely lacking in terms describing people of various genders and sexualities. It has several synonyms or sub-categories of “woman” (Q467), including “female” (Q6581072), “female organism” (Q43445) and “cisgender female” (Q15145779). The Curationist guidelines specify “woman” as the preferred term, as it is not as tied to a person’s biological sex, with the addition of “transgender person” (Q189125) if the woman is transgender. With regard to sexual orientation, Wikidata terms do not describe people so much as sexualities: “asexuality” (Q724351), “bisexuality” (Q43200), “heterosexuality” (Q1035954). This is a gap, but the nice thing about Wikidata is that it is open. Perhaps one day someone will take it upon themselves to integrate the terms from the LGBTQ vocabulary, Homosaurus, into Wikidata.
Other areas for improvement are terms describing people who have disabilities and people experiencing homelessness. In this regard, Wikidata has yet to include terms that more fully emphasize their humanity. The Curationist guidelines prefer “person with disabilities” (Q15978181) to “cripple” (Q1790733), and “homeless people” (Q29325697) to “tramp” (Q1965933) or “hobo” (Q843281). These were not tough decisions.
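Preferences like these could be recorded as a simple lookup from a discouraged term to the suggested one. The Python sketch below is only an illustration under that assumption: the names `PREFERRED`, `LABELS`, and `suggest` are hypothetical, and the Q numbers are the ones quoted in this article. It suggests a term rather than enforcing one, in keeping with guidelines that guide rather than dictate.

```python
# Discouraged Q number -> preferred Q number, as quoted in this article.
PREFERRED = {
    "Q1790733": "Q15978181",  # "cripple" -> "person with disabilities"
    "Q1965933": "Q29325697",  # "tramp"   -> "homeless people"
    "Q843281": "Q29325697",   # "hobo"    -> "homeless people"
}

# Labels for the preferred terms, kept only for display.
LABELS = {
    "Q15978181": "person with disabilities",
    "Q29325697": "homeless people",
}


def suggest(qid: str) -> tuple[str, str]:
    """Return the (Q number, label) a contributor might use instead.

    If no preference is recorded, the input Q number comes back unchanged
    with an empty label, leaving the decision to the contributor.
    """
    preferred = PREFERRED.get(qid, qid)
    return preferred, LABELS.get(preferred, "")


print(suggest("Q843281"))
print(suggest("Q467"))
```

Because the table is keyed on Q numbers, it keeps working if a label is reworded, though it would still need review whenever the guidelines themselves change.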
In making recommendations like these, I consulted Curationist’s internal staff style guide, but also resources like the Conscious Style Guide, which was developed to help journalists choose more respectful and appropriate language. I also drew on my own background in ethnic studies and involvement in progressive causes. But even with this experience, it was sometimes uncomfortable to make these decisions. It was tempting to simply add the terms I wanted to see to Wikidata, in order to make them available for Curationist use. However, not only was this beyond the scope of the initial project, it also meant wading into contentious waters.
While I was learning to use Wikidata, I made a small edit to the Wikidata item “black people” (Q817393), simply capitalizing the word “Black.” This change might have drawn attention to the entry, because the next time I checked, the term was gone—the page was still there, but the space where the term usually appears was blank—and a slew of racial slurs had been added below it. Fortunately, a more experienced editor flagged the page as having been vandalized and reverted it to its original state, but I was chastened. Recently, I was gratified to see that the page has since been updated; “Black” is now capitalized.
In the oft-quoted words of Dr. Martin Luther King, Jr., “the arc of the moral universe is long but it bends toward justice.” Although it may seem grandiose to invoke this sentiment for something as humble as metadata, it’s important to remember that the power to describe is the power to define. As gatekeepers of history, we help shape discourse, which, if not done responsibly, can be used to humanize some people and dehumanize others, with life-or-death consequences. Using Wikidata as a controlled vocabulary is a headache because it is always changing, and the Curationist Taxonomy Guidelines will need to be updated on a regular basis. Neither is perfect, but both are open to change, and if we remain vigilant, they will bend toward justice.
For those interested in learning more, a copy of the Curationist Taxonomy Guidelines can be found here.
Sharon Mizota is a DEI Metadata Consultant based in Los Angeles.