Sandrart.net: An enriched online edition of a 17th century text

Carsten Blüm, Goethe-Universität Frankfurt am Main

February 23rd, 2012 • Berlin

Agenda

What is “Sandrart.net”?

Subject
Goals
Information on the project itself

What we achieved

Annotations

Strategies
Experiences

Some thoughts on lifecycles

The subject

Joachim von Sandrart: “Teutsche Academie der Bau-, Bild- und Mahlerey-Künste”

Nürnberg 1675 / 1679 / 1680

The subject

3 volumes • 1,600 pages (not including blank pages) • 6.9 million characters • Approx. 300 full-page copperplate engravings

The Idea

Goals

In one sentence:
A (partially) annotated web-based edition of the “Teutsche Academie” that helps users find the persons and places mentioned in the text and that offers partial translations into other languages.

The project

Goethe-Universität Frankfurt
Kunsthistorisches Institut in Florenz – Max-Planck-Institut
Funded by the Deutsche Forschungsgemeinschaft (DFG)
April 2007 – March 2012
Long-term availability: Herzog August Bibliothek Wolfenbüttel (HAB)

Current state

Resources

Digitized images available on the web
Full text available on the web
Databases of entities mentioned in the text
- More than 5,000 persons
- More than 2,000 places
- More than 2,400 works of art
  (+ more than 2,500 further artworks that provide additional context)
- More than 300 publications and manuscripts mentioned in the text
  (+ Bibliography)
More than 6,000 annotations
Roughly 30,000 cross-references (triples) between entities
Partial translations
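The cross-references are counted as triples, i.e. statements of the form subject, predicate, object that link two entities. A minimal sketch of how such triples can be stored and queried by pattern, assuming a simple in-memory Python store; the entity IDs and predicate names (`bornIn`, `diedIn`) are hypothetical and not Sandrart.net's actual vocabulary.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class TripleStore:
    """Tiny in-memory triple store; None acts as a wildcard in queries."""

    def __init__(self) -> None:
        self._triples: set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add(Triple(subject, predicate, obj))

    def query(self, subject: Optional[str] = None, predicate: Optional[str] = None,
              obj: Optional[str] = None) -> list[Triple]:
        # Return every triple that matches all given positions, in a stable order.
        return sorted(
            t for t in self._triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        )


# Hypothetical entity IDs and predicates, for illustration only.
store = TripleStore()
store.add("person:sandrart", "bornIn", "place:frankfurt")
store.add("person:sandrart", "diedIn", "place:nuremberg")
store.add("person:merian", "bornIn", "place:basel")
```

Querying with only one position fixed (for example `store.query(predicate="bornIn")`) is the kind of lookup a cross-reference view needs.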

Current state

Information retrieval

Faceted search (full-text search, search in annotations, search in databases)
Individual searches / filtering methods in all databases
Approx. 60,000 occurrences of persons, places, publications and artworks tagged in the text. Every entity is searchable, regardless of the spelling or name used in the text.
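Because every tagged occurrence points to an entity record rather than to a string, a search for any recorded form of a name can return all occurrences of that entity. A minimal sketch of that idea, assuming a Python in-memory index; the normalization (case folding plus stripping diacritics), the entity ID `person:duerer` and the page numbers are hypothetical.

```python
import unicodedata
from collections import defaultdict


def normalize(form: str) -> str:
    """Case-fold and drop diacritics, so 'Dürer' and 'DURER' compare equal."""
    decomposed = unicodedata.normalize("NFKD", form.casefold())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


class EntityIndex:
    def __init__(self) -> None:
        self._ids_by_form: dict[str, set[str]] = defaultdict(set)
        self._pages_by_id: dict[str, list[int]] = defaultdict(list)

    def tag(self, form: str, entity_id: str, page: int) -> None:
        """Record one occurrence: the form as printed, the entity it denotes, its page."""
        self._ids_by_form[normalize(form)].add(entity_id)
        self._pages_by_id[entity_id].append(page)

    def search(self, query: str) -> dict[str, list[int]]:
        """Map each entity known under this form to all of its pages, whatever the spelling there."""
        ids = self._ids_by_form.get(normalize(query), set())
        return {eid: sorted(self._pages_by_id[eid]) for eid in ids}


# Hypothetical occurrences of one entity under two different spellings.
index = EntityIndex()
index.tag("Albrecht Dürer", "person:duerer", 10)
index.tag("Albert Durer", "person:duerer", 42)
```

Searching for either form returns both pages, because the lookup resolves the form to the entity first and only then collects occurrences.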

Connecting & linking

Thousands of cross-references to authority files and data sources (PND, ULAN, TGN, Census, VD17, …)
PND-based cross-references to other sites (“PND BEACON”)
Permanent URLs (PURLs) for text pages and database entities, incl. annotations
Web API (proprietary XML/JSON via REST), incl. text occurrences
Linked Open Data (RDF)
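A PND BEACON file is a plain-text list of authority identifiers for which a site has content, preceded by metadata lines, so that other sites can generate links to it. A simplified sketch of producing such a file, assuming the header fields `#FORMAT`, `#PREFIX` and `#TARGET` and one `ID|count` line per person; the identifiers and the target URL template are placeholders, and the exact field set should be checked against the BEACON specification.

```python
from typing import Iterator, Mapping


def beacon_lines(counts: Mapping[str, int], prefix: str, target: str) -> Iterator[str]:
    """Yield a simplified BEACON file: metadata header, then one 'ID|count' line per ID."""
    yield "#FORMAT: BEACON"
    yield f"#PREFIX: {prefix}"
    yield f"#TARGET: {target}"  # consumers replace {ID} with each identifier
    for pnd_id in sorted(counts):
        if counts[pnd_id] > 0:  # only list persons that actually occur
            yield f"{pnd_id}|{counts[pnd_id]}"


# Placeholder identifiers and URL template, not real Sandrart.net data.
lines = list(beacon_lines(
    {"222222222": 3, "111111111": 12, "333333333": 0},
    prefix="http://d-nb.info/gnd/",
    target="https://example.org/person/{ID}",
))
print("\n".join(lines))
```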

The edition

Introduction to the edition, starting on page 629

Annotations

“Classical” annotations
Annotations in the context of our project: information added to things
Purposes
Annotations in this broader sense appear in many places in the edition, not only in the text
Why non-textual, machine-readable annotations?
To sum up: Textual, “classical” annotations are important, but non-textual, machine-readable annotations form the backbone of Sandrart.net

Annotations

The need to be precise

Contradictory information
Ambiguous information
Information that is biased or implies a certain perspective
Not everything can be expressed

Annotations

The side-effects of complexity

What about the user?
What about contributing scholars?
It can become more difficult to present the information in a useful way.

Annotations

Annotations & “The Cloud”

Annotations become distributed
Modern concepts (Linked Open Data, Web APIs, nanopublications, …) can offer new possibilities to enrich your data
Problem: Increasing dependencies

“A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.”
(Leslie Lamport)

Lifecycle

Availability of our resources and services
Availability of external resources
Problem: consuming/using external data vs. changing content
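One way to soften the dependency on external resources is to keep a local snapshot of every external record that is consumed, and to flag when the upstream content has changed so that an editor can review it. A minimal sketch, assuming a Python class with an injectable `fetch` function (a stand-in here, not a real HTTP client); the names `SnapshotCache` and `fake_fetch` are hypothetical.

```python
import hashlib
import time
from typing import Callable, Optional


class SnapshotCache:
    """Keeps the last successfully fetched copy of each external resource."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch
        self._snapshots: dict[str, dict] = {}

    def get(self, url: str) -> tuple[Optional[str], bool]:
        """Return (content, changed_since_last_fetch); fall back to the snapshot on failure."""
        try:
            content = self._fetch(url)
        except OSError:  # network errors such as urllib's URLError subclass OSError
            snapshot = self._snapshots.get(url)
            return (snapshot["content"] if snapshot else None), False
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        previous = self._snapshots.get(url)
        changed = previous is not None and previous["sha256"] != digest
        self._snapshots[url] = {"content": content, "sha256": digest, "fetched_at": time.time()}
        return content, changed


# Stand-in for an external service, so the sketch runs without network access.
upstream = {"https://example.org/record/1": "v1"}


def fake_fetch(url: str) -> str:
    if url not in upstream:
        raise OSError("unreachable")
    return upstream[url]


cache = SnapshotCache(fake_fetch)
```

The `changed` flag separates "the source is unavailable" from "the source now says something different", which are the two lifecycle risks named above.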

Lifecycle

Reliable content vs. ongoing development

Benefits

Get feedback
Find contributors
Find cooperation partners (Deutsches Textarchiv, Herzog August Bibliothek)

Thank you

Contact

Carsten Blüm, bluem@sandrart.net

URLs

Project website: www.sandrart.net

Edition: ta.sandrart.net

Appendix – In case someone asks: Transcription

Why manual text transcription?

Mix of blackletter (Gothic) type and Antiqua, even within words
11 languages
Inconsistent orthography
Abbreviations
Conclusion: OCR not feasible

Appendix – In case someone asks: Tagging

Why manual tagging of entities in the text?

Ambiguities or “false friends”
- “Teutscher Apelles” = Joachim von Sandrart, not Apelles
- Errors in seemingly unambiguous names,
  for instance: “Benedict IX.” = Benedict XI.
- “Rhenus” can be the river or a personification of the river
Not only names, but also paraphrases
No advance knowledge of which entities are in the text
Large number of different entities, hundreds of them still unidentified
Artworks: no formal criteria. Sometimes a single word, sometimes a paragraph

Appendix – In case someone asks: TEI usage

TEI usage

Physical text structure (volumes, pages, column breaks, …)
Logical text structure (<div>, <head>, <p>, <argument>, <titlePage>, <lg>, …)
Objects in the text (persons, places, artworks, publications): <rs type="…">…</rs>
Small number of other elements used: <sic>, <corr>, <date>, <foreign>, <ref>, <seg>, …
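Tagged objects in the text can be read back out with any XML parser. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming each `<rs>` carries a `key` attribute that points to the entity record (TEI allows both `@key` and `@ref` on `<rs>`; which one Sandrart.net uses is not stated here); the snippet and its keys are made up.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def tagged_entities(xml_text: str) -> list[tuple]:
    """Return (type, key, text) for every <rs> element, in document order."""
    root = ET.fromstring(xml_text)
    return [
        (rs.get("type"), rs.get("key"), "".join(rs.itertext()).strip())
        for rs in root.iter(f"{{{TEI_NS}}}rs")
    ]


# Made-up TEI fragment; the keys are hypothetical database IDs.
sample = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">Der '
    '<rs type="person" key="p1">Teutsche Apelles</rs> reiste nach '
    '<rs type="place" key="o7">Rom</rs>.</p>'
)
print(tagged_entities(sample))
```

Since the key, not the printed wording, identifies the entity, “Teutsche Apelles” resolves to the Sandrart record rather than to Apelles.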

Appendix – In case someone asks: Handling of annotations

Method

Annotations are saved in a database and only referenced in TEI
Annotations are “compiled” into TEI
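The “compile” step can be pictured as replacing each reference in the TEI with the annotation text fetched from the database. A minimal sketch, assuming an SQLite table `annotation(id, body)` and a placeholder `<ref type="annotation" target="a1"/>` that is turned into a `<note>`; the table layout and the placeholder convention are hypothetical, not the project's actual schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI elements without a prefix


def compile_annotations(tei_xml: str, conn: sqlite3.Connection) -> str:
    """Turn every <ref type="annotation" target="ID"/> into a <note> holding the stored text."""
    root = ET.fromstring(tei_xml)
    for ref in list(root.iter(f"{{{TEI_NS}}}ref")):
        ann_id = ref.get("target")
        if ref.get("type") != "annotation" or ann_id is None:
            continue  # ordinary cross-references stay untouched
        row = conn.execute("SELECT body FROM annotation WHERE id = ?", (ann_id,)).fetchone()
        # Rewrite the element in place, so its position and tail text are preserved.
        ref.tag = f"{{{TEI_NS}}}note"
        ref.attrib.clear()
        ref.set("type", "annotation")
        ref.set("n", ann_id)
        ref.text = row[0] if row else "[annotation missing]"
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE annotation (id TEXT PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO annotation VALUES ('a1', 'See the person record.')")
print(compile_annotations(
    '<p xmlns="http://www.tei-c.org/ns/1.0">Sandrart<ref type="annotation" target="a1"/> reiste.</p>',
    conn,
))
```

Keeping the source TEI free of annotation text means an edited annotation only requires re-running this step, not editing the transcription.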

Reasons

Annotations in the text and in entity records are technically identical
Annotations can be re-used
When kept in a database, earlier versions can be more easily saved and diff-ed
Easier searching
Simplest way to offer PURLs for annotations and to display annotations on their own
Annotations become objects in their own right
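Keeping annotations as database rows turns versioning into inserting one new row per edit and comparing the two most recent rows. A minimal sketch with SQLite and Python's `difflib`; the table `annotation_version` and its columns are hypothetical.

```python
import difflib
import sqlite3


def init_schema(conn: sqlite3.Connection) -> None:
    # Each edit is stored as a new row; the autoincrementing id orders the versions.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS annotation_version ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " annotation_id TEXT NOT NULL,"
        " body TEXT NOT NULL)"
    )


def save_version(conn: sqlite3.Connection, annotation_id: str, body: str) -> None:
    conn.execute(
        "INSERT INTO annotation_version (annotation_id, body) VALUES (?, ?)",
        (annotation_id, body),
    )


def diff_latest(conn: sqlite3.Connection, annotation_id: str) -> list[str]:
    """Unified diff between the previous and the current version (empty if fewer than two)."""
    rows = conn.execute(
        "SELECT body FROM annotation_version WHERE annotation_id = ?"
        " ORDER BY id DESC LIMIT 2",
        (annotation_id,),
    ).fetchall()
    if len(rows) < 2:
        return []
    current, previous = rows[0][0], rows[1][0]
    return list(difflib.unified_diff(
        previous.splitlines(), current.splitlines(),
        fromfile="previous", tofile="current", lineterm="",
    ))


conn = sqlite3.connect(":memory:")
init_schema(conn)
save_version(conn, "a1", "Painter, born in Frankfurt.")
save_version(conn, "a1", "Painter and writer, born in Frankfurt.")
print("\n".join(diff_latest(conn, "a1")))
```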

Appendix – In case someone asks: Implementation

KISS principle: “LAMP”

“XML shredding”
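“XML shredding” means decomposing the XML into relational rows so that a conventional database can answer queries with SQL. A minimal sketch in Python, with SQLite standing in for the MySQL of a LAMP stack, assuming a table `occurrence(page, entity_type, entity_key, surface)` and page numbers taken from the most recent `<pb n="1"/>`-style milestone; the schema and attribute names are hypothetical, not the project's actual implementation.

```python
import sqlite3
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
PB = f"{{{TEI_NS}}}pb"
RS = f"{{{TEI_NS}}}rs"


def shred(tei_xml: str, conn: sqlite3.Connection) -> None:
    """Store one row per <rs>, remembering the page opened by the latest <pb/>."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS occurrence"
        " (page TEXT, entity_type TEXT, entity_key TEXT, surface TEXT)"
    )
    page = None
    # iter() walks the tree in document order, so a <pb/> is seen before the text after it.
    for el in ET.fromstring(tei_xml).iter():
        if el.tag == PB:
            page = el.get("n")
        elif el.tag == RS:
            conn.execute(
                "INSERT INTO occurrence VALUES (?, ?, ?, ?)",
                (page, el.get("type"), el.get("key"), "".join(el.itertext()).strip()),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
shred(
    '<text xmlns="http://www.tei-c.org/ns/1.0"><pb n="1"/>'
    '<p><rs type="person" key="p1">Sandrart</rs></p><pb n="2"/>'
    '<p><rs type="place" key="o7">Rom</rs></p></text>',
    conn,
)
print(conn.execute("SELECT page, entity_type, entity_key FROM occurrence ORDER BY rowid").fetchall())
```

Once shredded, questions such as “which persons occur on page 2?” become plain SQL, which is what makes a dedicated XML database unnecessary for this kind of lookup.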

Why no XML database?

Appendix – In case someone asks: Context-sensitive Shredding

Appendix – In case someone asks: Changed content

Changes since the first launch of the edition in July 2008