Abstracts: Monday, 26 October 2015



What does descriptive markup contribute to digital humanities?


Michael Sperberg-McQueen
KIVA Visiting Professor for Internationality and Interculturality, Technische Universität Darmstadt; Black Mesa Technologies LLC

Descriptive markup is not, perhaps, the most important concept in the digital humanities, but it has been historically important; this paper attempts to identify its key parts and their significance.
1 Documents have structure worth exposing.
2 No predefined set of primitive notions will be adequate for a general-purpose document representation language; it must be possible for users to define their own sets of basic notions (in XML terms, their own element types and attributes).
3 Documents can be made reusable by representing them in an application-independent, vendor-neutral format.
4 A practical consequence of 2 and 3 is that documents will be most useful when users specify not how the different parts of a document should be processed, but what they are.
5 In general, the best results are obtained when document markup is declarative (not imperative) and descriptive (not tied to display, layout, or any other form of processing).
6 The natural interpretation of SGML and XML describes a document as a directed acyclic graph of nodes, whose arcs are defined either by the parent-child relation holding between elements or by ID/IDREF pointers connecting elements in the document, and whose nodes are decorated with sets of attribute-value pairs (see the first sketch after this list). From this graph structure, simpler structures such as trees and sequences can be created by projection (systematically ignoring parts of the information in the graph), and other structures can be created (or simulated) by suitable interpretation of arcs or by restructuring the data under program control.
7 Using the tree structure inherent in the elements of an XML document, a context-free grammar can be used to validate the input against user-defined expectations. Valid documents are labeled bracketings of the trees generated by that grammar (see the second sketch after this list).
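
To make point 6 concrete, here is a minimal sketch in Python; the sample document, its element types, and the who/xml:id linking convention are invented for illustration, not taken from the talk. It reads an XML document as a directed graph whose arcs are either parent-child containment or ID/IDREF references; projecting away the reference arcs leaves the familiar tree.

    # Minimal sketch of point 6: an XML document as a directed graph whose
    # arcs come from (a) parent-child containment and (b) ID/IDREF pointers.
    # The sample document and its vocabulary are invented for illustration.
    import xml.etree.ElementTree as ET

    XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # expanded name of xml:id

    DOC = ('<play>'
           '<speaker xml:id="ham">Hamlet</speaker>'
           '<sp who="ham"><l>To be, or not to be</l></sp>'
           '</play>')

    root = ET.fromstring(DOC)

    # Arcs from the parent-child relation; projecting onto these alone
    # (ignoring the reference arcs below) yields the familiar tree.
    containment = [(parent.tag, child.tag)
                   for parent in root.iter() for child in parent]

    # Arcs from ID/IDREF-style pointers ('who' pointing at an xml:id);
    # these cut across the hierarchy, so the full structure is a graph.
    by_id = {e.get(XML_ID): e for e in root.iter() if e.get(XML_ID)}
    references = [(e.tag, by_id[e.get("who")].tag)
                  for e in root.iter() if e.get("who") in by_id]

    print("containment arcs:", containment)
    # containment arcs: [('play', 'speaker'), ('play', 'sp'), ('sp', 'l')]
    print("reference arcs:", references)
    # reference arcs: [('sp', 'speaker')]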
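Point 7 can be sketched in the same invented vocabulary: each element type is assigned a content model, a regular expression over the names of its children (the idea behind DTD content models), and the rules taken together form a context-free grammar that the element tree either satisfies or violates.

    # Minimal sketch of point 7: validating the element tree against a
    # user-defined grammar. Each rule is a regular expression over the
    # sequence of child element names; together the rules generate the
    # labeled bracketings the text describes. All names are invented.
    import re
    import xml.etree.ElementTree as ET

    GRAMMAR = {
        "play":    r"(speaker,|sp,)*",  # any mix of speakers and speeches
        "speaker": r"",                 # character data only, no children
        "sp":      r"(l,)+",            # a speech is one or more lines
        "l":       r"",
    }

    def valid(elem) -> bool:
        """True iff elem and all its descendants satisfy the grammar."""
        model = GRAMMAR.get(elem.tag)
        if model is None:  # undeclared element type
            return False
        children = "".join(child.tag + "," for child in elem)
        return bool(re.fullmatch(model, children)) and all(valid(c) for c in elem)

    doc = ET.fromstring("<play><speaker>Hamlet</speaker><sp><l>To be</l></sp></play>")
    print(valid(doc))                                   # True
    print(valid(ET.fromstring("<sp><speaker>x</speaker></sp>")))
    # False: the model for 'sp' admits only 'l' children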
That these are not the truisms or meaningless phrases they might at first seem may be illustrated by document-processing tools and word processors which treat documents as flat sequences of characters, without any particular structure, or as flat sequences of differently styled paragraphs, each consisting of a flat sequence of differently styled characters; which predefine a set of semantic primitives in terms of which all documents must be interpreted; which use proprietary data representations without published specifications; which do not allow for declarative or descriptive identification of document structures; and which provide no user-controllable means of imposing constraints or of distinguishing structurally sound documents from data corrupted in transmission.
There has been a good deal of discussion of the adequacy of tree structures for the representation of documents, sometimes coupled with proposals for alternative structures. It is worthwhile to consider this topic in the context of larger considerations about the adequacy and suitability of data structures in general.


It's all Google's fault!


George Landow
Professor of English and Art History, Brown University

As I argued in my talk last November at Google London, it's all Google's fault. In the 1990s, the battle between those who favored link-based hypermedia and those who favored search tools ended with a victory for the link when the WWW came along. But when Google successfully created a superb search tool, search became the victor, with the effect that, as surveys of university students have shown, readers tend to use search tools rather than follow links even when links lead directly to the material for which they are searching. We assume that readers who live on their iPhones and Galaxies are computer literate, and in certain ways they certainly are, but they have lost, or never had, the ability to read and think hypertextually. We have found, however, that a brief five-minute explanation of how the Victorian Web was constructed and of the advantages of thinking of information as networked results in readers reading hypertextually. In the last part of my talk I would like to discuss how these issues have led to experiments with sitemaps/homepages for complex subject areas in the Victorian Web, such as religion, political history, and painting, all of which have numerous sub-categories.


Coffee break



Great Expectations Seeding Forests of Trees. Some key ideas of Digital Humanities in Father Busa's own words


Marco Passarotti
CIRCSE Research Centre, Università Cattolica del Sacro Cuore, Milan, Italy

In my talk, I will present some key ideas of computational linguistics and digital humanities taken from two papers by one of the pioneers of the field, Father Roberto Busa SJ.
In the two papers (from 1962 and 1983, respectively), Father Busa discussed the impact of "automation" on the Humanities and presented a number of open challenges that he thought the discipline would have to face in the years to come.
I will show that many of the great expectations mentioned by Busa have now become reality. In particular, I will present the current state of the Index Thomisticus corpus by introducing the Index Thomisticus Treebank project.
Further, I will describe the original material held by the 'Busa Archive' at Università Cattolica in Milan. Comprising press articles on Busa in the national and international media, correspondence between Busa and his contemporaries in Italy and abroad, and material relating to particular phases of the Index Thomisticus and of Busa's Opera Omnia, the Archive is an invaluable source of documentation for anyone interested in the history of the discipline.


Automation on Parnassus - Clio / κλειω and Social History


Manfred Thaller
Professor of Humanities Computer Science (Historisch-Kulturwissenschaftliche Informationsverarbeitung), Universität zu Köln

In the methodological discussions of the sixties and seventies in the historical disciplines, information technology was originally seen as almost inseparable from quantitative methods. Especially in hindsight, it is frequently overlooked that the question whether historical sources provide data which can be fed into standard statistical software was discussed quite intensively at the time. Particularly in social history, which played a key role in the period when social science was the main target of the field's interdisciplinary aspirations, there was a discussion, often connected to the journal “Historical Methods”, that focused very early on the representation of huge networks of factoids extracted from historical sources and on their analysis. To support such research, the development of software dedicated to historical research was begun in 1978 at the then Max-Planck-Institut für Geschichte. Based on the concept of “source-oriented data processing”, this led to the software system CLIO, later κλειω, which supported the analysis of graph-oriented databases of millions of nodes, embedded in a processing environment that offered a number of domain-specific services, up to an implementation of the Latin lemmatization developed by Bozzi in Pisa. In later years it was used to power XML-based digital libraries, a few of which are still operational.
The presentation focuses on (a) the conceptual model behind that development, (b) the data model derived from it, (c) the architecture of the software supporting the approach and (d) some parallels in recent research.
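
Purely as an illustration of what "source-oriented" means here (nothing below is taken from CLIO/κλειω; the entities, predicates, and sample entry are invented), one can picture statements found in a source being stored as factoids, small labeled arcs between entities that each point back at the passage attesting them, so that the source rather than a predefined schema dictates what is recorded:

    # A purely illustrative sketch of "source-oriented data processing":
    # statements found in a source become factoids, i.e. labeled arcs
    # between entities, each citing the passage that attests it. Nothing
    # here reproduces CLIO/κλειω; all names and the entry are invented.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Factoid:
        subject: str    # entity as named in the source
        predicate: str  # relation asserted by the source
        obj: str        # second entity or literal value
        source: str     # citation of the attesting passage

    facts = [
        Factoid("Johann Meyer", "godfather_of", "Anna Schmidt",
                "parish register, 1782, fol. 14r"),
        Factoid("Johann Meyer", "occupation", "weaver",
                "parish register, 1782, fol. 14r"),
    ]

    # The factoids form a graph over entities; analysis is graph traversal.
    def neighbours(entity: str):
        return [(f.predicate, f.obj) for f in facts if f.subject == entity]

    print(neighbours("Johann Meyer"))
    # [('godfather_of', 'Anna Schmidt'), ('occupation', 'weaver')]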


Contact


DH-Concepts
Technische Universität Darmstadt
Institut für Sprach- und Literaturwissenschaft

 

Dr. Sabine Bartsch
Dr. des. Michael Bender

 

dh-concepts(at)linglit.tu-darmstadt.de
www.dh-concepts.tu-darmstadt.de


Visitor address:
Landwehrstraße 50A
Gebäude S4|23
64293 Darmstadt


Postal address:
Dolivostraße 15
64293 Darmstadt

