Paper prepared for the Shakespeare Association of America, 1999.
Organizing the maze of data

There are two basic challenges in developing a database of performance materials as part of an Internet edition of the plays:

Preserving the data in a continually changing technology

This question is better suited to a paper for a group concerned with computer technology and the standardization of data formats. Preserving the text is relatively simple: a recognized system of "tagging" the text ensures that information provided by the editors is retained no matter what software or hardware is used to display or analyze it.
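A minimal sketch may make the point concrete. The fragment below reads a short tagged passage using only a general-purpose parser from the Python standard library; nothing in it depends on a particular display program. The element and attribute names (`sp`, `speaker`, `l`, `n`, `who`) are assumptions chosen for illustration, loosely in the manner of SGML-style encoding, and should not be taken as the tag set the ISE actually uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged passage; the tag names are illustrative only.
TAGGED = """
<sp who="Hamlet">
  <speaker>Hamlet</speaker>
  <l n="1">To be, or not to be, that is the question:</l>
  <l n="2">Whether 'tis nobler in the mind to suffer</l>
</sp>
"""

def extract_lines(tagged_text):
    """Return (speaker, [(line_number, text), ...]) for one tagged speech."""
    speech = ET.fromstring(tagged_text.strip())
    speaker = speech.findtext("speaker")
    lines = [(int(line.get("n")), line.text) for line in speech.findall("l")]
    return speaker, lines

speaker, lines = extract_lines(TAGGED)
for number, text in lines:
    print(speaker, number, text)
```

Because the editorial information (who speaks, which line is which) lives in the tags rather than in any one program's formatting, the same file can be re-read by whatever software replaces today's.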

More uncertain is the means to preserve graphic, sound, and video data, in a period when file types are changing, and algorithms for the sampling of digital data and their compression are evolving rapidly. Some years ago -- about 14 -- I was involved in the creation of some audio tapes for two distance education courses on Shakespeare. The written component of the course has substantially evolved over this period, as the critical and textual world of Shakespeare studies has been vigorously fermenting. The audio component has not changed -- they were good performances, and remain so -- but their presentation has changed. They were initially recorded on stereo reel-to-reel tapes, two tracks at 15 inches per second (I still have the originals in my office), but the equipment to play them has now become hard to find. They were initially released on mono cassette tapes, then redubbed on stereo cassettes, and in their next incarnation will be provided on CDs with the attendant ease of access. This change has occurred in a non-electronic environment: the whole question of file and compression formats is far more unstable, with new formats for sound and video appearing every year.

In collaboration with Special Collections at the University of Victoria, the Internet Shakespeare Editions are undertaking to preserve the original analogue materials as well as their digitized counterparts as much as possible; thus in due course it will be possible to re-digitize if newer technologies and compression formats warrant it. However, it is clear that the aim of the performance database of the ISE is different in kind from the archive that Peter Donaldson is creating in collaboration with the Folger, where the preservation of archival material is one aim. The primary purpose of the site is more simply to provide sufficient detail in the materials for teaching and critical readings; for this reason the concern is to create files that are manageable in size over the Internet, rather than of high quality [note 1]. At present, the graphic files are limited to a maximum of approximately 80K, and many are significantly smaller; larger files are accessed through "thumbnails" that take only a few seconds to load. Sound and video are more complex. The solution there will probably be to provide two kinds of files: highly compressed files of lower quality for modem connections, and higher quality, larger files, for fast connections. This solution is already being tested in the about-to-be-made-public section of the site on Shakespeare's Life and Times (test the fanfare on the opening page for a sample).
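The trade-off between file size and waiting time can be sketched with some simple arithmetic. The fragment below is an illustration only, not a description of how the site is built: the modem and fast-connection speeds, the size of the low-quality file, and the function names are all assumptions, and the calculation ignores latency and protocol overhead. The 80K ceiling and the 4 Mb clip size are taken from the text and from note 1, read here as kilobytes and megabytes.

```python
def download_seconds(size_bytes, link_bits_per_second):
    """Idealized transfer time; ignores latency, overhead, and congestion."""
    return size_bytes * 8 / link_bits_per_second

def choose_variant(variants, link_bits_per_second, max_wait_seconds):
    """Pick the largest file that loads within the wait budget.

    Larger is assumed to mean higher quality. If nothing fits the budget,
    fall back to the smallest file.
    """
    ordered = sorted(variants, key=lambda v: v["bytes"])
    chosen = ordered[0]
    for variant in ordered:
        if download_seconds(variant["bytes"], link_bits_per_second) <= max_wait_seconds:
            chosen = variant
    return chosen

MODEM_BPS = 56_000        # assumed modem speed, bits per second
FAST_BPS = 10_000_000     # assumed fast connection, bits per second
GRAPHIC_LIMIT = 80 * 1024 # the approximate 80K ceiling for graphic files

# Hypothetical pair of files for one clip: sizes are assumptions.
CLIP_VARIANTS = [
    {"name": "low", "bytes": 200_000},
    {"name": "high", "bytes": 4_000_000},
]

graphic_wait = download_seconds(GRAPHIC_LIMIT, MODEM_BPS)
modem_choice = choose_variant(CLIP_VARIANTS, MODEM_BPS, max_wait_seconds=60)
fast_choice = choose_variant(CLIP_VARIANTS, FAST_BPS, max_wait_seconds=60)
print(round(graphic_wait, 1), modem_choice["name"], fast_choice["name"])
```

Under these assumptions a graphic at the size ceiling takes roughly twelve seconds over a modem, while the larger clip would take several minutes, which is why a second, more heavily compressed file is worth providing.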


Managing a potentially enormous range of data

Copyright restrictions notwithstanding, there is no reason why in the next few years the ISE performance database should not include a significant sample of plays, with multiple performances of the more often performed. And as the editions are completed, additional archival materials will certainly be added as part of the overview of the history of performance of the individual plays.

Fast forward for a moment to the time when the site offers varied materials from an extensive range of performances. The simplest query you might have would be to compare the way a given scene from a particular play is presented. But you might be interested in looking at the records in a number of different ways:

-- and there are no doubt a number of ways that haven't yet occurred to me.

The solution to the variety of needs of users will come from a combination of careful planning for the development of appropriate keywords for searching, and the use of appropriate software -- an advanced relational database. Databases are increasingly being integrated into the Internet, with pages developed "on the fly" for each user according to the choices he or she has entered. Those of you who have explored the admirable Perseus Project will have some idea of the ways this kind of structure can be made attractive and manageable.
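To suggest how a relational structure with keywords might answer such queries, here is a small sketch using the SQLite engine bundled with Python. It is offered under stated assumptions rather than as the design of the ISE database: the table and column names are hypothetical, and the rows are placeholders ("Company A", "Company B"), not real performance records.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE performance (
    id INTEGER PRIMARY KEY,
    play TEXT NOT NULL,
    company TEXT NOT NULL,
    year INTEGER NOT NULL
);
CREATE TABLE item (
    id INTEGER PRIMARY KEY,
    performance_id INTEGER NOT NULL REFERENCES performance(id),
    act INTEGER NOT NULL,
    scene INTEGER NOT NULL,
    medium TEXT NOT NULL,
    filename TEXT NOT NULL
);
CREATE TABLE keyword (
    item_id INTEGER NOT NULL REFERENCES item(id),
    word TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Placeholder rows only; these are not real records.
conn.executemany("INSERT INTO performance VALUES (?, ?, ?, ?)", [
    (1, "Hamlet", "Company A", 1990),
    (2, "Hamlet", "Company B", 1995),
])
conn.executemany("INSERT INTO item VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 3, 1, "graphic", "a_3_1.jpg"),
    (2, 2, 3, 1, "video", "b_3_1.mov"),
    (3, 2, 5, 2, "graphic", "b_5_2.jpg"),
])
conn.executemany("INSERT INTO keyword VALUES (?, ?)", [
    (1, "costume"), (2, "staging"), (3, "costume"),
])

def scene_across_performances(conn, play, act, scene):
    """List every item for one scene of one play, oldest performance first."""
    return conn.execute(
        """SELECT p.company, p.year, i.medium, i.filename
           FROM item AS i JOIN performance AS p ON p.id = i.performance_id
           WHERE p.play = ? AND i.act = ? AND i.scene = ?
           ORDER BY p.year, i.id""",
        (play, act, scene),
    ).fetchall()

def items_by_keyword(conn, word):
    """List filenames of all items tagged with a keyword."""
    rows = conn.execute(
        """SELECT i.filename FROM item AS i
           JOIN keyword AS k ON k.item_id = i.id
           WHERE k.word = ? ORDER BY i.id""",
        (word,),
    ).fetchall()
    return [r[0] for r in rows]

scene_rows = scene_across_performances(conn, "Hamlet", 3, 1)
costume_files = items_by_keyword(conn, "costume")
```

The first function corresponds to the simplest query, comparing one scene across performances; the second shows how the same records can be approached from a different direction through keywords. A page built "on the fly" would be assembled from results of this kind.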

Again, this is not the place for a detailed discussion of the kinds of software development that will be needed to make the site as useful as its potential suggests. It is sufficient to note that the technology is now available. Within the next few years, sites like the Internet Shakespeare Editions will both collect and connect in ways that become less rather than more confusing, where the threatening mazèd world will evolve to something both less daunting and more fertile for the imagination.



  1. Typically, the film clips on the sample site at MIT, "Hamlet on the Ramparts," are over 4 Mb in size, prohibitively large for all but the fastest connection. "Streaming" video is a possible alternative, but is less attractive for the very reason that it does not permit the kind of stop/start/rewind analysis that is required for critical film analysis.