Internet Shakespeare Editions



Internet Editions of Shakespeare Principles of Tagging

Please send comments and suggestions to Michael Best. Responses and suggestions concerning the principles of tagging will be posted in a discussion area from time to time. Some extracts from the general Editorial Guidelines are also available online.

Principles of tagging in HTML and SGML

The general aim of tagging should be to produce tools that are of use to scholars of various persuasions. Initially, texts will use a basic set of tags of minimal complexity; in due course scholars who are working on the text will be encouraged to make more detailed tagging schemes available. The three schemes of tagging--HTML, XML, and SGML--can conveniently be employed to reach different audiences and satisfy different scholarly needs. In general, scholars who are concerned with the critical interpretation of Shakespeare's texts will use the simpler HTML texts, since they will be interested in the words and the punctuation rather than in the processes of printing. Those interested in the computer analysis of texts require the additional information that XML makes possible. The standard for archiving texts in the Humanities is still SGML as defined by the Text Encoding Initiative (TEI); bibliographers may also find the SGML texts a useful starting place for analysis, though they will in all probability want to add tags of their own.

In due course, it will be possible to provide a linked graphic version of the original texts as well as the tagged versions.

The originals

The original Quarto and Folio texts will be transcribed literally, with no emendation or correction. Modern act and scene divisions will be noted, and line numbers based on the Norton Through Line Numbers (TLN) will be added.
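The relationship between a literal transcription, its through line numbers, and the modern act and scene divisions can be pictured with a small data structure. What follows is only a sketch under stated assumptions: the names `OriginalLine`, `number_lines`, and `find_by_tln` are hypothetical, the sample text is a placeholder rather than a transcription, and real TLNs would be taken from the Norton numbering rather than recomputed by counting.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class OriginalLine:
    """One printed line of an original text (hypothetical record layout)."""
    tln: int                      # through line number, counted across the whole play
    text: str                     # literal transcription, no emendation
    act: Optional[int] = None     # modern act division, noted alongside the original
    scene: Optional[int] = None   # modern scene division


def number_lines(texts: Iterable[str], start: int = 1) -> List[OriginalLine]:
    """Assign consecutive through line numbers (a stand-in for the Norton TLNs)."""
    return [OriginalLine(tln=start + i, text=t) for i, t in enumerate(texts)]


def find_by_tln(lines: Iterable[OriginalLine], tln: int) -> Optional[OriginalLine]:
    """Look a line up by its TLN, e.g. when cross-linking from a modern text."""
    for line in lines:
        if line.tln == tln:
            return line
    return None


# Placeholder lines, not a transcription of any Quarto or Folio.
lines = number_lines(["first printed line", "second printed line"])
```

Because the TLN runs continuously through the play, it gives every version of the text a shared reference point that does not depend on modern act and scene divisions.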


Since Web browsers do not conveniently display special characters like the long "s," the HTML versions of the Quartos and the Folio will aim to produce uncluttered, readable copy, suitable for a quick check on original punctuation, spelling, and layout. These texts will be useful for scholars who wish to check original readings for critical purposes, and will be a valuable teaching tool for students at the undergraduate level, since the texts will be far easier to read than the originals. Tagging will adhere to the stricter forms of HTML as published, for the sake of consistency and adaptability as browsers are modified and improved.
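As a rough illustration of how a transcription containing special characters might be reduced to uncluttered HTML copy, the sketch below replaces the long "s" and a few ligatures with plain letters while leaving the original spelling untouched. The mapping and the function name `to_readable_html` are assumptions made for this example; the guidelines do not specify which characters are normalized or how.

```python
import html

# Hypothetical normalization table; the guidelines name only the long "s".
NORMALIZE = {
    "\u017f": "s",    # long s
    "\ufb01": "fi",   # fi ligature
    "\ufb02": "fl",   # fl ligature
    "\ufb05": "st",   # long s + t ligature
}


def to_readable_html(line: str) -> str:
    """Replace special characters with plain letters, then escape for HTML."""
    for special, plain in NORMALIZE.items():
        line = line.replace(special, plain)
    # Escape &, <, > and quotes so the text is safe inside an HTML page.
    return html.escape(line)
```

Original spellings such as "vnfold" pass through unchanged, since only the listed characters are replaced.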

Links in these texts will be minimal, referring back periodically to the modern text and cross-linked at intervals to any other original texts (transcriptions and graphics). All links to notes, criticism, or other media will be made from the modern text and its annotations.


The aim in tagging the original Quartos and the Folio in XML and SGML will be to produce texts available for further analysis; again the emphasis will be on providing basic tools for further use by individual scholars. The tagging scheme will stress the physical characteristics of the original text, and will follow a simplified version of the tag set devised by Ian Lancashire for his Renaissance Electronic Texts, and suggested in his paper on "The Public Domain Shakespeare." Unlike the HTML versions, the XML and SGML texts will indicate special characters and ligatures, and will include significant "conceptual" tags that indicate the function of the units of the text, such as stage directions and speech prefixes.
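To make the distinction between character-level and "conceptual" tagging concrete, here is a small sketch that parses an XML fragment and pulls out units by function. The element names (`sd`, `sp`, `line`, `lig`) and the sample text are invented for illustration; they are not the Renaissance Electronic Texts tag set, and the fragment is not a transcription.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only: hypothetical tags and placeholder text.
SAMPLE = """
<page>
  <sd>Enter two speakers.</sd>
  <sp>First.</sp>
  <line>A <lig>\u017ft</lig>range a<lig>ct</lig>.</line>
</page>
""".strip()

LONG_S = "\u017f"


def texts_by_function(root, tag):
    """Return the full text of every element carrying the given functional tag."""
    return ["".join(el.itertext()) for el in root.iter(tag)]


def ligatures(root):
    """List the letter groups that the transcription marks as ligatures."""
    return [el.text for el in root.iter("lig")]


def readable(text):
    """Reduce the special characters to plain letters for display."""
    return text.replace(LONG_S, "s")


root = ET.fromstring(SAMPLE)
```

The same file can then serve both purposes: `ligatures(root)` exposes the physical detail a bibliographer might want, while `readable(...)` yields the uncluttered wording.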

Ultimately no physical attribute is free from the interpretative intervention of the editor, but where possible interpretation will be kept to a minimum. Thus, for example, the tagging scheme will identify compositorial effects such as justified text and hanging words, but will not attempt to indicate which parts of a text may have been set by different compositors, since this is a matter of inference from the physical characteristics of the text.

Bibliographical scholars will thus be able to use the text as a starting point for their own schemes of "deeper" tagging--and will be invited to make their versions available to other scholars. In due course standard browsers should be able to display the tagged text in a font that approximates the original, complete with special characters.

The site does not yet include examples of XML/SGML tagging.


The modern texts


Again, the aim will be to produce clean, readable copy. The HTML modern-spelling text will be linked to annotations, collation, and the original quartos and Folio. Since the originals are just a click away, the modern text will not need to use archaisms ("burthen") or to signal possible elision for the sake of scansion. The tag set will allow for software to provide at least two versions of the main modern text, one linked to shorter annotations, suitable for a student at the introductory level, the other to fully detailed scholarly apparatus.
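One way software might offer two views of the same modern text is to mark each annotation with a level and select on it. The sketch below assumes two level labels, "intro" and "scholarly", and a hypothetical `Annotation` record; none of these names come from the ISE tag set, and the note texts are placeholders.

```python
from dataclasses import dataclass
from typing import Iterable, List

LEVELS = ("intro", "scholarly")   # assumed labels for the two versions


@dataclass(frozen=True)
class Annotation:
    anchor: str   # hypothetical identifier of the annotated word or line
    level: str    # one of LEVELS
    text: str


def annotations_for(annotations: Iterable[Annotation], level: str) -> List[Annotation]:
    """Select the annotations that belong to one version of the modern text."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    return [a for a in annotations if a.level == level]


# Placeholder notes for illustration.
notes = [
    Annotation("line-1", "intro", "Short gloss for a first-time reader."),
    Annotation("line-1", "scholarly", "Longer note with collation and references."),
    Annotation("line-2", "scholarly", "Note on a departure from the copy text."),
]
```

The text itself is stored once; only the set of links attached to it changes between the introductory and the scholarly view.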

In recent years there has been an energetic debate in the community of textual scholars on the subject of the nature of the modern text--or even whether there should be a modern text at all. Given that ISE texts will have a graphic representation of the original available for the scholar, it is structurally desirable for some kind of "finder" text to be used as the main centre for the web of annotations, collations, and further links to multimedia and performance materials. The nature of the finder text will be decided by the individual editor, in consultation with the Editorial Board.


The emerging standard of XML is making it possible for sophisticated text to be tagged and displayed on normal browsers: Internet Explorer (5.0 on the Macintosh, 5.5 on the PC) and Netscape 6.0 both support XML, at least to some extent. The ISE will create its texts in standards-compliant XML with associated style sheets. XML will make possible a flexible and elegant interface for accessing much of the metadata embedded in the final ISE files.

The purpose of the modern SGML text will be both to serve more sophisticated browsers as these become available and to provide a base text for analysis. Thus the tagging will recognise structural and conceptual elements more fully than the tagging for the original quartos and Folio, and will be almost completely free of tagging for physical representation. SGML tagging will comply with the standards set by the Text Encoding Initiative.

Some elements to be tagged will include:

  1. Literary divisions: act, scene, and line (and through line numbering). Note that a scheme for line numbering in the electronic world of freely flowing text will be different from traditional lineation, especially in prose passages.
  2. Supporting text: speech headings, stage directions.
  3. Editorial decisions: departures from the copy text, identifying edition; additions and deletions.
  4. Literary types: verse or prose.
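The four kinds of element listed above can be seen together in a short fragment. The sketch below uses TEI-style element names (`div1`, `div2`, `sp`, `speaker`, `stage`, `l`, `p`, `corr`), but the fragment is written as XML for convenience, is not validated against any TEI DTD, and contains placeholder text with illustrative attribute values; the helper names `speeches` and `emendations` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder text in TEI-style markup; not an ISE file.
SAMPLE = """
<div1 type="act" n="1">
  <div2 type="scene" n="1">
    <stage>Enter two speakers.</stage>
    <sp>
      <speaker>First</speaker>
      <l n="1">A line of <corr sic="vrese">verse</corr> goes here.</l>
    </sp>
    <sp>
      <speaker>Second</speaker>
      <p n="2">A prose reply, which may reflow on screen.</p>
    </sp>
  </div2>
</div1>
""".strip()


def speeches(act):
    """Yield (act, scene, speaker, kind, n, text) for each verse line or prose unit."""
    for scene in act.iter("div2"):
        for sp in scene.iter("sp"):
            speaker = sp.findtext("speaker")
            for unit in sp:
                if unit.tag in ("l", "p"):
                    kind = "verse" if unit.tag == "l" else "prose"
                    text = "".join(unit.itertext())
                    yield (act.get("n"), scene.get("n"), speaker, kind,
                           unit.get("n"), text)


def emendations(root):
    """Return (copy-text reading, editor's reading) pairs for each correction."""
    return [(c.get("sic"), c.text) for c in root.iter("corr")]


root = ET.fromstring(SAMPLE)
rows = list(speeches(root))
```

Numbering the prose unit as a whole, rather than by printed line, is one possible response to the reflow problem noted in item 1; it is shown here as an assumption, not as the scheme the ISE adopted.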
