Internet Shakespeare Editions

APPENDIX: Tags

The objective of this tagging scheme is to simplify the task of creating effectively tagged texts that are at the same time reasonably readable. The tags are based on similar conventions in HTML; at a later stage the files will be converted to both XML and HTML versions automatically. Editors will be provided with "base" texts with much of the tagging in place.

A 1. Renaissance texts: Folio, quartos

This tag set is derived originally from the "Encoding Guidelines" for the Renaissance Electronic Texts, developed by Ian Lancashire. The XML version was developed by Peter van Hardenberg.

Please note that all tags supplied in the base texts should be carefully proofread.

Where an editor wishes to add additional information not anticipated by these tags, he or she should correspond with the Coordinating Editor to see if the tagset should be expanded. In general, however, it will be sufficient to add the additional information in the form of a comment, using the SGML/HTML convention of the exclamation mark followed by two hyphens, thus:

<!-- NOTE: there is a printing ambiguity in the Hinman facsimile. The previous word may be "quallitie" or "quallirie". -->

The comment is closed by two hyphens before the final angle bracket.
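
As an illustration only, the comment convention above can be handled with a short Python sketch. The function names are hypothetical, and the sketch assumes that comments never nest and are always closed with "-->".

```python
import re

# Assumption: comments do not nest and always end with "-->".
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)

def extract_comments(text):
    """Return the body of each comment, without surrounding whitespace."""
    return [body.strip() for body in COMMENT_RE.findall(text)]

def strip_comments(text):
    """Remove every comment and leave the rest of the text untouched."""
    return COMMENT_RE.sub("", text)
```

Listing the comments in a file in this way may help the Coordinating Editor see where the tagset has been found insufficient.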

A 1.1. Header

Each document will have a "header" providing metadata about the document as a whole. The structure of the metadata conforms to the Dublin Core standard, with some additional items developed specifically for ISE texts. Metadata describe in detail the provenance of the text, the names of those involved in developing it, and a list of tags and abbreviations used. The header will be compiled by the Coordinating Editor from information supplied by the editor.

There is a complete description of the metadata on this page.

A 1.2. Structural elements

<WORK> </WORK> The XML document root for all texts.
<ACT n="[number]"> </ACT> Act division as in the modern edition.
<SCENE n="[number]"> </SCENE> Scene division as in the modern edition.
Note: In plays with a non-diegetic chorus, like Henry V (F) and Pericles, use <SCENE n="0"> to designate chorus passages that are not prologues or epilogues. SCENEs may be used with or without enclosing ACTs.
<TITLE> </TITLE> Initial heading in Folio; the principal title of the play on Quarto title pages.
  Note: for additional large-type lines on title pages, use the <FONTGROUP> tag (see below).
<LD> </LD> Literary Division (e.g. Act, Scene.). Note that this tag encloses original literary divisions; modern divisions are indicated by separate tags (see A 1.4 below).
<S> </S> Speech. Includes the speech prefix and any included stage directions (</S> may appear at the end of a hung word).
<SP> </SP> Speaker Prefix.
norm="Name" Normalized form of the name; must be included in every instance of a speech prefix, but not if the name is used in the course of a speech.
Example: <SP norm="Hamlet">Ham.</SP>.
Where there is more than one speaker, separate the normalized names by commas; if there is a speech without a speech prefix, put <SP norm="normalized name"></SP> to indicate its omission.
<SD> </SD> Stage Direction. Each line of a split direction in the right hand margin should be tagged separately; directions different in kind should also be tagged separately.
t="[type]" entrance | exit | setting | sound | delivery | whoto | action | location | other | optional | uncertain
Example: <SD t="exit"> <I>Exeunt.</I></SD> <SD t="sound"><I>Alarum</I></SD>. Of course there will be single instructions that include more than one kind of direction, in which case you should separate the types with a comma: <SD t="entrance, setting"><I>Enter Macbeths Wife alone with a Letter</I>.</SD>.
<VERSEQUOTE> </VERSEQUOTE> Verse quotation (e.g. song).
source= When the verse is a quotation from another source, the source should be recorded (as in Pistol's quotations from earlier plays). For matters concerning display, see the tags <SPACE> and <INDENT>.
<PROSEQUOTE> </PROSEQUOTE> Prose quotation (e.g. quoted letter).
source= (As for verse.)
<MODE> </MODE> Indicator of verse or prose.
t="[mode]" prose | verse | uncertain
Example: <MODE t="prose"> . . . </MODE>.
Note that you may choose to use a type of "uncertain" where it is not clear that the section is either verse or prose.
<FOREIGN> </FOREIGN> Language when not English. Used for the content of speeches only, not Latin stage directions, literary divisions etc.
lang="language" Example: <FOREIGN lang="French">Diable!</FOREIGN>.
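
When proofreading, the stage direction types listed above can be checked mechanically. The following Python sketch is one possible approach, not part of the ISE tools; the function name is hypothetical, and accepting unquoted attribute values is an assumption about what draft files may contain.

```python
import re

# Allowed values of the t attribute on <SD>, as listed in the guidelines.
SD_TYPES = {
    "entrance", "exit", "setting", "sound", "delivery", "whoto",
    "action", "location", "other", "optional", "uncertain",
}

# Matches the t attribute of an opening <SD> tag, quoted or unquoted.
SD_TAG_RE = re.compile(r'<SD\s+t=(?:"([^"]*)"|([^\s>]+))\s*>')

def invalid_sd_types(text):
    """Return every unrecognised type found in the t attribute of <SD> tags."""
    bad = []
    for quoted, bare in SD_TAG_RE.findall(text):
        value = quoted or bare
        # Several types may be given in one attribute, separated by commas.
        for kind in value.split(","):
            kind = kind.strip()
            if kind not in SD_TYPES:
                bad.append(kind)
    return bad
```

An empty list means that every typed stage direction uses a recognised type; directions with no t attribute are ignored by this sketch.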

A 1.3. Printing elements

a) Page

<PAGE n="[number]"> </PAGE> defines the extent of a printed page. n gives the page number.
<SIG> </SIG> Page Signature (encloses the printed signature, and appears at the end of the page, where it will be displayed). Note that the signature itself may include other tags where necessary (if, for example, an italic letter is used).
n="[signature]" The signature in its normalized and accurate form. This will include all signatures that are implied rather than actually printed, and must include either "r" or "v" for recto and verso.
Examples: <SIG n="A2v"></SIG> and <SIG n="aaa1r"><LS>aaa</LS></SIG>
<CW> </CW> Catchword.
<RULE/> Rule. (Note that the final forward slash is required to "close" the tag.)
<RT> </RT> Running title.
<PN> </PN> Page number as printed.
<COL n="0"></COL> Defines pages with no columns in a document that elsewhere has columns; where there are columns, indicates a print element that spans both columns.
<COL n="1"></COL> Column 1 (Folio). Placed at the beginning of the column.
<COL n="2"></COL> Column 2 (Folio).
<CL> </CL> Closing (e.g. Finis).
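
The requirement that a normalized signature end in "r" or "v" can be tested with a short Python sketch. The pattern is an assumption based only on the two examples given above (letters, a leaf number, then r or v); signatures that use symbols rather than letters would need a wider pattern.

```python
import re

# Assumed shape: one or more letters, a leaf number, then r (recto) or v (verso).
SIG_RE = re.compile(r"[A-Za-z]+\d+[rv]")

def is_normalized_signature(value):
    """Return True if value looks like a normalized signature such as A2v."""
    return SIG_RE.fullmatch(value) is not None
```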

b) Typesetting

<I> </I> Italic text. Italics may also be generated in the normal word processor way to assist you in working with a readable text; they will later be converted automatically.
Note: intermediate spaces should be italicized.
<BLL> </BLL> Black Letter.
Note: for texts basically in black letter, use the metadata tag <META name="ISE.DefaultFont" content="BLL"/>.
<R> </R> Roman text. Note: this tag will only be used in texts for which black letter is the default.
<LS> </LS> Letter-Spaced (e.g., "G O D"). Note: do not include actual spaces in the word to be letter-spaced: <LS>GOD</LS> is correct.
<FONTGROUP> </FONTGROUP> Can be used to indicate different sizes of type.
n="1 . . . 6"  where n="1" is the smallest size, n="3" is normal type, and n="6" is the largest
<SUP> </SUP> Superscript characters (this follows the HTML 3.2 convention).
<SUB> </SUB> Subscript characters.
<J> </J> Justified line(s). Only fully justified lines are tagged. Note that verse lines that reach to the end of the column should not be tagged as justified (though many draft texts tag them so), since these are not justified in the way that prose lines are.
<HW> </HW> Hung Word(s). Note that the hung word should be restored to the line it continues; the "type" indicates whether it was originally displaced to the previous or next line.
t="prev | next" The type of hung word indicates whether it appears on the previous or next line from the line it continues.
<C> </C> Centered text. As in HTML this tag applies to a whole line. Each centered line should be tagged.
<RA> </RA> Right Aligned text. This tag can be applied to a separate part of a line, so is the equivalent of a tab rather than right alignment for the whole line.
<ORNAMENT/> Ornament (will be shown by a graphic in the HTML version).
<L/> Blank line. (Previously tagged <BL> or <L></L>.)
Line breaks. Line breaks will be indicated in the normal way by a carriage return, so lines should not be broken except where they are in the original. The one exception is where a tag is the only item on the line (<COL n="1"> is an example). In the final version of the texts the line breaks will be replaced by appropriate tags automatically.

c) Abbreviations

Abbreviations in the old-spelling texts are tagged in a form that indicates the full, or expanded, version of the word as part of the tagging. The basic tag is this:

<ABBR expan="[full word]">[abbreviation]</ABBR>

Example: where the original reads "My L. Mayor"

My <ABBR expan="Lord">L.</ABBR> Mayor

There are two kinds of abbreviations that involve early type forms.

a) The instance where a single character is involved. This is found in the instances where the letter "y" with a small superscript above it represents either "the" or "that." This single character should be tagged thus:

<ABBR expan="the">y<SUP>e</SUP></ABBR>
<ABBR expan="that">y<SUP>t</SUP></ABBR>

A similar case is the abbreviation for "which" with a small "c" over the "w":

<ABBR expan="which">w<SUP>c</SUP></ABBR>

b) The instance where there is a small superscript character as an additional type, as in <ABBR expan="Master">M<SUP>r</SUP></ABBR>.
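
Because the expanded form is carried in the tag itself, an expanded reading text can be derived mechanically. The following Python sketch shows one way this might be done; the function name is hypothetical, and the sketch assumes that <ABBR> tags are never nested inside one another.

```python
import re

# Assumption: <ABBR> elements are not nested inside one another.
ABBR_RE = re.compile(r'<ABBR expan="([^"]*)">.*?</ABBR>', re.DOTALL)

def expand_abbreviations(text):
    """Replace each <ABBR> element with the value of its expan attribute."""
    return ABBR_RE.sub(lambda match: match.group(1), text)
```

For instance, the "My L. Mayor" example above would be returned as "My Lord Mayor".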

d) Characters and ligatures

{-} Hyphen at end of line (soft hyphen).
{s} Long s.
{P} Paragraphus (¶).
{sm} Section mark (§).
{^o} letter with circumflex (ô).
{"o} letter with dieresis (ö).
{'e} letter with acute accent (é).
{`e} letter with grave accent (è).
{_m} letter with macron accent.
{~n} letter with tilde accent (ñ).
{ae} digraph.
{oe} digraph.
ligatures. {oe} {ae} {as} {ct} {ee} {ffi} {ffl} {ff} {fi} {fl} {fr} {ij} {is} {oe} {oo} {pp} {us} {st}
ligatures with long s. {s} {{s}h} {{s}i} {{s}l} {{s}p} {{s}t} {{s}{s}i} {{s}{s}l} {{s}{s}}
vv used for w {w}
VV used for W {W}

Note: where an accent is also an abbreviation, the abbreviation should be indicated thus: |m{_a}|
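
The character codes above lend themselves to automatic conversion. The Python sketch below is a rough illustration under several assumptions, all of them choices made for this example rather than ISE policy: {ae} and {oe} are rendered as single characters, {w} and {W} are restored to "vv" and "VV", other ligature codes are reduced to their plain letters, and {-} is left untouched.

```python
import re
import unicodedata

# Combining marks for the accent codes listed in the guidelines.
ACCENTS = {
    "^": "\u0302",  # circumflex
    '"': "\u0308",  # dieresis
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    "_": "\u0304",  # macron
    "~": "\u0303",  # tilde
}

# Symbol codes. Every mapping here is an assumption made for this sketch.
SYMBOLS = {
    "{s}": "\u017f",   # long s
    "{P}": "\u00b6",   # paragraphus
    "{sm}": "\u00a7",  # section mark
    "{ae}": "\u00e6",
    "{oe}": "\u0153",
    "{w}": "vv",
    "{W}": "VV",
}

ACCENT_RE = re.compile(r"\{([" + re.escape("".join(ACCENTS)) + r"])([A-Za-z])\}")
LIGATURE_RE = re.compile(r"\{([A-Za-z\u017f]+)\}")

def to_unicode(text):
    """Convert brace codes to Unicode characters where a mapping is known."""
    # Accented letters: base letter plus combining mark, composed by NFC below.
    text = ACCENT_RE.sub(lambda m: m.group(2) + ACCENTS[m.group(1)], text)
    for code, char in SYMBOLS.items():
        text = text.replace(code, char)
    # Ligature codes such as {ct} or {{s}t}: keep the letters, drop the braces.
    text = LIGATURE_RE.sub(lambda m: m.group(1), text)
    return unicodedata.normalize("NFC", text)
```

Letters with no precomposed accented form, such as m with a macron, remain as a base letter followed by a combining mark.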

e) Word spacing

Word spacing will be normalized throughout, with a single space separating all words. Add a space when the original uses punctuation as a word separator. In cases where there is doubt as to whether a space is intended (a word that could be a compound, for example, or where words are clearly run on without a space), indicate the doubtful space by {#}. NOTE: editors may choose to indicate all added spaces if they wish. The contrary case, where a space is added where there should be none (as in "now th ou art") is indicated by { } (brackets enclosing a single space).
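
The two spacing markers can be resolved into a plain reading text along the following lines. This Python sketch is illustrative only; the function name is hypothetical, and it assumes that {#} stands in place of the doubtful space rather than beside it.

```python
def reading_text(text, keep_doubtful_spaces=True):
    """Resolve the two word-spacing markers into plain text."""
    # { } marks a space printed where none belongs: close the gap.
    text = text.replace("{ }", "")
    # {#} marks a doubtful space: keep or drop it as the caller decides.
    return text.replace("{#}", " " if keep_doubtful_spaces else "")
```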

f) Indents and significant spaces

<SPACE n="[number]"/> Indicates significant space to be left in the text. The most common instance of this will be in formatting the lines of verse in a song or sonnet, where some lines will be indented further than others. The number of m-spaces should be indicated. There is no </SPACE> tag (the forward slash at the end of the tag is the equivalent of a closing tag). In the modern text, prose and verse will automatically be indented when the <PROSEQUOTE> or <VERSEQUOTE> tag is used.
<INDENT n="[number]"></INDENT> In the old-spelling transcription, indicates indentation for a whole block of text (prose or verse).
n= The number of m-spaces indented. Further indentation in verse should be shown by the use of <SPACE n="[number]"/>, where again n is the number of m-spaces.

A 1.4. References and modern act, scene divisions

<TLN n="[number]"/> Through Line Number. The basic method of internal reference for the editions will be the TLN number. Where a quarto or modern edition omits material the numbers will be omitted; where they add material the numbers will be added decimally (<TLN n="1033.1"/> etc.); where the line division varies from the Folio the TLN number will be that of the first word of the line.
<ACT> </ACT> Act division as in the modern edition.
n="[number]" The number of the act. If the original includes an act division which has been retained, the notation would be thus: </ACT><ACT n="2"> <LD>Actus Secundus</LD>.
<SCENE> </SCENE> Scene division as in the modern edition.
n="[number]" The number of the scene. Example:
</SCENE>
</ACT><ACT n="2">
<LD>A{ct}us Secundus, Sc{oe}na Prima</LD>
<SCENE n="1">
<SD><I>Enter Hamlet</I>.</SD> . . .
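
Decimal TLNs do not sort correctly as plain strings or as floating-point numbers. The Python sketch below shows one way to compare them; it assumes, without confirmation from the guidelines, that the part after the point is a simple counter, so that 1033.10 follows 1033.9. The function names are hypothetical.

```python
import re

# Matches <TLN n="..."/>; the closing slash is optional to tolerate draft files.
TLN_RE = re.compile(r'<TLN n="(\d+(?:\.\d+)?)"\s*/?>')

def tln_key(value):
    """Turn "1033.1" into (1033, 1) so that 1033.10 sorts after 1033.9."""
    return tuple(int(part) for part in value.split("."))

def tlns_in_order(text):
    """Return True if the TLNs in text never decrease."""
    keys = [tln_key(value) for value in TLN_RE.findall(text)]
    return all(a <= b for a, b in zip(keys, keys[1:]))
```

A gap in the sequence is not reported as an error, since numbers are omitted where a text omits material.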

A 1.5. Multiple tags and hierarchical structures

The tags in the Renaissance texts of the Internet Shakespeare Editions are on the whole not hierarchical, since they are representational rather than logical. Thus one tag can cross the boundary of another where necessary. Nonetheless, it is good manners to keep the boundaries logical and consistent wherever possible. In this example the overall <S> </S> tag encloses both the speech prefix (which is logically necessary) and the tag that indicates a justified line. In turn the <J> </J> tag encloses the speech prefix tag.

<TLN n="452"/><S><J><SP><I>Nurse</I>.</SP>
Goe Gyrle, seeke happie nights to happy daies.</J></S>

In the next passage the hung word complicates the process, since it also ends a speech.

<TLN n="19"/><S><J><SP><I>Samp</I>.</SP> True, and therefore women being the weaker</J>
<TLN n="20"/><J>Vessels, are euer thrust to the wall: therefore I will push</J>
<TLN n="21"/><J><I>Mountagues</I> men from the wall, and thrust his Maides to</J>
<TLN n="22"/>the wall</S>
<TLN n="23"/><S><J><SP><I>Greg</I>.</SP> The Quarrell is betweene our Masters, and vs</J> <HW t="prev"><RA>(their men.</RA></HW></S>
<TLN n="24"/><S><J><SP><I>Samp</I>.</SP> 'Tis all one, I will shew my selfe a tyrant: when</J>
<TLN n="25"/><J>I haue fought with the men, I will bee ciuill with the</J>
<TLN n="26"/>Maids, and cut off their heads.</S>

This example includes a stage direction divided between two lines, right aligned.

<TLN n="721"/>And euery Greeke of mettell let him know,
<TLN n="722"/>What Troy meanes fairely, shall be spoke alowd. <SD><RA><I>Sound</I> <HW t="next"><I>trumpet</I>.</HW></RA></SD>
<TLN n="724"/>We haue great <I>Agamemnon</I> heere in Troy,
<TLN n="725"/>A Prince calld <I>Hector</I>, <I>Priam</I> is his father,

Where there is a combination of tags indicating logical structures (speech, stage direction etc.) and physical characteristics (justification etc.), the logical tags should wherever possible surround the physical tags, as in the examples above.
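
Since tags may cross one another's boundaries, a conventional nesting check is not suitable for these texts. A simpler check, sketched here in Python, only counts opening and closing tags of each name, which is enough to catch slips such as a missing slash in a closing tag. The function name is hypothetical, and the sketch assumes that tag names are written in capitals and that comments contain no tags.

```python
import re
from collections import Counter

# Captures an optional leading slash, the tag name, and an optional final slash.
TAG_RE = re.compile(r"<(/?)([A-Z]+)\b[^<>]*?(/?)>")

def unbalanced_tags(text):
    """Return {tag: opens minus closes} for every tag whose counts differ."""
    balance = Counter()
    for closing, name, empty in TAG_RE.findall(text):
        if empty:
            # Empty elements such as <TLN n="19"/> need no closing tag.
            continue
        balance[name] += -1 if closing else 1
    return {name: count for name, count in balance.items() if count != 0}
```

An empty result means only that every tag opened is also closed; it does not show that the tags are placed sensibly.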
