Wordcrunching

Silver Currents, page 3

A further article in the EMLS issue is a reminder that search engines are not the only way that computers can be used in reading texts. Jonathan Hope and Michael Witmore write about their experience of applying Docuscope, a text analysis program being developed at Carnegie Mellon University, to Shakespeare's plays. The result of the program's analysis was unexpected: when it was, in effect, asked to decide which plays, according to its tests, were most alike, it created lists that came very close to the division of the plays by genre in the First Folio. Hope and Witmore comment, "a program which knows nothing about the folio divisions or the largely content-based definitions of genres, has been able to reproduce them solely by counting the relative frequencies of linguistic items."
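To make "counting the relative frequencies of linguistic items" concrete, here is a minimal sketch in Python. It is not Docuscope's method: Docuscope works with its own categories of linguistic items, whereas this sketch assumes a plain list of words chosen by the user, and the cosine measure used to decide which texts are "most alike" is likewise an assumption made for illustration. The function names and the toy texts are hypothetical.

```python
from collections import Counter
import math
import re

def relative_frequencies(text, features):
    """Share of all word tokens in `text` taken up by each feature word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on an empty text
    return [counts[f] / total for f in features]

def cosine_similarity(a, b):
    """Cosine of the angle between two frequency profiles (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(name, profiles):
    """Name of the text whose profile is closest to that of `name`."""
    target = profiles[name]
    scored = [(cosine_similarity(target, profile), other)
              for other, profile in profiles.items() if other != name]
    return max(scored)[1]

# Toy stand-ins for whole plays (hypothetical data, for illustration only).
texts = {
    "a": "I love thee and thou lovest me",
    "b": "thou and thee and I",
    "c": "the king the crown the war",
}
features = ["i", "thee", "thou", "the", "king"]  # assumed feature list
profiles = {name: relative_frequencies(t, features) for name, t in texts.items()}
print(most_similar("a", profiles))
```

The point of the sketch is only that nothing in it knows about genre: texts are grouped purely by how often the chosen items occur in proportion to their length.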

I mention this study here as a way of pointing out the possibilities that the computer provides for developing tools for understanding text that are very different from those we normally associate with published editions of the plays. Few critics are familiar enough with text analysis software to come up with results like those of Hope and Witmore, but we might remember the work done by Ian Lancashire and others using the early text analysis program TACT.[Note 8] The field has been complicated by difficulty in reproducing the work of Donald Foster, who suffered the problems faced by many pioneers,[Note 9] and by the tendency of much textual analysis to be devoted to issues of authorship rather than critical enquiry. The work of scholars like Lancashire, Hope, and Witmore suggests that textual analysis tools may be applied to questions of more interest to most Shakespeareans. In the development of useful analytic tools, the ISE is especially fortunate, since our University has recently become part of a consortium of six universities in Canada to receive major funding to establish a Text Analysis Portal for Research (a project known to its friends as TAPoR). ISE texts have been donated to the portal to provide an example of intelligently tagged text for the various kinds of software to work on.

One way I envision this interaction working is for a user to select a word or phrase in the hypertext edition, then drag it to a text box within the browser window. At this point there would be options allowing him or her to view the word as used elsewhere in the play, or in the Shakespeare corpus as a whole (a concordancing function), or in related Renaissance texts -- Shakespeare's sources, the Bible, the Elizabethan Homilies (Lancashire 1994, 1997), for example -- all in KWIC (key word in context) view, where a further click will reveal the full passage in which the word occurs. Or the user might look up the word in such reference works as the Early Modern English Dictionaries Database (Lancashire 1999) or other freely available online reference works. More sophisticated explorations will also be possible, as words, phrases, and collocations can be traced through frequency distributions, through lemmatised versions of the text, and so on.[Note 10]
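The concordancing function described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the ISE or TAPoR implementation: the function names `kwic` and `format_kwic`, the `width` parameter, and the sample line are all hypothetical, and the sketch matches whole words or phrases without regard to case. Each hit carries its character offset in the whitespace-normalised text, which is what a "further click" would need in order to locate the full passage.

```python
import re

def kwic(text, keyword, width=30):
    """Return (left, match, right, offset) for each whole-word hit of `keyword`.

    `offset` is the position of the hit in the whitespace-normalised text.
    """
    flat = " ".join(text.split())  # collapse line breaks so each context reads as one line
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        hits.append((left, m.group(0), right, m.start()))
    return hits

def format_kwic(hits, width=30):
    """Right-align the left context so the key words line up in one column."""
    return [f"{left:>{width}}  {word}  {right}" for left, word, right, _ in hits]

sample = "To be, or not to be, that is the question"
for line in format_kwic(kwic(sample, "be", width=12), width=12):
    print(line)
```

The same function could be pointed at a single play, the whole corpus, or a related text simply by changing what is passed in as `text`.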

Notes

[8] See, for example, Lancashire 1996.

[9] Foster has made the data for Shaxicon available to the ISE in the hope that it will be possible to reformat it in a fashion that will make it useful for today's scholars, and allow his original findings either to be confirmed or modified.

[10] A related project being undertaken by Ray Siemens is the development of a dynamic edition of the Devonshire MS -- a fascinatingly complex text -- where all links will be created automatically by algorithms rather than as fixed links chosen by the editor. He will integrate into this edition HyperPo, the text analysis tools being developed at the University of Alberta by Stéfan Sinclair; through a similar collaboration it will be possible to make the same kinds of tools available for dynamic readings of ISE texts.

Internet Shakespeare Editions
