Text Creation Partnership

The Text Creation Partnership (TCP) is a not-for-profit organization that has been based in the library of the University of Michigan since 2000. Its purpose is to produce large-scale full-text electronic resources, especially in the humanities, on behalf of both member institutions (particularly academic libraries) and scholarly publishers, under an arrangement calculated to serve the needs of both. In so doing it aims to demonstrate the value of a business model that treats corporate and non-profit information providers as potentially amicable collaborators rather than as antagonistic vendors and customers. [For an overview see Goldie Blumenstyk, "A Project Seeks to Digitize Thousands of Early English Texts," Chronicle of Higher Education, August 10, 2001, p. A47. http://www.lib.umich.edu/tcp/eebo/archive/chronicle.htm (accessed 2007-01-04).]

As of October 2008, TCP sponsors three text-creation projects. The first and largest is "EEBO-TCP" (2001- ), an effort to produce structurally marked-up full-text transcriptions of 25,000 of the roughly 125,000 books listed in the Pollard and Redgrave and Wing short-title catalogues of early English printed books or included among the Thomason Tracts; together these cover nearly all books, pamphlets, and broadsides published in English or in England before 1700. The books are selected and transcribed from the digital scans produced by ProQuest Information and Learning and distributed by them as a web-based product under the name "Early English Books Online" (EEBO). Those scans were themselves made from the microfilm copies produced over the years by ProQuest and its antecedent companies, including the original University Microfilms, Inc. [Rita Beamish, "Online Archive Will Preserve Earliest English Books," New York Times, July 29, 1999. http://www.nytimes.com/library/tech/99/07/circuits/articles/29engl.html (accessed 2007-01-04).] EEBO-TCP has so far produced e-texts of about 22,250 books. With the original goal of 25,000 titles in sight, efforts are underway to fund a sequel project, EEBO-TCP(2), with the goal of converting all the remaining unique English-language monographs (roughly 44,000 additional titles).

The second TCP project is Evans-TCP (2003- ), an effort to transcribe 6,000 of the 36,000 pre-1800 titles listed in Charles Evans's "American Bibliography" and distributed, again as page images scanned from microfilm copies, by Readex, a division of NewsBank, under the name "Archive of Americana" ("Early American Imprints, series I: Evans, 1639-1800"). Evans-TCP has so far produced e-texts of more than 3,700 books.

The most recent TCP project is ECCO-TCP (2005- ), an effort to transcribe 10,000 eighteenth-century books from among the 136,000 titles available in Thomson-Gale's web-based resource, "Eighteenth-Century Collections Online" (ECCO). ECCO-TCP has so far produced about 1,500 books in e-text form.

Organization

The TCP is overseen by a Board of Directors, drawn chiefly from senior library administrators at partner institutions, representatives of the corporate partners, and the Council on Library and Information Resources (CLIR). The Board is assisted in matters of selection and scholarship by an academic advisory group that includes faculty in the fields of early modern English and American studies.

The TCP has informal ties to a number of university-based scholarly text projects, chiefly by providing them with source texts with which to work. Institutions represented include Northwestern University (IL), Oxford University (UK), Washington University (St. Louis), the University of Sydney (Australia), the University of Toronto (ON), and the University of Victoria (BC). TCP has also worked with students by sponsoring an annual Undergraduate Essay Contest, convening task forces on the uses of TCP texts in pedagogy, and appealing to scholars and students for ideas on selection and use.

Text production is managed through the University of Michigan's Digital Library Production Service (DLPS), which has extensive experience in the production of SGML/XML-encoded electronic texts. DLPS is assisted by Oxford University Library's Systems and Electronic Resources Service (SERS). Small part-time production operations have also been started within two other libraries: the Centre for Reformation and Renaissance Studies in Pratt Library (Victoria University in the University of Toronto), which specializes in Latin books, and the National Library of Wales (Llyfrgell Genedlaethol Cymru) in Aberystwyth, which specializes in Welsh books.

Commonalities

All three current TCP text projects are very similar. In each case:

# The TCP produces text from commercial image files that have in turn been created from microfilm copies of early books.
# The commercial image providers receive what is in effect a full-text index to their image product for much less than it would cost them to produce it themselves: value added to their product.
# The partner libraries actually own, rather than simply license, the resultant texts, and are free (subject to some conditions) to mount the texts themselves in whatever system they like, or use the texts internally as a tool of scholarship and teaching.
# The texts are created according to library-determined standards, uniform across multiple data-sets and potentially cross-searchable.
# Because they are created collaboratively, the texts are relatively inexpensive (on a per-book basis) and become more so with each library that joins the partnership.
# The texts will eventually be made freely accessible to the public at large.
# The selection of texts to convert, though differing from project to project, in each case follows similar principles: variety, significance, representative quality, avoidance of duplication; specific requests from faculty or scholarly initiatives at member institutions are also generally honored.
# TCP has been hitherto primarily interested in creating texts, not in creating a "product"; though texts from all three projects are or will be mounted on servers at the University of Michigan library, the Michigan site is not the official TCP site: any partner library with adequate resources and safeguards may do the same. EEBO-TCP texts, for example, are served by Michigan, ProQuest, the Oxford University Digital Library, and the University of Chicago.

Standards

All three current TCP text projects are produced in the same way and to the same standards, which are documented, at least in part, on the [http://www-personal.umich.edu/~pfs/dox/www.lib.umich.edu/tcp/docs/ TCP web site].

# Accuracy. The TCP strives to produce texts that are as accurately transcribed as possible, with a specified overall accuracy rate of 99.995% or better (i.e. one error or fewer per 20,000 characters).
# Keying. Given the nature of the material, the only method found to deliver such accuracy economically has been to have the books keyed by data conversion firms under contract.
# Quality control. Accuracy of transcription and aptness of markup are assessed in all cases by a group of library-based proofers and reviewers managed by the University of Michigan DLPS.
# Encoding. All resultant text files are marked up in valid SGML or XML (SGML is archived, XML is exported) conforming to a proprietary "Document Type Definition" (DTD) derived from the P3/P4 version of the Text Encoding Initiative (TEI) standard.
# Purposeful markup. Compared to the full TEI, the TCP DTD is very simple and intended to capture only the features most useful for intelligible display, intelligent navigation, and productive searching. The TCP practice is to capture, so far as feasible, the overall hierarchical structure of each book (parts, sections, chapters, etc.); the features that tend to mark the beginnings and ends of divisions (headings, explicits, salutations, valedictions, datelines, bylines, epigraphs, etc.); the most significant elements of discourse and organization (paragraphs in prose, lines and stanzas in verse, speeches, speakers, and stage directions in drama, notes, block quotes, sequential numerations of all kinds); and only the most essential aspects of physical formatting (page breaks, lists, tables, font changes).
# Fidelity to the original. In each case, the text is intended to represent the book as originally printed, so far as that is possible. Printer's errors are preserved, hand-written changes are ignored, duplicate scans are omitted, out-of-order images are keyed in the intended order, and most of the unusual characters of the original are preserved.
# Ease of reading and searching. At the same time, though the transcriptions are carried out character-by-character, TCP, on the theory that all transcription is a kind of translation from one symbolic system to another, tends to define characters in terms more of their meaning than of their form, and to map eccentric letter-forms to meaningful modern equivalents, generally in keeping with the Unicode definition of "character."
# Languages. Though most of the TCP texts are in English, many are not. Books and divisions of books not in English are tagged with an appropriate language code, but are not otherwise distinguished.
# Omitted material. The TCP produces Latin-alphabet "text". Non-textual material such as musical notation, mathematical formulae, and illustrations (except for any text they may contain) is omitted and its location marked with a special tag. Extended text in non-Latin alphabets (Greek, Hebrew, Persian, etc.) is also omitted.
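
The accuracy target in the first item above can be restated as a simple error budget. The following is a minimal sketch in Python, not part of any TCP tooling: the function names and the 500,000-character book length are assumptions chosen for illustration, and only the 99.995% figure (one error per 20,000 characters) comes from the text.

```python
from fractions import Fraction

# Target stated above: 99.995% accuracy, i.e. one error or fewer per 20,000 characters.
TARGET_ACCURACY = Fraction(99995, 100000)


def chars_per_error(accuracy: Fraction = TARGET_ACCURACY) -> Fraction:
    """Number of characters keyed for each error the target permits."""
    return 1 / (1 - accuracy)


def max_errors(char_count: int, accuracy: Fraction = TARGET_ACCURACY) -> int:
    """Largest whole number of errors that still meets the target."""
    return int(char_count * (1 - accuracy))  # int() truncates toward zero


def meets_target(errors: int, char_count: int,
                 accuracy: Fraction = TARGET_ACCURACY) -> bool:
    """True if the observed error count satisfies the accuracy target."""
    return Fraction(char_count - errors, char_count) >= accuracy


if __name__ == "__main__":
    book_length = 500_000  # hypothetical book length in characters
    print("characters per permitted error:", chars_per_error())
    print("error budget for this book:", max_errors(book_length))
    print("26 errors acceptable?", meets_target(26, book_length))
```

Exact fractions are used instead of floating-point numbers so that the boundary case (exactly one error in 20,000 characters) is classified reliably.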

Accomplishments and prospects

As of October 2008, the TCP had created about 27,500 searchable, navigable, full-text transcriptions of early books, a database of unmatched scope, scale, and utility to students in many fields. Whether it will be able to go on to produce the remaining 57,000 texts included in its ambitious recent plans will depend on the validity of its original vision, arising from the theory that libraries could and should cooperate to become producers and standard-setters rather than consumers; and that universities and commercial firms, despite their very different life-cycles, constraints, and motives, could join in durable partnerships of benefit to all parties.

References

External links

* [http://www.lib.umich.edu/tcp/ Main (Michigan) TCP web site]
* [http://www.odl.ox.ac.uk/eebo/ Oxford TCP web site]
* [http://www.lib.umich.edu/tcp/docs/ Internal TCP documentation]
* Demonstration sites (open to the public) for
** [http://www.hti.umich.edu/e/eebodemo/ EEBO-TCP]
** [http://www.hti.umich.edu/e/eccodemo/ ECCO-TCP]
** [http://www.hti.umich.edu/e/evansdemo/ Evans-TCP]
* Database-access sites (open to members of partner institutions) for
** EEBO-TCP at
*** [http://ets.umdl.umich.edu/e/eebo/ the University of Michigan (via DLXS)]
*** [http://www.lib.uchicago.edu/efts/EEBO/ the University of Chicago (via PhiloLogic)]
*** [http://dlxs.odl.ox.ac.uk/e/eebo/ Oxford University (via DLXS)]
*** [http://eebo.chadwyck.com/home the ProQuest EEBO site]
** Evans-TCP at [http://ets.umdl.umich.edu/e/evans/ the University of Michigan (via DLXS)]
** ECCO-TCP at [http://ets.umdl.umich.edu/e/ecco/ the University of Michigan (via DLXS)]


Wikimedia Foundation. 2010.
