From the Print Media to the Internet - Part 7

"There are literally thousands of digital library initiatives of a great many varieties going on in the world today. Digital libraries are being formed of scholarly works, archives of historical figures and events, corporate and governmental records, museum collections and religious collections. Some take the form of scanning and putting documents to the World Wide Web. Still other digital libraries are formed of digitizing paintings, films and music. Work even exists in 3D reconstructive digitization that permits a digital deconstruction, storage, transmission, and reconstruction of solid object."

The British Library is a pioneer in Europe for research relating to digital libraries. Some treasures of the library are already on-line: Beowulf, the first great English masterpiece, dated to the 11th century; Magna Carta, one example from 1215 issued over the Great Seal of King John; the Lindisfarne Gospels, dated 698; the Diamond Sutra, dated 868, which is the world's earliest printed book; the Sforza Hours, dated 1490-1520, which is an outstanding Renaissance treasure; the Codex Arundel, a notebook of Leonardo da Vinci (1452-1519); and the Tyndale New Testament, which was the first printed New Testament in English, from the press of Peter Schoeffer in Worms.

Brian Lang, Chief Executive of the British Library, states on the British Library website:

"We do not envisage an exclusively digital library. We are aware that some people feel that digital materials will predominate in libraries of the future. Others anticipate that the impact will be slight. In the context of the British Library, printed books, manuscripts, maps, music, sound recordings and all the other existing materials in the collection will always retain their central importance, and we are committed to continuing to provide, and to improve, access to these in our reading rooms. The importance of digital materials will, however, increase. We recognize that network infrastructure is at present most strongly developed in the higher education sector, but there are signs that similar facilities will also be available elsewhere, particularly in the industrial and commercial sector, and for public libraries. Our vision of network access encompasses all these."

The Digital Library Programme will begin in February 1999. The two potential partners are: Dawson-IBM-The Stationery Office Consortium, and the Digital Library Consortium (Blackwell, Chadwyck-Healey, MicroPatent, Unisys). The confirmation of the preferred bidder is planned for February 1999, and the contract will be awarded in Spring 1999.

"The development of the Digital Library will enable the British Library to embrace the digital information age. Digital technology will be used to preserve and extend the Library's unparalleled collection. Access to the collection will become boundless with users from all over the world, at any time, having simple, fast access to digitized materials using computer networks, particularly the Internet."

What exactly is digitization? Digitization is the conversion of text, sound or images to digital form, that is, in the form of numerical digits (bits and bytes) for handling by computer. Digitization has made it possible to create, record, manipulate, combine, store, retrieve and transmit information and information-based products in ways which magnetic tape, celluloid and paper did not permit. Digitization thus allows music, cinema and the written word to be recorded and transformed through similar processes and without separate material supports. Previously dissimilar industries, such as publishing and sound recording, now both produce CD-ROMs, rather than simply books and records.
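A minimal sketch of what "numerical digits" means in practice, assuming UTF-8 as the character encoding (the text names no particular one). The Python functions below, whose names are illustrative, turn a short string into bytes, show each byte as eight bits, and convert the bits back into text.

```python
def text_to_bits(text: str) -> str:
    """Return the text as groups of eight binary digits, one group per byte."""
    data = text.encode("utf-8")                      # characters -> bytes
    return " ".join(f"{byte:08b}" for byte in data)  # bytes -> bits


def bits_to_text(bits: str) -> str:
    """Reverse text_to_bits: rebuild the bytes, then decode them."""
    data = bytes(int(group, 2) for group in bits.split())
    return data.decode("utf-8")


encoded = text_to_bits("Book")
print(encoded)                # 01000010 01101111 01101111 01101011
print(bits_to_text(encoded))  # Book
```

Sound and images follow the same principle, with sample values or pixel values taking the place of character codes.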

7.2. Digital Libraries: Some Examples

Created by Michael S. Hart in 1971, Project Gutenberg was the first information provider on the Internet. It is now the oldest digital library on the Web, and the biggest in terms of the number of works (1,500) which have been digitized for it, with around 45 new titles per month. Michael Hart's purpose is to put on the Web as many literary texts as possible for a minimal price.

In his e-mail of August 23, 1998, Michael Hart explained:

"We consider Etext to be a new medium, with no real relationship to paper, other than presenting the same material, but I don't see how paper can possibly compete once people each find their own comfortable way to Etexts, especially in schools. [...] My own personal goal is to put 10,000 Etexts on the Net, and if I can get some major support, I would like to expand that to 1,000,000 and to also expand our potential audience for the average Etext from 1.x% of the world population to over 10%... thus changing our goal from giving away 1,000,000,000,000 Etexts to 1,000 times as many... a trillion and a quadrillion in US terminology."

The Etext # 1000 was Dante's Divine Comedy, in both English and Italian, and Michael Hart dreams about Etext # 2000 for January 1st, 2000. In the Project Gutenberg Newsletter of February 1998, he wrote: "If we do 36 per month for the next 23 month period, we should be able to reach 2,000 Etexts by January 1 of the year 2000. . . [...] I think it would be kind of nice to do our 2,000th Etext during the big celebration..."
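The figures in the two quotations above can be checked with a little arithmetic. The sketch below assumes that the number of Etexts given away is simply the number of titles multiplied by the number of readers, which is one reading of Hart's figures rather than something he states. The variable names are illustrative.

```python
# Goals from the e-mail of August 23, 1998.
titles_first_goal = 10_000
titles_expanded_goal = 1_000_000
given_away_first = 1_000_000_000_000            # a trillion
given_away_expanded = given_away_first * 1_000  # a quadrillion

# Assumption: copies given away = titles x readers.
readers_first = given_away_first // titles_first_goal
readers_expanded = given_away_expanded // titles_expanded_goal
print(readers_first)     # 100000000
print(readers_expanded)  # 1000000000

# Newsletter of February 1998: 36 Etexts per month for 23 months.
added = 36 * 23
implied_start = 2_000 - added
print(added, implied_start)  # 828 1172
```

On this reading the audience per Etext grows tenfold, from 100 million to one billion readers, and the newsletter's plan implies a starting point of at least 1,172 Etexts.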

An average of 50 hours is necessary to get any Etext selected, entered, proofread, edited, copyright-searched, analyzed, etc.

How did Project Gutenberg begin?

Project Gutenberg began in 1971 when Michael Hart was given an operator's account with $100,000,000 of computer time in it by the operators of the Xerox Sigma V mainframe at the Materials Research Lab at the University of Illinois.

Michael decided there was nothing he could do, in the way of "normal computing", that would repay the huge value of the computer time he had been given... so he had to create $100,000,000 worth of value in some other manner. He immediately announced that the greatest value created by computers would not be computing, but would be the storage, retrieval, and searching of what was stored in our libraries. He then proceeded to type in the Declaration of Independence and tried to send it to everyone on the networks. Project Gutenberg was born.

There are three sections in Project Gutenberg, basically described as:

- Light Literature, such as Alice in Wonderland, Through the Looking-Glass, Peter Pan, Aesop's Fables, etc.;

- Heavy Literature, such as the Bible or other religious documents, Shakespeare, Moby Dick, Paradise Lost, etc.; and

- References, such as Roget's Thesaurus, almanacs, and a set of encyclopedias, dictionaries, etc.

"The Light Literature Collection is designed to get persons to the computer in the first place, whether the person may be a pre-schooler or a great-grandparent. We love it when we hear about kids or grandparents taking each other to an Etext of Peter Pan when they come back from watching Hook at the movies, or when they read Alice in Wonderland after seeing it on TV. We have also been told that nearly every Star Trek movie has quoted current Project Gutenberg Etext releases (from Moby Dick in The Wrath of Khan; a Peter Pan quote finishing up the most recent, etc.) not to mention a reference to Through the Looking-Glass in JFK. This was a primary concern when we chose the books for our libraries.

We want people to be able to look up quotations they heard in conversation, movies, music, other books, easily with a library containing all these quotations in an easy to find Etext format.

With Plain Vanilla ASCII you will be easily able to search an entire library, without any program more sophisticated than a plain search program. In fact, these Project Gutenberg Etext files are so plain that you can do a search on them without even using an intermediate search program (i.e. a program between you and the disk). Norton's and other direct disk access programs can search every one of your files without you even naming them, pointing to an Etext directory, or whatever. You can simply search a raw output from the disk. . . I do this on a half gigabyte disk partition, containing all our editions."
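The kind of plain search described here can be sketched in a few lines of Python. This is an illustration, not Project Gutenberg's own tooling: the function name `search_etexts`, the `.txt` extension and the sample file are assumptions made for the example.

```python
import tempfile
from pathlib import Path


def search_etexts(directory, phrase):
    """Yield (file name, line number, line) for every line containing phrase."""
    needle = phrase.lower()
    for path in sorted(Path(directory).glob("*.txt")):
        # Plain ASCII needs no special reader; stray bytes are replaced, not fatal.
        with open(path, encoding="ascii", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line.lower():
                    yield path.name, number, line.rstrip("\n")


# Demonstration on a throwaway directory holding one hypothetical Etext.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "alice.txt").write_text("Curiouser and curiouser!\n", encoding="ascii")
    hits = list(search_etexts(folder, "curiouser"))

print(hits)  # [('alice.txt', 1, 'Curiouser and curiouser!')]
```

Because the files carry no markup or binary structure, a case-insensitive substring test on each line is all the search needs.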

In this same spirit, Project Gutenberg selects Etexts that large portions of the audience will want and use frequently. It has also avoided requests, demands, and pressures to create authoritative editions.

"We do not write for the reader who cares whether a certain phrase in Shakespeare has a ':' or a ';' between its clauses. We put our sights on a goal to release Etexts that are 99.9% accurate in the eyes of the general reader.

Given the preferences our proofreaders have, and the general lack of reading ability the public is currently reported to have, we probably exceed those requirements by a significant amount. However, for the person who wants an 'authoritative edition' we will have to wait some time until this becomes more feasible. We do, however, intend to release many editions of Shakespeare and the other classics for comparative study on a scholarly level, before the end of the year 2001, when we are scheduled to complete our 10,000 book Project Gutenberg Electronic Public Library."

"Anything that can be entered into a computer can be reproduced indefinitely."

The Project Gutenberg Philosophy uses this premise to make information, books and other materials available to the general public in forms a vast majority of the computers, programs and people can easily read, use, quote, and search.

Project Gutenberg Etexts are made available in what has become known as 'Plain Vanilla ASCII', meaning the low set of the American Standard Code for Information Interchange (ASCII). The reason for this is that 99% of the hardware and software a person is likely to run into can read and search these files.
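The "low set" of ASCII is the 7-bit range, character codes 0 to 127. A minimal sketch of how one might check that a file's contents stay within that range follows. The function names are illustrative and are not part of any Project Gutenberg tool.

```python
def is_plain_vanilla_ascii(data: bytes) -> bool:
    """True if every byte falls in the 7-bit ASCII range (0-127)."""
    return all(byte < 128 for byte in data)


def non_ascii_positions(data: bytes):
    """Return the offsets of bytes outside the 7-bit range."""
    return [offset for offset, byte in enumerate(data) if byte >= 128]


plain = b"Plain Vanilla ASCII"
accented = b"caf\xe9"  # "cafe" with an accented e in Latin-1, for contrast

print(is_plain_vanilla_ascii(plain))     # True
print(is_plain_vanilla_ascii(accented))  # False
print(non_ascii_positions(accented))     # [3]
```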

Plain Vanilla ASCII thus addresses the audience with Apples and Ataris all the way to the old homebrew Z80 computers, not to mention the audience of Mac, UNIX and mainframers. Michael Hart explains:

"When we started, the files had to be very small .... So doing the U.S. Declaration of Independence (only 5K) seemed the best place to start. This was followed by the Bill of Rights - then the whole U.S. Constitution, as space was getting large (at least by the standards of 1973). Then came the Bible, as individual books of the Bible were not that large, then Shakespeare (a play at a time), and then into general work in the areas of light and heavy literature and references... By the time Project Gutenberg got famous, the standard was 360K disks, so we did books such as Alice in Wonderland or Peter Pan because they could fit on one disk. Now 1.44 is the standard disk and ZIP is the standard compression; the practical file size is about three million characters, more than long enough for the average book.

However, pictures are still so bulky to store on disk that it will still be a while before we include even the lowres Tenniel illustrations in Alice and Looking-Glass. However we are very interested in doing them, and are only waiting for advances in technology to release a test edition. The market will have to establish some standards for graphics, however, before we can attempt to reach general audiences, at least on the graphics level."
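The disk figures in this quotation can be related to one another with a short calculation. The sketch below reads "1.44" as the 1.44 MB floppy disk (1,474,560 bytes) and "360K" as 368,640 bytes, and it uses Python's zlib module (DEFLATE, the method most ZIP files use) as a stand-in for a ZIP program. A real ZIP archive adds a small amount of header data, so the check is approximate, and the function name is illustrative.

```python
import zlib

FLOPPY_360K = 360 * 1024       # 368,640 bytes
FLOPPY_1_44 = 1440 * 1024      # 1,474,560 bytes
PRACTICAL_CHARS = 3_000_000    # "about three million characters"

# Compression ratio needed to fit three million characters on one disk.
ratio_needed = PRACTICAL_CHARS / FLOPPY_1_44
print(round(ratio_needed, 2))  # 2.03


def fits_on_disk(text: str, capacity: int) -> bool:
    """True if the DEFLATE-compressed ASCII text is no larger than capacity."""
    compressed = zlib.compress(text.encode("ascii"), 9)
    return len(compressed) <= capacity


print(fits_on_disk("Alice was beginning to get very tired. " * 1000, FLOPPY_360K))  # True
```

A ratio of roughly two to one is the level the quoted figure of three million characters implies for a single 1.44 MB disk.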

The On-Line Books Page is a directory of books that can be freely read right on the Internet. It was founded in 1993 by John Mark Ockerbloom, a graduate student in computer science at Carnegie Mellon University, Pittsburgh, Pennsylvania, who remains the editor of the pages. It includes: an index of more than 7,000 on-line books on the Internet, which can be browsed by author, by title or by subject; pointers to significant directories and archives of on-line texts; and special exhibits. From the main search page, users have options to search for four types of media: books, music, art, and video.

"Along with books, The On-Line Books Page is also now listing major archives of serials (such as magazines, published journals, and newspapers), as of June 1998. Serials can be at least as important as books in library research. Serials are often the first places that new research and scholarship appear. They are sources for firsthand accounts of contemporary events and commentary. They are also often the first (and sometimes the only) place that quality literature appears. (For those who might still quibble about serials being listed on a 'books page', back issues of serials are often bound and reissued as hardbound 'books'.)"

Web space and computing resources are provided by the School of Computer Science at Carnegie Mellon University. The On-Line Books Page participates in the Experimental Search System of the Library of Congress. It works with The Universal Library Project, also hosted at Carnegie Mellon University.

In his e-mail to me of September 2, 1998, John Mark Ockerbloom explained how the site began:

"I was the original Webmaster here at CMU CS, and started our local Web in 1993. The local Web included pages pointing to various locally developed resources, and originally The On-Line Books Page was just one of these pages, containing pointers to some books put on-line by some of the people in our department. (Robert Stockton had made Web versions of some of Project Gutenberg's texts.)

After a while, people started asking about books at other sites, and I noticed that a number of sites (not just Gutenberg, but also Wiretap and some other places) had books on-line, and that it would be useful to have some listing of all of them, so that you could go to one place to download or view books from all over the Net. So that's how my index got started.

I eventually gave up the Webmaster job in 1996, but kept The On-Line Books Page, since by then I'd gotten very interested in the great potential the Net had for making literature available to a wide audience. At this point there are so many books going on-line that I have a hard time keeping up (and in fact have a large backlog of books to list). But I hope to keep up my on-line books work in some form or another."

In his e-mail of September 1, 1998, he explained the way he sees the relationship between the print media and the Internet:

"I certainly find both the print media and the Internet very useful, and am very excited about the potential of the Internet as a mass communication medium in the coming years. I'd also like to stay involved, one way or another, in making books available to a wide audience for free via the Net, whether I make this explicitly part of my professional career, or whether I just do it as a spare-time volunteer."

Created at Carnegie Mellon University in Pittsburgh, Pennsylvania, the Universal Library Project is chaired by Raj Reddy. According to the website:

"The mission of the Universal Library Project is to start a worldwide movement to make available on the Internet all the Authored Works of Mankind so that anyone can access these works from any place at any time. This is a major new initiative in digital libraries that will build a technically realistic and economically practical infrastructure for putting and accessing library documents on the World Wide Web. In this regard, access to the Universal Library would be free and have the same stated goal as the Carnegie Library of the last century.

[It] has a vision that goes beyond the scope of most other digital library projects. Simply put, our goal is to spark a lasting movement, in which all of the institutions responsible for the collection of mankind's works will place these works on the Internet to educate and inspire all of the world's people.

Our project will, therefore, serve as an umbrella over all of these efforts, with common indices, guidelines, and systems that allow the quickest, simplest access possible."

In summer 1998, The Universal Library was working on the Book Object project:

"The Universal Library Book Object is intended to let you read a book off the web the way you would like to read it, by giving you book presentation options. You can either download the whole book as a single HTML or ASCII MIME object. Download by the screen-full. Download by the section or chapter. You can have the book in HTML, in ASCII, in Postscript, in RTF, or image GIF. In short, you don't have to read the book in the same form in which it is stored on the remote server. Such conversion of original presentation format is already common in printer drivers, although we also provide a means to permission use.

To complement the users' freedom to read the book in the form in which they desire to read it, the Book Object also has complementary provisions by which a book owner can control or restrain the freedoms allowed. This includes not only presentation constraints, but also permission to print or permission that may require monetary payments. The Universal Library Book Object is still a work in progress, but we have now overcome a few of the more fundamental hurdles in establishing the question of its feasibility."
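The two sides of the Book Object described above, reader options and owner restrictions, can be pictured with a small data structure. The Python sketch below is purely hypothetical: the class, its field names and its methods are invented for illustration and do not reflect the Universal Library's actual design. It performs no format conversion and only shows how delivery by book, chapter or screen-full might sit alongside permission checks.

```python
from dataclasses import dataclass, field

FORMATS = ("html", "ascii", "postscript", "rtf", "gif")  # formats named in the quotation


@dataclass
class BookObject:
    """Hypothetical stand-in for a book plus its owner's restrictions."""
    title: str
    chapters: list
    allowed_formats: set = field(default_factory=lambda: set(FORMATS))
    may_print: bool = True

    def deliver(self, unit="book", index=0, lines_per_screen=24):
        """Return the whole book, one chapter, or one screen-full of lines."""
        if unit == "book":
            return "\n".join(self.chapters)
        if unit == "chapter":
            return self.chapters[index]
        if unit == "screen":
            lines = "\n".join(self.chapters).splitlines()
            start = index * lines_per_screen
            return "\n".join(lines[start:start + lines_per_screen])
        raise ValueError(f"unknown unit: {unit}")

    def check(self, fmt, printing=False):
        """Raise PermissionError if the owner has not allowed this use."""
        if fmt not in self.allowed_formats:
            raise PermissionError(f"format not permitted: {fmt}")
        if printing and not self.may_print:
            raise PermissionError("printing not permitted")


book = BookObject("Sample", ["Chapter one.\nMore text.", "Chapter two."],
                  allowed_formats={"html", "ascii"}, may_print=False)
book.check("ascii")                # allowed, so nothing is raised
print(book.deliver("chapter", 1))  # Chapter two.
```

Permissions that require payment, which the quotation also mentions, are left out of the sketch.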

Founded in 1992 by Paul Southworth, The ETEXT Archives are home to electronic texts of all kinds, from the sacred to the profane, and from the political to the personal. Their duty is to provide electronic versions of texts without judging their content.