Library of Congress Workshop on Etexts - Part 18

The E-library strategy projected in this plan is a visionary one that can enable major changes and improvements in academic, public, and special library service. This vision is, though, one that can be realized with today's technology. At the same time, it will challenge the political and social structure within which libraries operate: in academic libraries, the traditional emphasis on local collections, extending to accreditation issues; in public libraries, the potential of electronic branch and central libraries fully available to the public; and for special libraries, new opportunities for shared collections and networks.

The environment in which this strategic plan has been developed is, at the moment, dominated by a sense of library limits. The continued expansion and rapid growth of local academic library collections is now clearly at an end. Corporate libraries, and even law libraries, are faced with operating within a difficult economic climate, as well as with very active competition from commercial information sources. For example, public libraries may be seen as a desirable but not critical municipal service in a time when the budgets of safety and health agencies are being cut back.

Further, libraries in general have a very high labor-to-cost ratio in their budgets, and labor costs are still increasing, notwithstanding automation investments. It is difficult for libraries to obtain capital, startup, or seed funding for innovative activities, and those technology-intensive initiatives that offer the potential of decreased labor costs can provoke the opposition of library staff.

However, libraries have achieved some considerable successes in the past two decades by improving both their service and their credibility within their organizations--and these positive changes have been accomplished mostly with judicious use of information technologies. The advances in computing and information technology have been well-chronicled: the continuing precipitous drop in computing costs, the growth of the Internet and private networks, and the explosive increase in publicly available information databases.

For example, OCLC has become one of the largest computer network organizations in the world by creating a cooperative cataloging network of more than 6,000 libraries worldwide. On-line public access catalogs now serve millions of users on more than 50,000 dedicated terminals in the United States alone. The University of California MELVYL on-line catalog system has now expanded into an index database reference service and supports more than six million searches a year. And, libraries have become the largest group of customers of CD-ROM publishing technology; more than 30,000 optical media publications such as those offered by InfoTrac and Silver Platter are subscribed to by U.S. libraries.

This march of technology continues and in the next decade will result in further innovations that are extremely difficult to predict. What is clear is that libraries can now go beyond automation of their order files and catalogs to automation of their collections themselves--and it is possible to circumvent the fiscal limitations that appear to obtain today.

This Electronic Library Strategic Plan recommends a paradigm shift in library service, and demonstrates the steps necessary to provide improved library services with limited capacities and operating investments.

SESSION IV-A

Anne KENNEY

The Cornell/Xerox Joint Study in Digital Preservation resulted in the recording of 1,000 brittle books as 600-dpi digital images and the production, on demand, of high-quality and archivally sound paper replacements. The project, which was supported by the Commission on Preservation and Access, also investigated some of the issues surrounding scanning, storing, retrieving, and providing access to digital images in a network environment.

Anne Kenney will focus on some of the issues surrounding direct scanning as identified in the Cornell Xerox Project. Among those to be discussed are: image versus text capture; indexing and access; image-capture capabilities; a comparison to photocopy and microfilm; production and cost analysis; storage formats, protocols, and standards; and the use of this scanning technology for preservation purposes.

The 600-dpi digital images produced in the Cornell Xerox Project proved highly acceptable for creating paper replacements of deteriorating originals. The 1,000 scanned volumes provided an array of image-capture challenges that are common to nineteenth-century printing techniques and embrittled material, and that defy the use of text-conversion processes.

These challenges include diminished contrast between text and background, fragile and deteriorated pages, uneven printing, elaborate type faces, faint and bold text adjacency, handwritten text and annotations, non-Roman languages, and a proliferation of illustrated material embedded in text.

The latter category included high-frequency and low-frequency halftones, continuous tone photographs, intricate mathematical drawings, maps, etchings, reverse-polarity drawings, and engravings.

The Xerox prototype scanning system provided a number of important features for capturing this diverse material. Technicians used multiple threshold settings, filters, line art and halftone definitions, autosegmentation, windowing, and software-editing programs to optimize image capture. At the same time, this project focused on production.

The goal was to make scanning as affordable and acceptable as photocopying and microfilming for preservation reformatting. A time-and-cost study conducted during the last three months of this project confirmed the economic viability of digital scanning, and these findings will be discussed here.

From the outset, the Cornell Xerox Project was predicated on the use of nonproprietary standards and the use of common protocols when standards did not exist. Digital files were created as TIFF images which were compressed prior to storage using Group 4 CCITT compression. The Xerox software is MS-DOS based and utilizes off-the-shelf programs such as Microsoft Windows and Wang Image Wizard. The digital library is designed to be hardware-independent and to provide interchangeability with other institutions through network connections. Access to the digital files themselves is two-tiered: bibliographic records for the computer files are created in RLIN and Cornell's local system, and access into the actual digital images comprising a book is provided through a document control structure and a networked image file-server, both of which will be described.
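
The second access tier can be pictured as a small lookup structure that sits between a bibliographic record and the page images on the file server. The Python sketch below is illustrative only: the class names, the record identifier, and the file-naming pattern are hypothetical and are not taken from the Cornell/Xerox software.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PageImage:
    """One scanned page; the file name is a hypothetical TIFF path."""
    sequence: int   # physical order of the scan within the volume
    label: str      # page number as printed, e.g. "ii" or "12"
    filename: str   # location on the (assumed) image file server


@dataclass
class DocumentControl:
    """Hypothetical control structure linking a catalog record to page images."""
    record_id: str                          # identifier of the bibliographic record
    pages: List[PageImage] = field(default_factory=list)

    def add_page(self, label: str) -> PageImage:
        """Append the next scan and derive its file name from its sequence number."""
        seq = len(self.pages) + 1
        page = PageImage(seq, label, f"{self.record_id}/{seq:04d}.tif")
        self.pages.append(page)
        return page

    def find(self, label: str) -> Optional[PageImage]:
        """Return the image for a printed page label, or None if absent."""
        for page in self.pages:
            if page.label == label:
                return page
        return None


# Front matter uses roman numerals, so printed page "1" is the third scan.
book = DocumentControl(record_id="example-0001")
for label in ["i", "ii", "1", "2", "3"]:
    book.add_page(label)
```

Keeping the printed page label separate from the scan sequence is the design point: a reader asks for page "1", while the file server only knows the order in which images were captured.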

The presentation will conclude with a discussion of some of the issues surrounding the use of this technology as a preservation tool (storage, refreshing, backup).

Pamela ANDRE and Judith ZIDAR

The National Agricultural Library (NAL) has had extensive experience with raster scanning of printed materials. Since 1987, the Library has participated in the National Agricultural Text Digitizing Project (NATDP), a cooperative effort between NAL and forty-five land grant university libraries. An overview of the project will be presented, giving its history and NAL's strategy for the future.

An in-depth discussion of NATDP will follow, including a description of the scanning process, from the gathering of the printed materials to the archiving of the electronic pages. The type of equipment required for a stand-alone scanning workstation and the importance of file management software will be discussed. Issues concerning the images themselves will be addressed briefly, such as image format; black and white versus color; gray scale versus dithering; and resolution.

Also described will be a study currently in progress by NAL to evaluate the usefulness of converting microfilm to electronic images in order to improve access. With the cooperation of Tuskegee University, NAL has selected three reels of microfilm from a collection of sixty-seven reels containing the papers, letters, and drawings of George Washington Carver.

The three reels were converted into 3,500 electronic images using a specialized microfilm scanner. The selection, filming, and indexing of this material will be discussed.

Donald WATERS

Project Open Book, the Yale University Library's effort to convert 10,000 books from microfilm to digital imagery, is currently in an advanced state of planning and organization. The Yale Library has selected a major vendor to serve as a partner in the project and as systems integrator. In its proposal, the successful vendor helped isolate areas of risk and uncertainty as well as key issues to be addressed during the life of the project. The Yale Library is now poised to decide what material it will convert to digital image form and to seek funding, initially for the first phase and then for the entire project.

The proposal that Yale accepted for the implementation of Project Open Book will provide at the end of three phases a conversion subsystem, browsing stations distributed on the campus network within the Yale Library, a subsystem for storing 10,000 books at 200 and 600 dots per inch, and network access to the image printers. Pricing for the system implementation assumes the existence of Yale's campus Ethernet network and its high-speed image printers, and includes other requisite hardware and software, as well as system integration services. Proposed operating costs include hardware and software maintenance, but do not include estimates for the facilities management of the storage devices and image servers.
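
To give a feel for what storing 10,000 books at two resolutions implies, the following Python sketch works through the arithmetic. Every input other than the book count and the two resolutions is an assumption made for illustration and does not come from the Yale proposal: two-tone images at 1 bit per pixel, a 6 x 9 inch page, 300 pages per book, and a 10:1 compression ratio.

```python
def raw_page_bytes(width_in: float, height_in: float, dpi: int,
                   bits_per_pixel: int = 1) -> int:
    """Uncompressed size of one page image in bytes."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


def library_bytes(books: int, pages_per_book: int, width_in: float,
                  height_in: float, dpi: int, compression_ratio: float) -> float:
    """Estimated compressed size of the whole image library in bytes."""
    page = raw_page_bytes(width_in, height_in, dpi)
    return books * pages_per_book * page / compression_ratio


# Assumed page size, page count, and compression ratio; see the note above.
for dpi in (200, 600):
    total = library_bytes(10_000, 300, 6.0, 9.0, dpi, 10.0)
    print(f"{dpi} dpi: {total / 1e9:.1f} GB")
# prints "200 dpi: 81.0 GB" then "600 dpi: 729.0 GB"
```

Whatever page size and compression ratio are chosen, tripling the linear resolution multiplies the pixel count by nine, which is why the choice between 200 and 600 dots per inch weighs so heavily on storage cost.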

Yale selected its vendor partner in a formal process, partly funded by the Commission on Preservation and Access. Following a request for proposal, the Yale Library selected two vendors as finalists to work with Yale staff to generate a detailed analysis of requirements for Project Open Book. Each vendor used the results of the requirements analysis to generate and submit a formal proposal for the entire project. This competitive process not only enabled the Yale Library to select its primary vendor partner but also revealed much about the state of the imaging industry, about the varying corporate commitments to the markets for imaging technology, and about the varying organizational dynamics through which major companies are responding to and seeking to develop these markets.

Project Open Book is focused specifically on the conversion of images from microfilm to digital form. The technology for scanning microfilm is readily available but is changing rapidly. In its project requirements, the Yale Library emphasized features of the technology that affect the technical quality of digital image production and the costs of creating and storing the image library: What levels of digital resolution can be achieved by scanning microfilm? How does variation in the quality of microfilm, particularly in film produced to preservation standards, affect the quality of the digital images? What technologies can an operator effectively and economically apply when scanning film to separate two-up images and to control for and correct image imperfections? How can quality control best be integrated into digitizing work flow that includes document indexing and storage?

The actual and expected uses of digital images--storage, browsing, printing, and OCR--help determine the standards for measuring their quality. Browsing is especially important, but the facilities available for readers to browse image documents are perhaps the weakest aspect of imaging technology and most in need of development. As it defined its requirements, the Yale Library concentrated on some fundamental aspects of usability for image documents: Does the system have sufficient flexibility to handle the full range of document types, including monographs, multi-part and multivolume sets, and serials, as well as manuscript collections? What conventions are necessary to identify a document uniquely for storage and retrieval? Where is the database of record for storing bibliographic information about the image document?

How are basic internal structures of documents, such as pagination, made accessible to the reader? How are the image documents physically presented on the screen to the reader?

The Yale Library designed Project Open Book on the assumption that microfilm is more than adequate as a medium for preserving the content of deteriorated library materials. As planning in the project has advanced, it is increasingly clear that the challenge of digital image technology and the key to the success of efforts like Project Open Book is to provide a means of both preserving and improving access to those deteriorated materials.

SESSION IV-B

George THOMA

In the use of electronic imaging for document preservation, there are several issues to consider, such as: ensuring adequate image quality, maintaining substantial conversion rates (through-put), providing unique identification for automated access and retrieval, and accommodating bound volumes and fragile material.

To maintain high image quality, image processing functions are required to correct the deficiencies in the scanned image. Some commercially available systems include these functions, while some do not. The scanned raw image must be processed to correct contrast deficiencies-- both poor overall contrast resulting from light print and/or dark background, and variable contrast resulting from stains and bleed-through. Furthermore, the scan density must be adequate to allow legibility of print and sufficient fidelity in the pseudo-halftoned gray material. Borders or page-edge effects must be removed for both compactibility and aesthetics. Page skew must be corrected for aesthetic reasons and to enable accurate character recognition if desired.

Compound images consisting of both two-toned text and gray-scale illustrations must be processed appropriately to retain the quality of each.
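
Two of the corrections named above, poor overall contrast and dark page-edge borders, can be illustrated on a toy image. The Python sketch below is a deliberately simplified stand-in, not the processing used in any system described at the workshop: the gray levels, the fixed cutoff of 128, and the function names are assumptions, and skew correction and halftone handling are left out.

```python
def stretch_contrast(img):
    """Linearly rescale gray levels so the darkest pixel maps to 0 and the lightest to 255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        return [row[:] for row in img]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in img]


def threshold(img, cutoff=128):
    """Convert to two-tone: 1 = ink (dark), 0 = paper (light)."""
    return [[1 if p < cutoff else 0 for p in row] for row in img]


def trim_dark_border(bitmap):
    """Drop edge rows and columns that are solid ink, a crude form of border removal."""
    rows = [r[:] for r in bitmap]
    while rows and all(rows[0]):
        rows.pop(0)
    while rows and all(rows[-1]):
        rows.pop()
    while rows and rows[0] and all(r[0] for r in rows):
        rows = [r[1:] for r in rows]
    while rows and rows[0] and all(r[-1] for r in rows):
        rows = [r[:-1] for r in rows]
    return rows


# Toy scan: edge shadow at 100, paper at 200, one faint printed pixel at 140.
page = [
    [100, 100, 100, 100, 100],
    [100, 200, 200, 200, 100],
    [100, 200, 140, 200, 100],
    [100, 200, 200, 200, 100],
    [100, 100, 100, 100, 100],
]

cleaned = trim_dark_border(threshold(stretch_contrast(page)))
# cleaned == [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
```

Thresholding the raw scan at the same cutoff would lose the faint pixel, since 140 is above 128; stretching the contrast first brings it down to 102 so that it survives as ink.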

SESSION IV-C

Jean BARONAS

Standards publications being developed by scientists, engineers, and business managers in Association for Information and Image Management (AIIM) standards committees can be applied to electronic image management (EIM) processes including: document (image) transfer, retrieval and evaluation; optical disk and document scanning; and document design and conversion. When combined with EIM system planning and operations, standards can assist in generating image databases that are interchangeable among a variety of systems. The applications of different approaches for image-tagging, indexing, compression, and transfer often cause uncertainty concerning EIM system compatibility, calibration, performance, and upward compatibility, until standard implementation parameters are established. The AIIM standards that are being developed for these applications can be used to decrease the uncertainty, successfully integrate imaging processes, and promote "open systems." AIIM is an accredited American National Standards Institute (ANSI) standards developer with more than twenty committees comprised of 300 volunteers representing users, vendors, and manufacturers. The standards publications that are developed in these committees have national acceptance and provide the basis for international harmonization in the development of new International Organization for Standardization (ISO) standards.

This presentation describes the development of AIIM's EIM standards and a new effort at AIIM, a database on standards projects in a wide framework of imaging industries including capture, recording, processing, duplication, distribution, display, evaluation, and preservation. The AIIM Imagery Database will cover imaging standards being developed by many organizations in many different countries. It will contain standards publications' dates, origins, related national and international projects, status, key words, and abstracts. The ANSI Image Technology Standards Board requested that such a database be established, as did the ISO/International Electrotechnical Commission Joint Task Force on Imagery. AIIM will take on the leadership role for the database and coordinate its development with several standards developers.
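
The fields listed for the AIIM Imagery Database suggest a simple record layout. The Python sketch below is one hypothetical way to model such a record and a key-word lookup; the class, the field names, and the two sample entries are invented placeholders and do not describe AIIM's actual database or any real standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StandardRecord:
    """Hypothetical entry mirroring the fields named in the abstract."""
    designation: str
    title: str
    origin: str          # developing organization
    date: str
    status: str          # e.g. "draft" or "published"
    keywords: List[str] = field(default_factory=list)
    related_projects: List[str] = field(default_factory=list)
    abstract: str = ""


def search(records, keyword):
    """Return records whose key words include the term (case-insensitive)."""
    term = keyword.lower()
    return [r for r in records if term in (k.lower() for k in r.keywords)]


# Placeholder entries only; these are not real standards.
records = [
    StandardRecord("EXAMPLE-001", "Placeholder scanning standard",
                   "Example body A", "1992", "draft", ["capture", "scanning"]),
    StandardRecord("EXAMPLE-002", "Placeholder display standard",
                   "Example body B", "1991", "published", ["display"]),
]

matches = search(records, "Scanning")
```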

Patricia BATTIN

Characteristics of standards for digital imagery:

* Nature of digital technology implies continuing volatility.

* Precipitous standard-setting not possible and probably not desirable.

* Standards are a complex issue involving the medium, the hardware, the software, and the technical capacity for reproductive fidelity and clarity.

* The prognosis for reliable archival standards (as defined by librarians) in the foreseeable future is poor.

Significant potential and attractiveness of digital technology as a preservation medium and access mechanism.

Productive use of digital imagery for preservation requires a reconceptualizing of preservation principles in a volatile, standardless world.

Concept of managing continuing access in the digital environment rather than focusing on the permanence of the medium and long-term archival standards developed for the analog world.

Transition period: How long and what to do?

* Redefine "archival."

* Remove the burden of "archival copy" from paper artifacts.

* Use digital technology for storage, develop management strategies for refreshing medium, hardware and software.