Library of Congress Workshop on Etexts - Part 10

BARONAS defined the program's scope. AIIM deals with: 1) the terminology of standards and of the technology it uses; 2) methods of measurement for the systems, as well as quality; 3) methodologies for users to evaluate and measure quality; 4) the features of apparatus used to manage and edit images; and 5) the procedures used to manage images.

BARONAS noted that three types of documents are produced in the AIIM standards program: the first two, accredited by the American National Standards Institute (ANSI), are standards and standard recommended practices. Recommended practices differ from standards in that they contain more tutorial information. The third type, a technical report, is not an ANSI standard. Because AIIM's policies and procedures for developing standards are approved by ANSI, its standards are labeled ANSI/AIIM, followed by the number and title of the standard.

BARONAS then illustrated the domain of AIIM's standardization work. For example, AIIM is the administrator of the U.S. Technical Advisory Group (TAG) to the International Organization for Standardization's (ISO) technical committee, TC 171 Micrographics and Optical Memories for Document and Image Recording, Storage, and Use. AIIM officially works through ANSI in the international standardization process.

BARONAS described AIIM's structure, including its board of directors, its standards board of twelve individuals active in the image-management industry, its strategic planning and legal admissibility task forces, and its National Standards Council, which comprises members of a number of organizations who vote on every AIIM standard before it is published. BARONAS pointed out that AIIM's liaisons deal with numerous other standards developers, including the optical disk community, office and publishing systems, image-codes-and-character-set committees, and the National Information Standards Organization (NISO).

BARONAS illustrated the procedures of TC 171, which covers all aspects of image management. When AIIM's national program has conceptualized a new project, it is usually submitted to the international level, so that the member countries of TC 171 can simultaneously work on the development of the standard or the technical report. BARONAS also illustrated a classic microfilm standard, MS23, which deals with numerous imaging concepts that apply to electronic imaging. Originally developed in the 1970s, revised in the 1980s, and revised again in 1991, this standard is scheduled for another revision. MS23 is an active standard whereby users may propose new density ranges and new methods of evaluating film images in the standard's revision.

BARONAS detailed several electronic image-management standards, for instance, ANSI/AIIM MS44, a quality-control guideline for scanning 8.5" by 11" black-and-white office documents. This standard is used with the IEEE fax image--a continuous tone photographic image with gray scales, text, and several continuous tone pictures--and AIIM test target number 2, a representative document used in office document management.

BARONAS next outlined the four categories of EIM standardization in which AIIM standards are being developed: transfer and retrieval, evaluation, optical disc and document scanning applications, and design and conversion of documents. She detailed several of the main projects of each: 1) in the category of image transfer and retrieval, a bi-level image transfer format, ANSI/AIIM MS53, which is a proposed standard that describes a file header for image transfer between unlike systems when the images are compressed using G3 and G4 compression; 2) the category of image evaluation, which includes the AIIM-proposed TR26 tutorial on image resolution (this technical report will treat the differences and similarities between classical or photographic and electronic imaging); 3) design and conversion, which includes a proposed technical report called "Forms Design Optimization for EIM" (this report considers how general-purpose business forms can be best designed so that scanning is optimized; reprographic characteristics such as type, rules, background, tint, and color will likewise be treated in the technical report); 4) disk and document scanning applications, which includes projects a) on planning platters and disk management, b) on generating an application profile for EIM when images are stored and distributed on CD-ROM, and c) on evaluating SCSI2, and how a common command set can be generated for SCSI2 so that document scanners are more easily integrated. (ANSI/AIIM MS53 will also apply to compressed images.)

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ BATTIN * The implications of standards for preservation * A major obstacle to successful cooperation * A hindrance to access in the digital environment * Standards a double-edged sword for those concerned with the preservation of the human record * Near-term prognosis for reliable archival standards * Preservation concerns for electronic media * Need for reconceptualizing our preservation principles * Standards in the real world and the politics of reproduction * Need to redefine the concept of archival and to begin to think in terms of life cycles * Cooperation and the La Guardia Eight * Concerns generated by discussions on the problems of preserving text and image * General principles to be adopted in a world without standards *

Patricia BATTIN, president, the Commission on Preservation and Access (CPA), addressed the implications of standards for preservation. She listed several areas where the library profession and the analog world of the printed book had made enormous contributions over the past hundred years--for example, in bibliographic formats, binding standards, and, most important, in determining what constitutes longevity or archival quality.

Although standards have lightened the preservation burden through the development of national and international collaborative programs, a pervasive mistrust of other people's standards remains a major obstacle to successful cooperation, BATTIN said.

The zeal to achieve perfection, regardless of the cost, has hindered rather than facilitated access in some instances, and in the digital environment, where no real standards exist, has brought an ironically just reward.

BATTIN argued that standards are a double-edged sword for those concerned with the preservation of the human record, that is, the provision of access to recorded knowledge in a multitude of media as far into the future as possible. Standards are essential to facilitate interconnectivity and access, but, BATTIN said, as LYNCH pointed out yesterday, if set too soon they can hinder creativity, expansion of capability, and the broadening of access. The characteristics of standards for digital imagery differ radically from those for analog imagery. And the nature of digital technology implies continuing volatility and change. To reiterate, precipitous standard-setting can inhibit creativity, but delayed standard-setting results in chaos.

Since in BATTIN's opinion the near-term prognosis for reliable archival standards, as defined by librarians in the analog world, is poor, two alternatives remain: standing pat with the old technology, or reconceptualizing.

Preservation concerns for electronic media fall into two general domains.

One is the continuing assurance of access to knowledge originally generated, stored, disseminated, and used in electronic form. This domain contains several subdivisions, including 1) the closed, proprietary systems discussed the previous day, bundled information such as electronic journals and government agency records, and electronically produced or captured raw data; and 2) the application of digital technologies to the reformatting of materials originally published on a deteriorating analog medium such as acid paper or videotape.

The preservation of electronic media requires a reconceptualizing of our preservation principles during a volatile, standardless transition which may last far longer than any of us envision today. BATTIN urged the necessity of shifting focus from assessing, measuring, and setting standards for the permanence of the medium to the concept of managing continuing access to information stored on a variety of media and requiring a variety of ever-changing hardware and software for access--a fundamental shift for the library profession.

BATTIN offered a primer on how to move forward with reasonable confidence in a world without standards. Her comments fell roughly into two sections: 1) standards in the real world and 2) the politics of reproduction.

In regard to real-world standards, BATTIN argued the need to redefine the concept of archive and to begin to think in terms of life cycles. In the past, the naive assumption that paper would last forever produced a cavalier attitude toward life cycles. The transient nature of the electronic media has compelled people to recognize and accept up front the concept of life cycles in place of permanency.

Digital standards have to be developed and set in a cooperative context to ensure efficient exchange of information. Moreover, during this transition period, greater flexibility concerning how concepts such as backup copies and archival copies in the CXP are defined is necessary, or the opportunity to move forward will be lost.

In terms of cooperation, particularly in the university setting, BATTIN also argued the need to avoid going off in a hundred different directions. The CPA has catalyzed a small group of universities called the La Guardia Eight--because La Guardia Airport is where meetings take place--Harvard, Yale, Cornell, Princeton, Penn State, Tennessee, Stanford, and USC, to develop a digital preservation consortium to look at all these issues and develop de facto standards as we move along, instead of waiting for something that is officially blessed. Continuing to apply analog values and definitions of standards to the digital environment, BATTIN said, will effectively lead to forfeiture of the benefits of digital technology to research and scholarship.

Under the second rubric, the politics of reproduction, BATTIN reiterated an oft-made argument concerning the electronic library, namely, that it is more difficult to transform than to create, and nowhere is that belief expressed more dramatically than in the conversion of brittle books to new media. Preserving information published in electronic media involves making sure the information remains accessible and that digital information is not lost through reproduction. In the analog world of photocopies and microfilm, the issue of fidelity to the original becomes paramount, as do issues of "Whose fidelity?" and "Whose original?"

BATTIN elaborated these arguments with a few examples from a recent study conducted by the CPA on the problems of preserving text and image.

Discussions with scholars, librarians, and curators in a variety of disciplines dependent on text and image generated a variety of concerns, for example: 1) Copy what is, not what the technology is capable of. This is very important for the history of ideas. Scholars wish to know what the author saw and worked from. And make available at the workstation the opportunity to erase all the defects and enhance the presentation. 2) The fidelity of reproduction--what is good enough, what can we afford, and the difference it makes--issues of subjective versus objective resolution. 3) The differences between primary and secondary users. Restricting the definition of primary user to the one in whose discipline the material has been published runs one headlong into the reality that these printed books have had a host of other users from a host of other disciplines, who not only were looking for very different things, but who also shared values very different from those of the primary user. 4) The relationship of the standard of reproduction to new capabilities of scholarship--the browsing standard versus an archival standard. How good must the archival standard be? Can a distinction be drawn between potential users in setting standards for reproduction? Archival storage, use copies, browsing copies--ought an attempt to set standards even be made? 5) Finally, costs. How much are we prepared to pay to capture absolute fidelity? What are the trade-offs between vastly enhanced access, degrees of fidelity, and costs?

These standards, BATTIN concluded, serve to complicate further the reproduction process, and add to the long list of technical standards that are necessary to ensure widespread access. Ways to articulate and analyze the costs that are attached to the different levels of standards must be found.

Given the chaos concerning standards, which promises to linger for the foreseeable future, BATTIN urged adoption of the following general principles:

* Strive to understand the changing information requirements of scholarly disciplines as more and more technology is integrated into the process of research and scholarly communication in order to meet future scholarly needs, not to build for the past. Capture deteriorating information at the highest affordable resolution, even though the dissemination and display technologies will lag.

* Develop cooperative mechanisms to foster agreement on protocols for document structure and other interchange mechanisms necessary for widespread dissemination and use before official standards are set.

* Accept that, in a transition period, de facto standards will have to be developed.

* Capture information in a way that keeps all options open and provides for total convertibility: OCR, scanning of microfilm, producing microfilm from scanned documents, etc.

* Work closely with the generators of information and the builders of networks and databases to ensure that continuing accessibility is a primary concern from the beginning.

* Piggyback on standards under development for the broad market, and avoid library-specific standards; work with the vendors, in order to take advantage of that which is being standardized for the rest of the world.

* Concentrate efforts on managing permanence in the digital world, rather than perfecting the longevity of a particular medium.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ DISCUSSION * Additional comments on TIFF *

During the brief discussion period that followed BATTIN's presentation, BARONAS explained that TIFF was not developed in collaboration with or under the auspices of AIIM. TIFF is a company product, not a standard, is owned by two corporations, and is always changing. BARONAS also observed that ANSI/AIIM MS53, a bi-level image file transfer format that allows unlike systems to exchange images, is compatible with TIFF as well as with DEC's architecture and IBM's MODCA/IOCA.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ HOOTON * Several questions to be considered in discussing text conversion *

HOOTON introduced the final topic, text conversion, by noting that it is becoming an increasingly important part of the imaging business. Many people now realize that it enhances their system to be able to have more and more character data as part of their imaging system. Regarding the issue of OCR versus rekeying, HOOTON posed several questions: How does one get text into computer-readable form? Does one use automated processes? Does one attempt to eliminate the use of operators where possible?

Standards for accuracy, he said, are extremely important: it makes a major difference in cost and time whether one sets as a standard 98.5 percent acceptance or 99.5 percent. He mentioned outsourcing as a possibility for converting text. Finally, what one does with the image to prepare it for the recognition process is also important, he said, because such preparation changes how recognition is viewed, as well as facilitates recognition itself.
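The difference between the two thresholds can be made concrete with a minimal arithmetic sketch in Python. It assumes, purely for illustration, that the accuracy figure is measured per character and that a page holds about 2,000 characters; neither assumption comes from HOOTON's remarks, and the function name is hypothetical.

```python
def expected_errors(chars_per_page: int, accuracy_pct: float) -> float:
    """Expected number of wrong characters on one page at a given accuracy."""
    return chars_per_page * (100 - accuracy_pct) / 100


# Assumption: roughly 2,000 characters on a typical printed page.
CHARS_PER_PAGE = 2000

for accuracy in (98.5, 99.5):
    errors = expected_errors(CHARS_PER_PAGE, accuracy)
    print(f"{accuracy}% accuracy: about {errors:.0f} errors per page")
```

Under these assumptions, one percentage point of accuracy separates roughly thirty corrections per page from roughly ten, which suggests why the choice of threshold weighs so heavily on cost and time.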

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ LESK * Roles of participants in CORE * Data flow * The scanning process * The image interface * Results of experiments involving the use of electronic resources and traditional paper copies * Testing the issue of serendipity * Conclusions *

Michael LESK, executive director, Computer Science Research, Bell Communications Research, Inc. (Bellcore), discussed the Chemical Online Retrieval Experiment (CORE), a cooperative project involving Cornell University, OCLC, Bellcore, and the American Chemical Society (ACS).

LESK spoke on 1) how the scanning was performed, including the unusual feature of page segmentation, and 2) the use made of the text and the image in experiments.

Working with the chemistry journals (because ACS has been saving its typesetting tapes since the mid-1970s and thus has a significant back-run of the most important chemistry journals in the United States), CORE is attempting to create an automated chemical library. Approximately a quarter of the pages by square inch are made up of images of quasi-pictorial material; dealing with the graphic components of the pages is extremely important. LESK described the roles of participants in CORE: 1) ACS provides copyright permission, journals on paper, journals on microfilm, and some of the definitions of the files; 2) at Bellcore, LESK chiefly performs the data preparation, while Dennis Egan performs experiments on the users of chemical abstracts, and supplies the indexing and numerous magnetic tapes; 3) Cornell provides the site of the experiment; 4) OCLC develops retrieval software and other user interfaces. Various manufacturers and publishers have furnished other help.

Concerning data flow, Bellcore receives microfilm and paper from ACS; the microfilm is scanned by outside vendors, while the paper is scanned in-house on an Improvision scanner, twenty pages per minute at 300 dpi, which provides sufficient quality for all practical uses. LESK would prefer to have more gray level, because one of the ACS journals prints on some colored pages, which creates a problem.
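The storage implications of scanning with more gray levels can be sketched with simple arithmetic. The Python below assumes an 8.5-by-11-inch page, uncompressed storage, and 1 bit per pixel for bi-level versus 8 bits per pixel for gray scale; the page size, the bit depths, and the function name are illustrative assumptions, not figures reported for CORE.

```python
def raw_page_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


DPI = 300              # resolution stated in the text
PAGES_PER_MINUTE = 20  # scanner throughput stated in the text

for label, depth in (("bi-level", 1), ("8-bit gray", 8)):
    size = raw_page_bytes(8.5, 11, DPI, depth)  # assumed page size
    print(f"{label}: {size:,} bytes per page, "
          f"{size * PAGES_PER_MINUTE:,} bytes per minute of scanning")
```

Under these assumptions a bi-level page occupies about 1 MB before compression and an 8-bit gray page eight times that, so any move toward more gray levels multiplies the raw data volume accordingly.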