Library of Congress Workshop on Etexts - Part 19

* Create acid-free paper copies for transition period backup until we develop reliable procedures for ensuring continuing access to digital files.

SESSION IV-D

Stuart WEIBEL The Role of SGML Markup in the CORE Project (6)

The emergence of high-speed telecommunications networks as a basic feature of the scholarly workplace is driving the demand for electronic document delivery. Three distinct categories of electronic publishing/republishing are necessary to support access demands in this emerging environment:

1.) Conversion of paper or microfilm archives to electronic format
2.) Conversion of electronic files to formats tailored to electronic retrieval and display
3.) Primary electronic publishing (materials for which the electronic version is the primary format)

OCLC has experimental or product development activities in each of these areas. Among the challenges that lie ahead is the integration of these three types of information stores in coherent distributed systems.

The CORE (Chemistry Online Retrieval Experiment) Project is a model for the conversion of large text and graphics collections for which electronic typesetting files are available (category 2). The American Chemical Society has made available computer typography files dating from 1980 for its twenty journals. This collection of some 250 journal-years is being converted to an electronic format that will be accessible through several end-user applications.

The use of Standard Generalized Markup Language (SGML) offers the means to capture the structural richness of the original articles in a way that will support a variety of retrieval, navigation, and display options necessary to navigate effectively in very large text databases.

An SGML document consists of text that is marked up with descriptive tags that specify the function of a given element within the document. As a formal language construct, an SGML document can be parsed against a document-type definition (DTD) that unambiguously defines what elements are allowed and where in the document they can (or must) occur. This formalized map of article structure allows the user interface design to be uncoupled from the underlying database system, an important step toward interoperability. Demonstration of this separability is a part of the CORE project, wherein user interface designs born of very different philosophies will access the same database.
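The idea of checking a document against a formal definition of its structure can be sketched in a few lines. This is only an illustration under stated assumptions: it uses XML-style markup and Python's standard xml.etree.ElementTree rather than a real SGML parser, and the element names (article, title, author, abstract, section, heading, para) and the CONTENT_MODELS table are hypothetical stand-ins for a DTD, not the CORE project's actual definitions.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content models, loosely imitating DTD element declarations.
# Each regex is matched against the space-terminated names of an element's
# direct children, in document order. An empty regex means "no child elements".
CONTENT_MODELS = {
    "article": r"title (author )+(abstract )?(section )+",
    "section": r"heading (para )+",
    "title": r"",
    "author": r"",
    "abstract": r"",
    "heading": r"",
    "para": r"",
}


def validate(element, models=CONTENT_MODELS):
    """Return a list of structural errors found at or below element."""
    if element.tag not in models:
        return [f"undeclared element <{element.tag}>"]
    errors = []
    children = "".join(child.tag + " " for child in element)
    if not re.fullmatch(models[element.tag], children):
        errors.append(
            f"<{element.tag}> has invalid content: [{children.strip()}]"
        )
    for child in element:
        errors.extend(validate(child, models))
    return errors


GOOD = (
    "<article><title>T</title><author>A</author>"
    "<section><heading>H</heading><para>P</para></section></article>"
)
# Title and author are swapped, and the section lacks its heading.
BAD = (
    "<article><author>A</author><title>T</title>"
    "<section><para>P</para></section></article>"
)

good_errors = validate(ET.fromstring(GOOD))
bad_errors = validate(ET.fromstring(BAD))

print(good_errors)
for message in bad_errors:
    print(message)
```

Because the structural rules live in one table rather than in display code, any number of interfaces could rely on the same validated structure, which is the kind of separation the abstract describes.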

NOTES: (6) The CORE project is a collaboration among Cornell University's Mann Library, Bell Communications Research (Bellcore), the American Chemical Society (ACS), the Chemical Abstracts Service (CAS), and OCLC.

Michael LESK The CORE Electronic Chemistry Library

A major on-line file of chemical journal literature complete with graphics is being developed to test the usability of fully electronic access to documents, as a joint project of Cornell University, the American Chemical Society, the Chemical Abstracts Service, OCLC, and Bellcore (with additional support from Sun Microsystems, Springer-Verlag, Digital Equipment Corporation, Sony Corporation of America, and Apple Computers). Our file contains the American Chemical Society's on-line journals, supplemented with the graphics from the paper publication. The indexing of the articles from Chemical Abstracts Documents is available in both image and text format, and several different interfaces can be used. Our goals are (1) to assess the effectiveness and acceptability of electronic access to primary journals as compared with paper, and (2) to identify the most desirable functions of the user interface to an electronic system of journals, including in particular a comparison of page-image display with ASCII display interfaces. Early experiments with chemistry students on a variety of tasks suggest that searching tasks are completed much faster with any electronic system than with paper, but that, for reading, all versions of the articles are roughly equivalent.

Pamela ANDRE and Judith ZIDAR

Text conversion is far more expensive and time-consuming than image capture alone. NAL's experience with optical character recognition (OCR) will be related and compared with the experience of having text rekeyed.

What factors affect OCR accuracy? How accurate does full text have to be in order to be useful? How do different users react to imperfect text?

These are questions that will be explored. For many, a service bureau may be a better solution than performing the work in-house; this will also be discussed.
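One common way to make "how accurate" concrete is character accuracy: one minus the edit distance between the OCR output and a trusted reference transcription, divided by the length of the reference. The abstract does not say which measure NAL used, so the following is a sketch of that general technique only. The function names and the sample strings are illustrative assumptions.

```python
def edit_distance(reference, candidate):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions turning reference into candidate."""
    previous = list(range(len(candidate) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, cand_char in enumerate(candidate, start=1):
            cost = 0 if ref_char == cand_char else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ref_char
                    current[j - 1] + 1,      # insert cand_char
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def character_accuracy(reference, ocr_output):
    """Fraction of the reference recovered correctly. The value can fall
    below zero when the OCR output is much longer than the reference."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return 1.0 - edit_distance(reference, ocr_output) / len(reference)


# Illustrative strings: the OCR output confuses the digit 1 with the letter l.
reference = "Alexandria, VA 22314"
ocr_output = "Alexandria, VA 223l4"
accuracy = character_accuracy(reference, ocr_output)
print(f"{accuracy:.2%}")
```

A single wrong character in a twenty-character line still leaves an unusable postal code, which is why a raw accuracy figure has to be read against what users need the text for.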

SESSION VI

Marybeth PETERS

Copyright law protects creative works. Protection granted by the law to authors and disseminators of works includes the right to do or authorize the following: reproduce the work, prepare derivative works, distribute the work to the public, and publicly perform or display the work. In addition, copyright owners of sound recordings and computer programs have the right to control rental of their works. These rights are not unlimited; there are a number of exceptions and limitations.

An electronic environment places strains on the copyright system.

Copyright owners want to control uses of their work and be paid for any use; the public wants quick and easy access at little or no cost. The marketplace is working in this area. Contracts, guidelines on electronic use, and collective licensing are in use and being refined.

Issues concerning the ability to change works without detection are more difficult to deal with. Questions concerning the integrity of the work and the status of the changed version under the copyright law are to be addressed. These are public policy issues which require informed dialogue.

Appendix III: DIRECTORY OF PARTICIPANTS

PRESENTERS:

Pamela Q.J. Andre, Associate Director, Automation National Agricultural Library 10301 Baltimore Boulevard Beltsville, MD 20705-2351 Phone: (301) 504-6813 Fax: (301) 504-7473 E-mail: INTERNET: [email protected]

Jean Baronas, Senior Manager Department of Standards and Technology Association for Information and Image Management (AIIM) 1100 Wayne Avenue, Suite 1100 Silver Spring, MD 20910 Phone: (301) 587-8202 Fax: (301) 587-2711

Patricia Battin, President The Commission on Preservation and Access 1400 16th Street, N.W., Suite 740 Washington, DC 20036-2217 Phone: (202) 939-3400 Fax: (202) 939-3407 E-mail: [email protected]

Howard Besser Centre Canadien d'Architecture (Canadian Center for Architecture) 1920, rue Baile Montreal, Quebec H3H 2S6 CANADA Phone: (514) 939-7001 Fax: (514) 939-7020 E-mail: [email protected]

Edwin B. Brownrigg, Executive Director Memex Research Institute 422 Bonita Avenue Roseville, CA 95678 Phone: (916) 784-2298 Fax: (916) 786-7559 E-mail: BITNET: [email protected]

Eric M. Calaluca, Vice President Chadwyck-Healey, Inc. 1101 King Street Alexandria, VA 22314 Phone: (800) 752-0515 Fax: (703) 683-7589

James Daly 4015 Deepwood Road Baltimore, MD 21218-1404 Phone: (410) 235-0763

Ricky Erway, Associate Coordinator American Memory Library of Congress Phone: (202) 707-6233 Fax: (202) 707-3764

Carl Fleischhauer, Coordinator American Memory Library of Congress Phone: (202) 707-6233 Fax: (202) 707-3764

Joanne Freeman 2000 Jefferson Park Avenue, No. 7 Charlottesville, VA 22903

Prosser Gifford Director for Scholarly Programs Library of Congress Phone: (202) 707-1517 Fax: (202) 707-9898 E-mail: [email protected]

Jacqueline Hess, Director National Demonstration Laboratory for Interactive Information Technologies Library of Congress Phone: (202) 707-4157 Fax: (202) 707-2829

Susan Hockey, Director Center for Electronic Texts in the Humanities (CETH) Alexander Library Rutgers University 169 College Avenue New Brunswick, NJ 08903 Phone: (908) 932-1384 Fax: (908) 932-1386 E-mail: [email protected]

William L. Hooton, Vice President Business & Technical Development Imaging & Information Systems Group I-NET 6430 Rockledge Drive, Suite 400 Bethesda, MD 20817 Phone: (301) 564-6750 Fax: (513) 564-6867

Anne R. Kenney, Associate Director Department of Preservation and Conservation 701 Olin Library Cornell University Ithaca, NY 14853 Phone: (607) 255-6875 Fax: (607) 255-9346 E-mail: [email protected]

Ronald L. Larsen, Associate Director for Information Technology University of Maryland at College Park Room B0224, McKeldin Library College Park, MD 20742-7011 Phone: (301) 405-9194 Fax: (301) 314-9865 E-mail: [email protected]

Maria L. Lebron, Managing Editor The Online Journal of Current Clinical Trials 1333 H Street, N.W. Washington, DC 20005 Phone: (202) 326-6735 Fax: (202) 842-2868 E-mail: [email protected]

Michael Lesk, Executive Director Computer Science Research Bell Communications Research, Inc. Rm 2A-385 445 South Street Morristown, NJ 07960-1910 Phone: (201) 829-4070 Fax: (201) 829-5981 E-mail: [email protected] (Internet) or bellcore!lesk (uucp)

Clifford A. Lynch Director, Library Automation University of California, Office of the President 300 Lakeside Drive, 8th Floor Oakland, CA 94612-3350 Phone: (510) 987-0522 Fax: (510) 839-3573 E-mail: [email protected]

Avra Michelson National Archives and Records Administration NSZ Rm. 14N 7th & Pennsylvania, N.W. Washington, D.C. 20408 Phone: (202) 501-5544 Fax: (202) 501-5533 E-mail: [email protected]

Elli Mylonas, Managing Editor Perseus Project Department of the Classics Harvard University 319 Boylston Hall Cambridge, MA 02138 Phone: (617) 495-9025, (617) 495-0456 (direct) Fax: (617) 496-8886 E-mail: [email protected] or

David Woodley Packard Packard Humanities Institute 300 Second Street, Suite 201 Los Altos, CA 94002 Phone: (415) 948-0150 (PHI) Fax: (415) 948-5793