A Short History of EBooks - Part 7

= Online catalogs

# OPACs

The internet boosted library catalogs through cyberspace. OPACs (Online Public Access Catalogs) were more attractive and user-friendly than the older print and computer catalogs. Some catalogs began to give instant online access to the full text of books and journals, something that would become a major trend ten years later.

The first step was UNIMARC, a common bibliographic format for library catalogs. The IFLA (International Federation of Library Associations) published the first edition of "UNIMARC: Universal MARC Format" in 1977, followed by a second edition in 1980 and a "UNIMARC Handbook" in 1983.

UNIMARC (Universal Machine Readable Cataloging) was set up as a solution to the 20 existing national MARC (Machine Readable Cataloging) formats. Twenty formats meant a lack of compatibility and extensive editing when bibliographic records were exchanged.

With UNIMARC, catalogers would be able to process records created in any MARC format. Records in one MARC format would first be converted into UNIMARC, and then converted into another MARC format. UNIMARC would also be promoted as a format in its own right.
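
To get a rough sense of why a single pivot format matters, here is a small illustrative calculation (not part of the UNIMARC specification itself): converting directly between every pair of the 20 national MARC formats would require far more converters than converting everything through UNIMARC.

```python
# Illustrative arithmetic only: with n bibliographic formats, direct pairwise
# conversion needs one converter per ordered pair, i.e. n * (n - 1) of them,
# while a pivot format such as UNIMARC needs only 2 * n (one converter into
# and one out of the pivot for each national format).

n = 20                      # the roughly 20 national MARC formats mentioned above
pairwise = n * (n - 1)      # direct converters between every pair of formats
via_pivot = 2 * n           # converters needed when UNIMARC is the intermediate

print(f"direct pairwise converters: {pairwise}")   # 380
print(f"converters via UNIMARC:     {via_pivot}")  # 40
```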

In May 1997, the British Library launched OPAC 97 to provide free online access to the catalogs of its main collections in London and Boston Spa. It also launched Blaise, an online bibliographic information service (with a small fee), and Inside, a catalog of articles from 20,000 journals and 16,000 conferences. As explained on the website at the time: "The Library's services are based on its outstanding collections, developed over 250 years, of over one hundred and fifty million items representing every age of written civilisation, every written language and every aspect of human thought. At present individual collections have their own separate catalogues, often built up around specific subject areas. Many of the Library's plans for its collections, and for meeting its users' needs, require the development of a single catalogue database. This is being pursued in the Library's Corporate Bibliographic Programme which seeks to address this issue." The "single catalogue database" was fully operational a few years later.

Another leading effort was that of the Library of Congress, with its Experimental Search System (ESS). The ESS was "one of the Library of Congress' first efforts to make selected cataloging and digital library resources available over the World Wide Web by means of a single, point-and-click interface. The interface consists of several search query pages (Basic, Advanced, Number, and a Browse screen) and several search results pages (an item list of brief displays and an item full display), together with brief help files which link directly from significant words on those pages. By exploiting the powerful synergies of hyperlinking and a relevancy-ranked search engine (InQuery from Sovereign Hill Software), we hope the ESS will provide a new and more intuitive way of searching the traditional OPAC (Online Public Access Catalog)." (excerpt from the website in 1998)

Another interesting - and totally different - initiative was the creation of the Internet Public Library (IPL) by the School of Information and Library Studies at the University of Michigan. The IPL went live in March 1995 as the first U.S. digital public library to serve the internet community, and to catalog websites and webpages. The librarians' task was to choose the best documents available on the web, and process them as library documents so that they could be easily accessed from the IPL website, which acted as a portal. The IPL sections were: Reference, Exhibits, Magazines and Serials, Newspapers, Online Texts, and Web Searching. There were also Teen and Youth sections. All items were carefully selected, catalogued and described by the IPL staff. As an experimental library, the IPL also listed the best internet projects run by librarians, in the section Especially for Librarians. Since then, students from the IPL Consortium, a consortium of colleges and universities with programs in information science, have worked on maintaining and developing the IPL as a public library for the web.

# Union catalogs

In 1999, the two main union catalogs were WorldCat, run by OCLC (Online Computer Library Center), and RLIN (Research Library Information Network), run by the Research Libraries Group (RLG).

What exactly is a union catalog? The idea behind a union catalog is to save time by avoiding the cataloging of the same document by many catalogers worldwide. When catalogers of a member library catalog a new document, they first search the union catalog. If the record is available, they import it into their own library catalog and add the local data. If the record is not available, they create it in their own library catalog and export it into the union catalog. The new record is immediately available to all catalogers of member libraries.
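
Here is a minimal sketch of that shared-cataloging workflow, assuming hypothetical catalogs represented as plain dictionaries keyed by a document identifier; real systems such as WorldCat or RLIN exposed this logic through their own cataloging interfaces.

```python
# Hypothetical data structures for illustration: the union catalog and a local
# catalog are both dictionaries mapping a document identifier to a record.

def catalog_document(doc_id, union_catalog, local_catalog, local_data, describe):
    """Catalog a document the way a member library of a union catalog would."""
    record = union_catalog.get(doc_id)
    if record is not None:
        # The record already exists: import it and add the local data
        # (call number, holdings, location, and so on).
        local_catalog[doc_id] = {**record, **local_data}
    else:
        # The record does not exist yet: create it locally, then export it
        # so it is immediately available to catalogers of all member libraries.
        record = describe(doc_id)
        local_catalog[doc_id] = {**record, **local_data}
        union_catalog[doc_id] = dict(record)

# Toy usage: the second library reuses the record created by the first one.
union, lib_a, lib_b = {}, {}, {}
describe = lambda doc_id: {"id": doc_id, "title": "Example title"}
catalog_document("isbn-123", union, lib_a, {"location": "Library A"}, describe)
catalog_document("isbn-123", union, lib_b, {"location": "Library B"}, describe)
print(len(union))   # 1 - the document was described only once
```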

Depending on their status, experience and quality of cataloging, member libraries can either import records only, or import and export records.

OCLC (Online Computer Library Center) was created in 1971 as a non-profit organization dedicated to furthering access to the world's information while reducing information costs. The OCLC Online Union Catalog - renamed WorldCat much later - began as the union catalog of the university libraries in the State of Ohio. Over the years, OCLC became a national and then worldwide library cooperative, and WorldCat the largest library catalog in the world. In early 1998, WorldCat had 38 million records in 400 languages - with transliteration for non-Roman languages - and an annual increase of 2 million records. In 1998, 27,000 libraries in 65 countries were using OCLC services (paid subscription) to manage their collections and provide online reference services.

WorldCat accepted only one bibliographic record per document, unlike RLIN (Research Library Information Network), launched by the Research Libraries Group (RLG) in 1980. RLIN accepted several records per document, and had 88 million records in early 1998.

Members of RLG were mainly research and specialized libraries.

RLIN was later renamed the RLG Union Catalog. Its free web version, RedLightGreen, was launched in fall 2003 as a beta version, and in spring 2004 as a full version. This was a major move, not only for member libraries, but for all internet users, who could now access it for free.

In 2005, WorldCat had 61 million bibliographic records in 400 languages, from 9,000 member libraries in 112 countries. In 2006, 73 million bibliographic records linked to one billion documents available in these libraries.

In August 2006, WorldCat began to migrate to the web through the beta version of its new website worldcat.org. Member libraries now provided free access to their catalogs and electronic resources: books, audiobooks, abstracts and full-text articles, photos, music CDs and videos. RedLightGreen ended its service in November 2006, and RLG joined OCLC.

2000: INFORMATION IS AVAILABLE IN MANY LANGUAGES

= [Overview]

2000 was a turning point for a multilingual internet, both for its content and its users. In summer 2000, non-English-speaking users reached 50%. This percentage went on to increase steadily: 52.5% in summer 2001, 57% in December 2001, 59.8% in April 2002, 64.4% in September 2003 - with 34.9% non-English-speaking Europeans and 29.4% Asians - and 64.2% in March 2004 - with 37.9% non-English-speaking Europeans and 33% Asians (source: Global Reach).

The internet is also a good tool for minority languages, as stated by Caoimhín Ó Donnaile, who teaches computing at the Institute Sabhal Mòr Ostaig, located on the Island of Skye, in Scotland. Caoimhín also maintains the college website, which is the main site worldwide with information on Scottish Gaelic, with a bilingual (English, Gaelic) list of European minority languages. He wrote in May 2001: "Students do everything by computer, use Gaelic spell-checking, a Gaelic online terminology database. There are more hits on our website. There is more use of sound. Gaelic radio (both Scottish and Irish) is now available continuously worldwide via the internet. A major project has been the translation of the Opera web-browser into Gaelic - the first software of this size available in Gaelic."

= "Language nations"

At first, the internet was nearly 100% English. Born in the United States, it spread in North America before taking over the whole planet. Then people from all continents began connecting to the internet and posting webpages in their own languages. In the 1990s, the percentage of English decreased from nearly 100% to 85% (reached in 1997 or 1998, depending on the sources).

In 1997, Babel - a joint initiative from Alis Technologies (language translation services) and the Internet Society - ran the first major study of the distribution of languages on the web. The results were published in June 1997 on a webpage named Web Languages Hit Parade. The main languages were English with 82.3%, German with 4.0%, Japanese with 1.6%, French with 1.5%, Spanish with 1.1%, Swedish with 1.1%, and Italian with 1.0%.

In July 1998, according to Global Reach, a company specializing in international online marketing, the fastest growing groups of internet users were non-English-speaking: Spanish-speaking (22.4%), Japanese-speaking (12.3%), German-speaking (14%), and French-speaking (10%) - with 56 million non-English-speaking users in all. More than 80% of all webpages were still in English, whereas only 6% of the world population spoke English as a native language (16% spoke Spanish).

Randy Hobler was a consultant in internet marketing for Globalink, a company specializing in language translation software and services. He wrote in September 1998: "85% of the content of the web in 1998 is in English and going down. This trend is driven not only by more websites and users in non-English-speaking countries, but by increasing localization of company and organization sites, and increasing use of machine translation to/from various languages to translate websites."

Randy also brought up the concept of "language nations": "Because the internet has no national boundaries, the organization of users is bounded by other criteria driven by the medium itself. In terms of multilingualism, you have virtual communities, for example, of what I call 'Language Nations'... all those people on the internet wherever they may be, for whom a given language is their native language. Thus, the Spanish Language nation includes not only Spanish and Latin American users, but millions of Hispanic users in the U.S., as well as odd places like Spanish-speaking Morocco."

Robert Ware created OneLook Dictionaries in April 1996, as a "fast finder" of words in hundreds of online dictionaries. He wrote about an experience he had in 1994 that showed the internet could promote both a common language and multilingualism: "In 1994, I was working for a college and trying to install a software package on a particular type of computer. I located a person who was working on the same problem and we began exchanging email. Suddenly, it hit me... the software was written only 30 miles away but I was getting help from a person halfway around the world. Distance and geography no longer mattered! OK, this is great! But what is it leading to? I am only able to communicate in English but, fortunately, the other person could use English as well as German which was his mother tongue. The internet has removed one barrier (distance) but with that comes the barrier of language. It seems that the internet is moving people in two quite different directions at the same time. The internet (initially based on English) is connecting people all around the world. This is further promoting a common language for people to use for communication. But it is also creating contact between people of different languages and creates a greater interest in multilingualism. A common language is great but in no way replaces this need. So the internet promotes both a common language *and* multilingualism. The good news is that it helps provide solutions. The increased interest and need is creating incentives for people around the world to create improved language courses and other assistance, and the internet is providing fast and inexpensive opportunities to make them available."

The internet could also be a tool to develop a "cultural identity". During the Symposium on Multimedia Convergence organized by the International Labor Office (ILO) in January 1997, Shinji Matsumoto, general secretary of the Musicians' Union of Japan (MUJ), explained: "Japan is quite receptive to foreign culture and foreign technology. (...) Foreign culture is pouring into Japan and, in fact, the domestic market is being dominated by foreign products. Despite this, when it comes to preserving and further developing Japanese culture, there has been insufficient support from the government. (...) With the development of information networks, the earth is getting smaller and it is wonderful to be able to make cultural exchanges across vast distances and to deepen mutual understanding among people. We have to remember to respect national cultures and social systems."

As the internet quickly spread worldwide, more and more people in the U.S. realized that, although English might remain the main international language for exchanges of all kinds, not everyone in the world reads English, and even those who do prefer to read information in their own language. To reach as large an audience as possible, companies and organizations needed to offer bilingual, trilingual, even multilingual websites, while adapting their content to a given audience. Hence the need for both internationalization and localization, which became a major trend in the following years, not only in the U.S. but in many other countries, where companies set up bilingual websites - in their own language and in English - to reach a wider audience and get more clients.

Translation software available on the web was far from perfect, but it was helpful because it was instantaneous and free, unlike a high-quality professional translation. In December 1997, AltaVista, a leading search engine, was the first to launch such software with Babel Fish - also called AltaVista Translation - which could translate webpages (up to three pages at a time) from English into French, German, Italian, Portuguese or Spanish, and vice versa. The software was developed by Systran, a company specializing in machine translation. This initiative was followed by others, with free and/or paid versions on the web, developed by Alis Technologies, Globalink, Lernout & Hauspie, IBM (with the WebSphere Translation Server), Softissimo, Champollion, TMX or Trados.

Brian King, director of the WorldWide Language Institute (WWLI), brought up the concept of "linguistic democracy" in September 1998: "Whereas 'mother-tongue education' was deemed a human right for every child in the world by a UNESCO report in the early '50s, 'mother-tongue surfing' may very well be the Information Age equivalent. If the internet is to truly become the Global Network that it is promoted as being, then all users, regardless of language background, should have access to it. To keep the internet as the preserve of those who, by historical accident, practical necessity, or political privilege, happen to know English, is unfair to those who don't."

Jean-Pierre Cloutier was the editor of "Chroniques de Cybérie", a weekly French-language online report of internet news. He wrote in August 1999: "We passed a milestone this summer. Now more than half the users of the internet live outside the United States. Next year, more than half of all users will be non-English-speaking, compared with only 5% five years ago. Isn't that great?"

The internet did pass this second milestone in summer 2000, with non-English-speaking users reaching 50%. As shown in the statistics of Global Reach, they were 52.5% in summer 2001, 57% in December 2001, 59.8% in April 2002, 64.4% in September 2003 (with 34.9% non-English-speaking Europeans and 29.4% Asians), and 64.2% in March 2004 (with 37.9% non-English-speaking Europeans and 33% Asians).

= From ASCII to Unicode

Used since the beginning of computing, ASCII (American Standard Code for Information Interchange) is a 7-bit coded character set for information interchange in English. It was published in 1968 by ANSI (American National Standards Institute), with updates in 1977 and 1986. The 7-bit plain ASCII, also called Plain Vanilla ASCII, is a set of 128 characters with 95 printable unaccented characters (A-Z, a-z, numbers, punctuation and basic symbols), i.e. the ones that are available on the English/American keyboard.

With the use of other European languages on computers, extensions of ASCII (also called ISO 8859 or ISO Latin) were created as sets of 256 characters to add the accented characters found in French, Spanish and German, for example ISO 8859-1 (ISO Latin-1) for French.
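
As a small illustration of these character sets (a sketch using Python's standard codecs, not tied to any particular library or system mentioned here): an accented French word fits in 8-bit ISO 8859-1 but cannot be represented in 7-bit plain ASCII.

```python
# 7-bit plain ASCII covers only the 128 unaccented characters described above;
# the 8-bit ISO 8859-1 (Latin-1) extension adds accented letters such as "é".

text = "résumé"

print(text.encode("latin-1"))      # b'r\xe9sum\xe9' - representable in ISO 8859-1
try:
    text.encode("ascii")
except UnicodeEncodeError as err:
    print("not representable in 7-bit plain ASCII:", err)

# All plain ASCII characters have code points below 128.
print(all(ord(c) < 128 for c in "Plain Vanilla ASCII"))   # True
```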

Yoshi Mikami, who lives in Fujisawa, Japan, launched the bilingual (Japanese, English) website "The Languages of the World by Computers and the Internet", also known as the Logos Home Page or Kotoba Home Page, in late 1995. Yoshi was the co-author (with Kenji Sekine and Nobutoshi Kohara) of "The Multilingual Web Guide" (Japanese edition), a print book published by O'Reilly Japan in August 1997, and translated in 1998 into English, French and German.

Yoshi Mikami explained in December 1998: "My native tongue is Japanese. Because I had my graduate education in the U.S. and worked in the computer business, I became bilingual in Japanese and American English. I was always interested in languages and different cultures, so I learned some Russian, French and Chinese along the way. In late 1995, I created on the web The Languages of the World by Computers and the Internet and tried to summarize there the brief history, linguistic and phonetic features, writing system and computer processing aspects for each of the six major languages of the world, in English and Japanese. As I gained more experience, I invited my two associates to help me write a book on viewing, understanding and creating multilingual web pages, which was published in August 1997 as 'The Multilingual Web Guide', in a Japanese edition, the world's first book on such a subject."

Yoshi added in the same email interview: "Thousands of years ago, in Egypt, China and elsewhere, people were more concerned about communicating their laws and thoughts not in just one language, but in several. In our modern world, most nation states have each adopted one language for their own use. I predict greater use of different languages and multilingual pages on the internet, not a simple gravitation to American English, and also more creative use of multilingual computer translation. 99% of the websites created in Japan are written in Japanese."

Brian King, director of the WorldWide Language Institute (WWLI), explained in September 1998: "A pull from non-English-speaking computer users and a push from technology companies competing for global markets has made localization a fast growing area in software and hardware development. This development has not been as fast as it could have been. The first step was for ASCII to become Extended ASCII. This meant that computers could begin to recognize the accents and symbols used in variants of the English alphabet - mostly used by European languages. But only one language could be displayed on a page at a time. (...) The most recent development is Unicode. Although still evolving and only just being incorporated into the latest software, this new coding system translates each character into 16 bits. Whereas 8-bit Extended ASCII could only handle a maximum of 256 characters, Unicode can handle over 65,000 unique characters and therefore potentially accommodate all of the world's writing systems on the computer. So now the tools are more or less in place. They are still not perfect, but at last we can at least surf the web in Chinese, Japanese, Korean, and numerous other languages that don't use the Western alphabet. As the internet spreads to parts of the world where English is rarely used - such as China, for example - it is natural that Chinese, and not English, will be the preferred choice for interacting with it. For the majority of the users in China, their mother tongue will be the only choice."

Ten years later, in 2008, 50% of all the documents available on the internet were encoded in Unicode, with the other 50% encoded in ASCII. ASCII is still very useful, especially the original 7-bit plain ASCII, because it can be read, written, copied and printed by any text editor or word processor, and it is the only format compatible with 99% of all hardware and software.

First published in January 1991, Unicode "provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language" (excerpt from the website). This double-byte platform-independent encoding provides a basis for the processing, storage and interchange of text data in any language, and any modern software and information technology protocols. Unicode is maintained by the Unicode Consortium, and is a component of the W3C (World Wide Web Consortium) specifications.
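
As a brief illustration of what "a unique number for every character" means in practice (a sketch using Python's built-in Unicode support, independent of the Unicode Consortium's own tools): each character has a code point, which encodings such as UTF-8 and UTF-16 then turn into bytes.

```python
# Every character gets one code point, regardless of platform or language;
# different encodings then represent that code point as one or more bytes.

for char in ("A", "é", "日"):
    print(char,
          f"U+{ord(char):04X}",        # the Unicode code point
          char.encode("utf-8"),        # 1 to 3 bytes for these characters
          char.encode("utf-16-be"))    # 2 bytes for these characters
```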

= Language dictionaries