Current Cites (Digital Library SunSITE)

Volume 12, no. 3, March 2001

Edited by Roy Tennant

The Library, University of California, Berkeley, 94720
ISSN: 1060-2356 - http://sunsite.berkeley.edu/CurrentCites/2001/cc01.12.3.html

Contributors: Charles W. Bailey, Jr., Jim Ronningen, Roy Tennant

Editor's Note: It is my great pleasure to announce that Charles W. Bailey, Jr., Margaret Gross, Shirl Kennedy, Leo Robert Klein, and Eric Lease Morgan have joined the Current Cites team. All of them combine significant writing experience with awareness of current information technology issues and challenges. Welcome, all! Also, starting with this issue, only those who have cites in a particular issue will be listed as contributors to it. A complete list of team members will continue to be available (http://sunsite.berkeley.edu/CurrentCites/team.html).


Adams, Katherine C. "The Web as a Database: New Extraction Technologies and Content Management" Online 25(2) (March/April 2001) p. 27-32 (http://www.onlineinc.com/onlinemag/OL2001/adams3_01.html). - Introduction of basic concepts is this article's strength. Working from the core concept that information extraction is fact retrieval, as opposed to traditional information retrieval's function as document gatherer, Adams describes software which can discover, sift and organize content (including data generated "on the fly") to a fine granularity. She focuses on two primary methods: IE software which can analyze complexities and ambiguities in language at the sentence level, and wrapper induction software which relies upon shallower pattern matching techniques. The role that XML plays is briefly explained. Occasionally, her description of the linguistic tasks performed by IE software is unclear; e.g. it's easy to misconstrue the sentence "For example, the phrase 'my mother's brother' and 'my brother' express the same relationship, but the way in which the information is expressed differs" to mean that both phrases point to the same person, when her actual intention is to show the syntactic similarity in the two phrases. But on the whole the article is a nice overview of the subject, and citations are provided for readers who want to go further into problems of technical implementation. - JR
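To make the "shallower pattern matching" idea concrete, here is a minimal sketch of a wrapper in Python. It is not taken from the Adams article: the page markup, the field names, and the records are invented for illustration, and the wrapper is written by hand, whereas wrapper induction software would learn a comparable pattern from labeled example pages.

```python
import re

# Hypothetical listing page. The markup and both records are invented
# purely for illustration; they do not come from the article.
PAGE = """
<tr><td class="title">Example Journal A</td><td class="issn">0000-0001</td></tr>
<tr><td class="title">Example Journal B</td><td class="issn">0000-002X</td></tr>
"""

# A hand-written "wrapper": a shallow pattern keyed to the page's regular
# layout. It does no linguistic analysis; it only recognizes the delimiters
# that surround each fact.
ROW = re.compile(
    r'<td class="title">(?P<title>.*?)</td>\s*'
    r'<td class="issn">(?P<issn>\d{4}-\d{3}[\dX])</td>'
)


def extract_facts(page):
    """Return one dict of named fields per row that matches the wrapper."""
    return [match.groupdict() for match in ROW.finditer(page)]


facts = extract_facts(PAGE)
```

The result is a list of structured records (facts) rather than a list of documents, which is the distinction between information extraction and information retrieval drawn above. A wrapper like this breaks as soon as the page layout changes, which is one reason the deeper sentence-level approach exists.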

CAMiLEON Project Papers: Holdsworth, David and Paul Wheatley, "Emulation, Preservation, and Abstraction"; Holdsworth, David "Emulation: C-ing Ahead"; Wheatley, Paul "Migration: A CAMiLEON Discussion Paper". - The CAMiLEON Project (http://www.si.umich.edu/CAMILEON/) is a joint project of the universities of Michigan (USA) and Leeds (UK) to "evaluate emulation as a digital preservation strategy by developing emulation tools, cost-benefit analysis and user evaluation." Although the project is ostensibly focused on emulation as a tool for preserving digital content, the project also studies migration as a strategy for preservation to compare against it. These papers — particularly the overviews of emulation and migration — can be useful to get a sense of these preservation strategies. - RT

Chudnov, Daniel. "An Interview with Paul Everitt and Ken Manheimer of Digital Creations, Publishers of Zope" oss4lib (March 2001) (http://oss4lib.org/readings/interview-everitt-manheimer-2001-03.php). - I don't usually cite things that perhaps only a few hundred of our readers will understand, but I'm making an exception for this. The reason is that there should be more than a few hundred librarians who understand this, or we're in big trouble. What they're talking about is nothing less than organizing information. Librarians talk about cataloging and classification; Everitt and Manheimer talk about metadata, Content Management Framework, and Wiki. Wiki? Yes, Wiki. Like I said, there's maybe a few hundred of you out there who've even heard of it. The point is this. People like Paul Everitt and Ken Manheimer are out there creating new information spaces, spaces that people will want to discover and use like other types of more traditional information spaces (e.g., books, journals). Therefore, the more "intelligence" (cataloging, metadata) that can be built into them from scratch, the better off we'll be. Imagine what the world would be like, for example, if the Web had been created with some easy method of trapping keywords in META tags. Got the picture? A little bit of time spent now can save us an untold amount of trouble later. Responding to Chudnov's question "What more can librarians do to contribute our experience and insight to the broader software community regarding metadata issues?" Everitt said "Uhh, prevent knuckleheads like me from repeating historical mistakes. It's doubtful that a disruptive technology for metadata will come out of the ranks of librarians. However, if librarians keep an open mind and don't fall prey to sacrificing the larger victory by clinging to a narrow agenda, then they can spot a winner and help guide it to adulthood." Sounds like good advice to me. - RT

Chudnov, Daniel, Cynthia Crooker, and Kimberly Parker. "jake: Overview and Status Report." Serials Review 26 (4) (2000): 12-17. - Ever try to find out what licensed databases index a journal or include its full text? It's not a pretty picture, and you could spend a lot of time tediously digging around in vendor Web sites to unearth this information. Or, you could use jake. Back in 1999, the Cushing/Whitney Medical Library at Yale University developed the Jointly Administered Knowledge Environment (jake) to solve this problem. After much cooperative effort by librarians at Yale and elsewhere, you can now go to the jake test server at the Simon Fraser University Library, type a journal name in the Search Title box (or use another search key), hit submit, and, presto, there's a list of all the databases that include the journal, who the database providers are, and what the dates of coverage for citations or full text are. If you are feeling ambitious, you can start your own jake server. The software and data are available under the terms of the GNU General Public License. Read the article to get a good overview of this laudable project, and visit the project Web site for further information. - CB

Crawford, Diane, ed. "The Next 1,000 Years" Communications of the ACM Special Issue 44(3) (March 2001). - A taste for the speculative is a common trait among those of us who are interested in what computers can do. "Speculative fiction," a fancy euphemism for science fiction, is perennially popular with us, and I recommend this special issue as pleasure reading because it brings the same "gee whiz" factor that good science fiction does when the author has a firm grasp of how technologies might bring about possible futures. These are the prognostications of experts in many fields related to computing, from virtual reality to economics, from artificial intelligence to politics (now that's a link that's been pointed out before). Over 60 fearless authors contributed short essays, which have been divided into four groups: Tools and Technologies, Red Flags, Software Solutions, and Education. Of course, the mere existence of this issue in print and digital form raises the question: what types of archives will hold it through the centuries to come, when our descendants may look back on it and laugh? - JR

Luce, Richard E. "E-prints Intersect the Digital Library: Inside the Los Alamos arXiv" Issues in Science & Technology Librarianship 19 (Winter 2001) (http://www.library.ucsb.edu/istl/01-winter/article3.html). - The main strength of this piece is its overview of the arXiv e-print repository, its history and impact on the physics community. Where it is weakest is with identifying impacts on libraries. But if you pay enough attention to what he says in this piece, and read a little bit between the lines, you should be able to figure it out yourself. As the library at the Los Alamos National Laboratory is doing, we (libraries) need to be involved with collecting, managing, providing access to, and preserving this important type of literature. And with a model for success like arXiv and the recently developed ePrints (which is available for free at http://www.eprints.org/), there are precious few reasons why not. - RT

OCLC/RLG Working Group on Preservation Metadata. Preservation Metadata for Digital Objects: A Review of the State of the Art. A White Paper by the OCLC/RLG Working Group on Preservation Metadata, 2001 (http://www.oclc.org/digitalpreservation/presmeta_wp.pdf). - The title describes the goal of this white paper, and the paper meets that goal quite well. Digital preservation is a global issue, and the membership and findings of this group reflect this global nature. Exemplars of metadata for the purpose of preserving digital objects are reviewed, including the Open Archival Information System (OAIS) reference model, and metadata element sets from the Research Libraries Group (RLG), the National Library of Australia, CURL Exemplars in Digital Archives (CEDARS), the Networked European Deposit Library (NEDLIB), and Harvard University. The white paper ends by identifying points of convergence between these metadata element sets, and enumerating issues requiring further discussion. - RT

PRISM: Publishing Requirements for Industry Standard Metadata; Public 'Last Call' for Version 1.0 PRISM Working Group and IDEAlliance (March 5, 2001) (http://www.prismstandard.org/only/lastcall.asp; also available as a zipped PDF at http://www.prismstandard.org/only/lastcalldraft.zip). - In their own words, the PRISM specification "defines an XML metadata vocabulary for syndicating, aggregating, post-processing and multi-purposing magazine, news, catalog, book, and mainstream journal content. PRISM provides a framework for the interchange and preservation of content and metadata, a collection of elements to describe that content, and a set of controlled vocabularies listing the values for those elements." Written by a working group with representatives from organizations like Sotheby's, Time, Condé Nast Publications, Adobe Systems, and Getty Images, among others, this draft specification is aimed at making syndicated content easier to provide and process. The specification relies heavily on RDF and Dublin Core, as well as its own syntax for describing controlled vocabularies. As with any specification or standard, the proof is not in the document but in the usage. Only time will tell if this will become another TCP/IP (ubiquitous) or an ISO OSI (hardly ever used and long forgotten). - RT
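For readers who have not seen RDF and Dublin Core used together, the following Python sketch builds the kind of RDF/XML description that a vocabulary of this sort layers on. It uses only the standard RDF and Dublin Core element namespaces; PRISM's own elements and controlled vocabularies are deliberately left out rather than guessed at, and the resource URL and field values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF and the Dublin Core element set.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Ask ElementTree to serialize with the conventional prefixes.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def describe(resource_url, fields):
    """Serialize one rdf:Description holding Dublin Core elements.

    `fields` maps Dublin Core element names (e.g. "title") to text values.
    """
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": resource_url},
    )
    for name, value in fields.items():
        ET.SubElement(description, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical article record; the URL and values are placeholders.
record = describe(
    "http://example.org/articles/1",
    {"title": "An Example Article", "creator": "A. Author", "format": "text/html"},
)
```

A syndication partner receiving such a record can parse it with any XML or RDF tool and read the same elements back, which is the interchange benefit the specification is after.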

Proceedings of the Building the Virtual Reference Desk in a 24/7 World Conference January 12, 2001, Library of Congress, Washington, DC (http://www.loc.gov/rr/digiref/webcasts/). - It's fitting that a conference looking at ways in which librarians can offer reference services over the Internet has itself pushed the envelope in delivering conference content over the Internet. This site is very successful in using video streaming technology to not only deliver video of the speakers, but also coordinate it with their slides. The result is amazing, as virtually the only thing missing from the experience is the third dimension. And the content itself also doesn't disappoint, although it isn't as easy to skim in this format. Speakers include Jay Jordan, President and CEO of OCLC, Diane Nester Kresh, the manager of the Collaborative Digital Reference Service project of the Library of Congress, Paul Constantine of Cornell University, David Lankes, a long-time practitioner of online (mostly email-based) reference, and Susan McGlamery, perhaps the single most experienced person with web-based reference systems in front-line use. Unlike many conferences, here there isn't a weak speaker in the bunch. Anyone interested in digital reference who wasn't there in person should spend a few hours at this site. And if you were there, you can relive the experience. - RT

Proceedings of the Museums and the Web 2001 Conference Seattle, Washington (March 14-17, 2001), sponsored by Archives & Museum Informatics (http://www.archimuse.com/mw2001/speakers/). - Anyone interested in how museums are using the web will almost surely find one or more presentations of interest here. Topics range from the bleeding edge ("Enhancing Museum Visitor Access Through Robotic Avatars Connected to the Web") to the more mundane. Unfortunately, papers are listed by each individual speaker, thereby leading to the same paper appearing multiple times in the list when there are several co-speakers. Also, since exhibitors are apparently considered "speakers", they appear in the list as well, thereby increasing the number of listings that do not lead to an online presentation. But if you can overlook these faults, you may just find an interesting or useful presentation. - RT

Van de Sompel, Herbert and Oren Beit-Arie. "Open Linking in the Scholarly Information Environment Using the OpenURL Framework" D-Lib Magazine 7(3) (March 2001) (http://www.dlib.org/dlib/march01/vandesompel/03vandesompel.html). - One of the toughest issues for libraries in providing robust and effective access to web-based resources has been the problem of linking. The Web, as useful as it is, is nonetheless quite primitive when it comes to links. Except for quite limited situations, links are static (point to one hard-coded location, whether it is the right one for your audience or not) and singular (unable to point to multiple destinations). This is the problem that the OpenURL Framework addresses, and that products like SFX and CrossRef are trying to solve. This article serves as a good overview of the issues and technology "players", and should take the place of Van de Sompel's previous three-part D-Lib Magazine series on SFX for all except the most technically voracious people or masochists. It's your call whether either of those categories applies to you. - RT
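The contrast between a static link and an OpenURL can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the framework's reference implementation: the resolver addresses and the citation values are hypothetical placeholders, and the keys (genre, issn, volume, issue, spage, date) follow the key/value style of the early OpenURL drafts.

```python
from urllib.parse import urlencode


def build_openurl(resolver_base, citation):
    """Combine a library's resolver address with citation metadata.

    Empty values are dropped so the query carries only known facts.
    """
    known = {key: value for key, value in citation.items() if value}
    return f"{resolver_base}?{urlencode(known)}"


# Hypothetical citation; every value here is an invented placeholder.
CITATION = {
    "genre": "article",
    "issn": "1234-5679",
    "volume": "7",
    "issue": "3",
    "spage": "",          # unknown start page, omitted from the link
    "date": "2001-03",
}

# The same citation, sent to two hypothetical institutional resolvers.
link_a = build_openurl("https://resolver.library-a.example/menu", CITATION)
link_b = build_openurl("https://resolver.library-b.example/menu", CITATION)
```

The link carries a description of the item rather than a hard-coded location, so each institution's resolver can decide where that description should lead for its own users, and can offer more than one destination.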


Current Cites 12(3) (March 2001) ISSN: 1060-2356
Copyright © 2001 by the Regents of the University of California. All rights reserved.

Copying is permitted for noncommercial use by computerized bulletin board/conference systems, individual scholars, and libraries. Libraries are authorized to add the journal to their collections at no cost. This message must appear on copied material. All commercial use requires permission from the editor. All product names are trademarks or registered trademarks of their respective holders. Mention of a product in this publication does not necessarily imply endorsement of the product. To subscribe to the Current Cites distribution list, send the message "sub cites [your name]" to listserv@library.berkeley.edu, replacing "[your name]" with your name. To unsubscribe, send the message "unsub cites" to the same address.

Document maintained at http://sunsite.berkeley.edu/CurrentCites/2001/cc01.12.3.html by Roy Tennant.
Last update March 29, 2001. SunSITE Manager: manager@sunsite.berkeley.edu