Repository Usability - Herbert's Construction

 

February 11, 2009

Herbert Van de Sompel

herbertv at lanl dot gov

 

This write-up is an impromptu response to Andy Powell's Repository Usability blog entry. It also touches on some issues that Andy raised in another entry, Freedom, Google-juice and institutional mandates. The purpose of the write-up is to try to alleviate some of Andy's pain regarding the status quo of scholarly repositories: while the current situation may indeed not be perfect, a possible solution may not be too hard to establish. The solution I describe uses the OAI-ORE specifications and quite a few other techniques that have been introduced by several communities over the past years. Hey, a technological mash-up, one could say. Use and reuse what is there before inventing new stuff is the motto. I am afraid that the solution may cause Andy some phantom pain, since it also leverages OAI-PMH. While I agree that there are a few things we didn't get quite right with OAI-PMH, I don't think it's the cause of all evil in the (repository) world, and I actually even think we can leverage the existing deployed PMH repositories for a good cause.

 

Anyhow, I think the DSpace example of Andy's blog entry is a nice one to have a close look at, indeed. The four URIs that are a source of frustration for Andy are an indication to me that OAI-ORE Aggregations can come to the rescue. As a matter of fact, the ORE Primer uses an arXiv example that is quite similar to the DSpace one: lots of URIs flying around that somehow belong together.

Here is an outline of a possible approach:

 

1. We start by modeling this multi-resource DSpace Item as an ORE Aggregation. In the case of this DSpace example, we can actually give that ORE Aggregation the existing URI http://hdl.handle.net/1842/1476. This is the URI-A of ORE.

 

2. We then introduce a new resource from which we are going to make a machine-readable description of the ORE Aggregation available. In ORE lingo, that resource is named a Resource Map, and its URI is known as URI-R. There is a choice of formats for the description of an Aggregation, but let's just say that for this example we use RDF/XML as described in the ORE Guidelines.

 

3. We leave the URI of the jump-off page http://www.era.lib.ed.ac.uk/handle/1842/1476 the way it is. We will actually have use for it. This URI is sometimes referred to as URI-S (S for splash page) in ORE lingo.

 

4. Now, we glue the URIs that we have encountered thus far together for the benefit of Web navigation. We do so by following the Cool URIs for the Semantic Web guidelines: we use an HTTP 303 redirect from URI-A to the existing URI-S of the jump-off page for human consumption, and from URI-A to the new URI-R of the ORE Resource Map for machine consumption. This HTTP redirect approach is also described in a section of the ORE HTTP Guidelines; there are alternative approaches described in the ORE HTTP Guidelines too.
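To make the redirect in step 4 a bit more concrete, here is a minimal Python sketch. It assumes that the choice between URI-S and URI-R is made by content negotiation on the HTTP Accept header, and it deliberately ignores q-values. The URI-R value and the function name redirect_for are hypothetical; only URI-A and URI-S come from the DSpace example.

```python
# URI-A and URI-S are taken from the DSpace example in the text.
URI_A = "http://hdl.handle.net/1842/1476"
URI_S = "http://www.era.lib.ed.ac.uk/handle/1842/1476"
# Hypothetical URI-R: the write-up does not say where the Resource Map lives.
URI_R = "http://www.era.lib.ed.ac.uk/rem/1842/1476"


def redirect_for(accept_header):
    """Return (status, headers) for a GET on URI-A.

    Simplification (assumption): a client is treated as a machine if it
    lists application/rdf+xml anywhere in Accept; q-values are ignored.
    """
    wants_rdf = False
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type == "application/rdf+xml":
            wants_rdf = True
    target = URI_R if wants_rdf else URI_S
    # 303 See Other; Vary tells caches that the answer depends on Accept.
    return 303, {"Location": target, "Vary": "Accept"}


if __name__ == "__main__":
    print(redirect_for("text/html"))
    print(redirect_for("application/rdf+xml"))
```

A browser sending text/html ends up at the jump-off page, while a client that asks for application/rdf+xml ends up at the Resource Map. A real deployment would probably want proper q-value handling.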

 

5. To further enhance chances of discovery, we point from the jump-off page to the Resource Map. We can do so in two ways that are not mutually exclusive. First, as described in a section of the ORE Discovery Guidelines, by adding a LINK element to the HTML of the jump-off page, i.e. <link rel="resourcemap" type="application/rdf+xml" href="URI-R" >. Second, as described in another section of the ORE Discovery Guidelines, by providing an HTTP Link header, i.e. Link: <URI-R>; type="application/rdf+xml"; rel="resourcemap".
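The two discovery hooks of step 5 can be generated from the same URI-R. The following is a small sketch, not anything DSpace ships; the function names and the example URI-R are assumptions made up for illustration.

```python
from html import escape


def html_link_element(uri_r):
    """Build the HTML LINK element that points to the Resource Map."""
    # escape() protects characters such as & that may occur in a URI-R.
    href = escape(uri_r, quote=True)
    return '<link rel="resourcemap" type="application/rdf+xml" href="%s">' % href


def http_link_header(uri_r):
    """Build the equivalent HTTP Link header line."""
    return 'Link: <%s>; type="application/rdf+xml"; rel="resourcemap"' % uri_r


if __name__ == "__main__":
    # Hypothetical URI-R, used only to show the output.
    uri_r = "http://www.era.lib.ed.ac.uk/rem/1842/1476"
    print(html_link_element(uri_r))
    print(http_link_header(uri_r))
```

The HTML variant needs escaping because a URI-R based on an OAI-PMH request contains ampersands; the header variant carries the URI as-is between angle brackets.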

 

6. Having been so busy with all those URIs and discovery approaches to please both crawlers and browsers, we almost forgot to actually aggregate resources into that ORE Aggregation. So, for the DSpace example, the following resources can be considered to be part of the Aggregation:

 

(a) The jump-off page with URI-S = http://www.era.lib.ed.ac.uk/handle/1842/1476

(b) The PDF file with URI http://www.era.lib.ed.ac.uk/bitstream/1842/1476/1/Ariadne/fallacy_author_tidy.pdf

(c) One (or more) metadata record(s) describing this DSpace Item. It turns out we have such metadata descriptions available from the repository's OAI-PMH interface. And, instead of throwing that OAI-PMH interface away, we could as well consider leveraging it. Anyhow, in the case of Edinburgh's PMH interface, we have an oai_dc resource available at http://www.era.lib.ed.ac.uk/dspace-oai/request?verb=GetRecord&identifier=oai:www.era.lib.ed.ac.uk:1842/1476&metadataPrefix=oai_dc. We'll make this resource part of the ORE Aggregation too, and let's give that long OAI-PMH URI the shorthand URI-M for now.

 

7. The Resource Map at URI-R will obviously describe which resources are part of the Aggregation (see 6, above), e.g. URI-A ore:aggregates URI-S. But there's more information that can be conveyed. Some of this extra information is addressed in the next bullets.

 

(8) Resource Map Extra 1: An ore:similarTo relationship between URI-A of the Aggregation and the non-HTTP URI variant for this URI, i.e. info:hdl/1842/1476:

http://hdl.handle.net/1842/1476 ore:similarTo info:hdl/1842/1476

 

My apologies to Andy for the added pain caused by using an info URI here. But I think it can serve a purpose in the realm of the perceived lowering of Google Juice caused by multiple copies of the same thing spread across the Web, as described in Freedom, Google-juice and institutional mandates. Google Scholar doesn't do a bad job at merging all those copies, using metadata-based heuristics. But how about helping Google Scholar (and other applications) a bit more by providing this extra identifier clue for all copies of the same thing? Personally, I think this is quite relevant for dealing with multiple copies of a thing with a DOI. It allows for graph merging, etc.

 

(9) Resource Map Extra 2: The rdf:type of the jump-off page is info:eu-repo/semantics/humanStartPage (ouch, again), see the relevant section of the ORE Atom Guideline:

http://www.era.lib.ed.ac.uk/handle/1842/1476 rdf:type info:eu-repo/semantics/humanStartPage

 

(10) Resource Map Extra 3: The rdf:type of the metadata resource is info:eu-repo/semantics/descriptiveMetadata (ouch, again), see the relevant section of the ORE Atom Guideline:

URI-M rdf:type info:eu-repo/semantics/descriptiveMetadata

 

(11) Resource Map Extra 4: The metadata format of the metadata resource is OAI DC, see the relevant section of the ORE Atom Guideline:

URI-M dcterms:conformsTo http://www.openarchives.org/OAI/2.0/oai_dc/

 

(12) Resource Map Extra 5: There is a version of this DSpace Item in Ariadne:

URI-A dcterms:hasVersion http://www.ariadne.ac.uk/issue46/rusbridge/

 

(13) Resource Map Extra 6: Express some metadata about the Aggregation, such as authorship, publication time, type (journal article) etc.

URI-A dcterms:creator "Rusbridge, Chris"

URI-A dcterms:created "2006-12-13T11:55:51Z"

URI-A rdf:type http://purl.org/eprint/type/JournalArticle

Etc.
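The statements from (6) through (13) can be collected into one set of triples. The sketch below does that in plain Python and serializes them as N-Triples rather than the RDF/XML chosen in (2), simply because N-Triples is easy to write by hand; that swap, the URI-R value, the Literal helper class, and the function names are all my assumptions. The ore:describes statement linking URI-R to URI-A is added because a Resource Map is expected to say which Aggregation it describes.

```python
ORE = "http://www.openarchives.org/ore/terms/"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

URI_A = "http://hdl.handle.net/1842/1476"
URI_S = "http://www.era.lib.ed.ac.uk/handle/1842/1476"
URI_PDF = ("http://www.era.lib.ed.ac.uk/bitstream/1842/1476/1/"
           "Ariadne/fallacy_author_tidy.pdf")
URI_M = ("http://www.era.lib.ed.ac.uk/dspace-oai/request?verb=GetRecord"
         "&identifier=oai:www.era.lib.ed.ac.uk:1842/1476&metadataPrefix=oai_dc")
URI_R = "http://www.era.lib.ed.ac.uk/rem/1842/1476"  # hypothetical


class Literal(str):
    """Marks an object value as a plain literal rather than a URI."""


def build_triples():
    """Return the Resource Map statements from (6)-(13) as tuples."""
    return [
        (URI_R, ORE + "describes", URI_A),
        (URI_A, ORE + "aggregates", URI_S),
        (URI_A, ORE + "aggregates", URI_PDF),
        (URI_A, ORE + "aggregates", URI_M),
        (URI_A, ORE + "similarTo", "info:hdl/1842/1476"),
        (URI_S, RDF_TYPE, "info:eu-repo/semantics/humanStartPage"),
        (URI_M, RDF_TYPE, "info:eu-repo/semantics/descriptiveMetadata"),
        (URI_M, DCTERMS + "conformsTo",
         "http://www.openarchives.org/OAI/2.0/oai_dc/"),
        (URI_A, DCTERMS + "hasVersion",
         "http://www.ariadne.ac.uk/issue46/rusbridge/"),
        (URI_A, DCTERMS + "creator", Literal("Rusbridge, Chris")),
        (URI_A, DCTERMS + "created", Literal("2006-12-13T11:55:51Z")),
        (URI_A, RDF_TYPE, "http://purl.org/eprint/type/JournalArticle"),
    ]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, Literal):
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = '"%s"' % escaped
        else:
            obj = "<%s>" % o
        lines.append("<%s> <%s> %s ." % (s, p, obj))
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ntriples(build_triples()))
```

An RDF toolkit could load this output and re-serialize it as RDF/XML for serving at URI-R.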

And Some Related Discussion:

 

(14) A question is raised by (2): how is the Resource Map going to be served? Interestingly enough, for several existing repository solutions, it might very well be possible to leverage the OAI-PMH repositories for this purpose: add another metadata format (ORE RDF/XML) and serve the Resource Map from the corresponding OAI-PMH GetRecord URI.
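As a sketch of what (14) would mean in terms of URIs: the same GetRecord request that yields URI-M for oai_dc would yield a URI-R for an ORE format. The base URL and item identifier below come from the Edinburgh example; the metadataPrefix value "ore" is a made-up placeholder, since a repository would have to define and advertise its own prefix.

```python
from urllib.parse import urlencode

# Base URL of Edinburgh's OAI-PMH interface, as used for URI-M in (6c).
BASE_URL = "http://www.era.lib.ed.ac.uk/dspace-oai/request"


def get_record_url(identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord URL for one item and one format."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        },
        safe=":/",  # keep the OAI identifier readable, as in the text
    )
    return BASE_URL + "?" + query


if __name__ == "__main__":
    item = "oai:www.era.lib.ed.ac.uk:1842/1476"
    print(get_record_url(item, "oai_dc"))  # URI-M
    print(get_record_url(item, "ore"))     # a possible URI-R (hypothetical prefix)
```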

 

(15) A question is raised by (8)-(13): Can all that information be pulled together into a nice Resource Map on the basis of the data/metadata that the repository has available about an item? The answer is positive, I would think, in many cases. But, for example, expressing the hasVersion relationship in (12) on the basis of multiple dc:identifier entries will no doubt get tricky. Dirty solution to the problem: make the Ariadne resource also part of the Aggregation and forget about the version thing ;-)

 

(16) Both (14) and (6c) raise another issue: The representations that are returned when dereferencing an OAI-PMH-based URI-R and URI-M contain OAI-PMH protocol overhead, i.e. responseDate, request etc. So, they are more than just e.g. DC metadata. A possible solution to this problem using an overhead-stripping gateway is described in a section of the ORE Discovery guidelines. OCLC has such a gateway at http://purl.org/OAIUtil?getRecordURL=PMH-URL-here. Another solution could be found, I think, in using OAI2LOD.
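To illustrate what stripping the protocol overhead in (16) amounts to, here is a minimal local sketch using Python's standard XML parser. It is not how the OCLC gateway or OAI2LOD work internally, which I have not looked into. The sample response is a hand-written, heavily abbreviated stand-in for a real GetRecord response (the responseDate in particular is invented), and the function name is an assumption.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"

# Hand-written, abbreviated sample; not an actual response from Edinburgh.
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2009-02-11T00:00:00Z</responseDate>
  <request verb="GetRecord">http://www.era.lib.ed.ac.uk/dspace-oai/request</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:www.era.lib.ed.ac.uk:1842/1476</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:creator>Rusbridge, Chris</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
"""


def strip_envelope(response_xml):
    """Return the element inside <metadata>, without OAI-PMH overhead."""
    root = ET.fromstring(response_xml)
    path = "{%s}GetRecord/{%s}record/{%s}metadata" % (OAI_NS, OAI_NS, OAI_NS)
    metadata = root.find(path)
    if metadata is None or len(metadata) == 0:
        raise ValueError("no metadata payload in response")
    # The first (and normally only) child is the actual metadata record.
    return metadata[0]


if __name__ == "__main__":
    payload = strip_envelope(SAMPLE_RESPONSE)
    print(ET.tostring(payload, encoding="unicode"))
```

What comes out is just the oai_dc record, which is the resource that URI-M is really meant to identify.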

 

(17) Now, admittedly, those OAI-PMH URIs are pretty long and quite ugly. But here is a Tiny URL for the URI-M from (6c): http://tinyurl.com/bq8k2h. And then this becomes the URI that references the DC metadata resource, without protocol overhead: http://purl.org/OAIUtil?getRecordURL=http://tinyurl.com/bq8k2h . Obviously one could generate a Tiny URL for that one too.

 

(18) For quite a while, I have been looking around for a term from some vocabulary to express the relationship between a resource and another resource that has descriptive metadata about it. If such a term existed, it would be really nice to add another statement to the Resource Map to express this relationship between URI-A and URI-M. Something like: URI-M xyz:isDescriptiveMetadataOf URI-A. Let me know if such a relationship exists.

 

(19) And then, to top it all off, it would be really nice if the DSpace jump-off page could actually provide that URI-A in a YouTube-style copy/paste box, since we want that ORE Aggregation URI to be spread around and to be the referenced URI. Gee, even that is mentioned in a section of the ORE Discovery guidelines.