February 11, 2009
herbertv at lanl dot gov
This write-up is an impromptu response to Andy Powell's "Repository
Usability" blog entry. It also touches on some issues that Andy raised in
another entry, "Freedom, Google-juice and institutional mandates". The purpose
of the write-up is to try to alleviate some of Andy's pain regarding the status
quo of scholarly repositories: while the current situation may indeed not be
perfect, a possible solution may not be too hard to establish. The solution I
describe uses the OAI-ORE specifications, along with quite a few other
techniques that several communities have introduced over the past years. Hey, a
technological mash-up, one could say. The motto: use and reuse what is there
before inventing new stuff. I am afraid that the solution may cause Andy some
phantom pain, since it also leverages OAI-PMH. While I agree that there are a
few things we didn't get quite right with OAI-PMH, I don't think it is the
cause of all evil in the (repository) world, and I actually even think we can
leverage the existing deployed PMH repositories for a good cause.
Anyhow, I think the DSpace example in Andy's blog entry is indeed a nice one to
have a close look at. The four URIs that are a source of frustration for Andy
are an indication to me that OAI-ORE Aggregations can come to the rescue. As a
matter of fact, the ORE Primer uses an arXiv example that is quite similar to
the DSpace one: lots of URIs flying around that somehow belong together.
1. We start by modeling this
multi-resource DSpace Item as an ORE Aggregation. In the case of this DSpace
example, we can actually give that ORE Aggregation the existing URI http://hdl.handle.net/1842/1476.
In ORE lingo, this is URI-A, the URI of the Aggregation.
2. We then introduce a new
resource from which we are going to make a machine-readable description of the
ORE Aggregation available. In ORE lingo, that resource is named a Resource Map,
and its URI is known as URI-R. There is a choice of formats for the description
of an Aggregation, but let's just say that for this example we use RDF/XML as
described in the ORE Guidelines.
3. We leave the URI of the
jump-off page http://www.era.lib.ed.ac.uk/handle/1842/1476
the way it is. We will actually have use for it. This URI is sometimes referred
to as URI-S (S for splash page) in ORE lingo.
4. Now, we glue the URIs that
we have encountered thus far together for the benefit of Web navigation. We do
so by following the Cool URIs for the Semantic Web guidelines and using HTTP 303
redirects: from URI-A to the existing URI-S of the jump-off page for human
consumption, and from URI-A to the new URI-R of the ORE Resource Map for machine
consumption. Which of the two targets a client gets is decided by content
negotiation on its HTTP Accept header. This HTTP redirect approach is also
described in a section of the ORE HTTP Guidelines; alternative approaches are
described in the ORE HTTP Guidelines too.
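As a rough illustration of step 4, here is a minimal sketch of the 303 logic using Python's standard http.server module. It assumes that the server answering requests for URI-A can be configured freely; URI_R is a hypothetical placeholder, the handler treats every request path as URI-A, and the Accept check ignores q-values, which a real deployment should honor.

```python
from http.server import BaseHTTPRequestHandler

# URI-S comes from the DSpace example; URI_R is a hypothetical location
# for the Resource Map, used only for illustration.
URI_S = "http://www.era.lib.ed.ac.uk/handle/1842/1476"
URI_R = "http://example.org/rem/1842/1476"  # hypothetical URI-R
RDF_XML = "application/rdf+xml"


def choose_redirect(accept_header):
    """Return the 303 target for a request on URI-A.

    Clients that list application/rdf+xml in their Accept header are sent to
    the Resource Map (URI-R); all others, e.g. browsers, go to the jump-off
    page (URI-S). Simplification: q-values are ignored.
    """
    media_types = [part.split(";")[0].strip().lower()
                   for part in (accept_header or "").split(",")]
    return URI_R if RDF_XML in media_types else URI_S


class AggregationHandler(BaseHTTPRequestHandler):
    """Treats every GET as a request on URI-A and answers with a 303."""

    def do_GET(self):
        target = choose_redirect(self.headers.get("Accept", ""))
        self.send_response(303)  # 303 See Other
        self.send_header("Location", target)
        self.send_header("Vary", "Accept")  # the response depends on Accept
        self.end_headers()


# To try it locally (blocks until interrupted):
#   from http.server import HTTPServer
#   HTTPServer(("localhost", 8000), AggregationHandler).serve_forever()
```

With that server running, `curl -i -H 'Accept: application/rdf+xml' http://localhost:8000/` should show a 303 whose Location is URI_R, while a request without that Accept value is sent to URI-S.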
5. To further enhance the chances of discovery, we point from
the jump-off page to the Resource Map. We can do so in two ways, which are not
mutually exclusive. First, as described in a section of the ORE Discovery
Guidelines, by adding a LINK element to the HTML of the jump-off page, i.e.
<link rel="resourcemap" type="application/rdf+xml" href="URI-R" >.
Second, as described in another section of the ORE Discovery Guidelines, by
providing an HTTP Link header, i.e.
Link: <URI-R>; type="application/rdf+xml"; rel="resourcemap".
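Both discovery hooks from step 5 are plain strings, so generating them is straightforward. A sketch, with URI_R again a hypothetical placeholder; html.escape guards the attribute value in case URI-R contains characters such as &, which is likely for OAI-PMH-based URIs.

```python
from html import escape

# Hypothetical URI-R; substitute the real Resource Map URI.
URI_R = "http://example.org/rem/1842/1476"
RDF_XML = "application/rdf+xml"


def link_element(uri_r):
    """<link> element to place in the <head> of the jump-off page."""
    return '<link rel="resourcemap" type="%s" href="%s">' % (
        RDF_XML, escape(uri_r, quote=True))


def link_header(uri_r):
    """(name, value) pair for the HTTP Link header on the jump-off page."""
    return ("Link", '<%s>; type="%s"; rel="resourcemap"' % (uri_r, RDF_XML))
```

From a BaseHTTPRequestHandler subclass serving the jump-off page, the header could be emitted with `self.send_header(*link_header(URI_R))`.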
6. Having been so busy with all
those URIs and discovery approaches to please both crawlers and browsers, we
almost forgot to actually aggregate resources into that ORE Aggregation. So,
for the DSpace example, the following resources can be considered to be part of
the Aggregation:
(a) The jump-off page with
URI-S = http://www.era.lib.ed.ac.uk/handle/1842/1476
(b) The PDF file with URI http://www.era.lib.ed.ac.uk/bitstream/1842/1476/1/Ariadne/fallacy_author_tidy.pdf
(c) One (or more) metadata
record(s) describing this DSpace Item. It turns out we have such metadata
descriptions available from the repository's OAI-PMH interface. And, instead
of throwing that OAI-PMH interface away, we might as well leverage it. Anyhow,
in the case of Edinburgh's PMH interface, we have an oai_dc record available at
http://www.era.lib.ed.ac.uk/dspace-oai/request?verb=GetRecord&identifier=oai:www.era.lib.ed.ac.uk:1842/1476&metadataPrefix=oai_dc.
We'll make this resource part of the ORE Aggregation too, and let's give that
long OAI-PMH URI the shorthand URI-M for now.
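URI-M in (c) is nothing more than an OAI-PMH GetRecord request. A small sketch that assembles it from its three parameters with urllib.parse.urlencode; the get_record_url helper is hypothetical, and ':' and '/' are left unescaped only so that the result matches the URI quoted above character for character.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.era.lib.ed.ac.uk/dspace-oai/request"


def get_record_url(base_url, identifier, prefix):
    """Build an OAI-PMH GetRecord request URL.

    ':' and '/' are kept unescaped (both are legal in a query string)
    so the output matches the URI in the text.
    """
    query = urlencode(
        {"verb": "GetRecord", "identifier": identifier, "metadataPrefix": prefix},
        safe=":/",
    )
    return base_url + "?" + query


URI_M = get_record_url(BASE_URL, "oai:www.era.lib.ed.ac.uk:1842/1476", "oai_dc")
```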
7. The Resource Map at URI-R
will obviously describe which resources are part of the Aggregation (see 6,
above), e.g. URI-A ore:aggregates URI-S.
But there's more information that can be conveyed. Some of this extra information
is addressed in the next bullets.
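To make step 7 concrete, a sketch that serializes the Aggregation from step 6 as RDF/XML with Python's xml.etree.ElementTree. URI_R is a hypothetical placeholder, and the output is deliberately minimal: it covers only the ore:describes and ore:aggregates statements, while the ORE specifications describe further statements (for example about the Resource Map itself) that a real Resource Map should carry.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ORE = "http://www.openarchives.org/ore/terms/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("ore", ORE)

URI_A = "http://hdl.handle.net/1842/1476"
URI_R = "http://example.org/rem/1842/1476"  # hypothetical URI-R
AGGREGATED = [
    # (a) URI-S, the jump-off page
    "http://www.era.lib.ed.ac.uk/handle/1842/1476",
    # (b) the PDF file
    "http://www.era.lib.ed.ac.uk/bitstream/1842/1476/1/Ariadne/fallacy_author_tidy.pdf",
    # (c) URI-M, the oai_dc record from the OAI-PMH interface
    "http://www.era.lib.ed.ac.uk/dspace-oai/request?verb=GetRecord"
    "&identifier=oai:www.era.lib.ed.ac.uk:1842/1476&metadataPrefix=oai_dc",
]


def q(ns, local):
    """Clark notation {namespace}local, as used by ElementTree."""
    return "{%s}%s" % (ns, local)


def add_ref(parent, ns, local, uri):
    """Add a property whose object is a resource: <ns:local rdf:resource="uri"/>."""
    ET.SubElement(parent, q(ns, local), {q(RDF, "resource"): uri})


def resource_map(uri_r, uri_a, aggregated):
    """Serialize a minimal Resource Map as RDF/XML."""
    root = ET.Element(q(RDF, "RDF"))
    rem = ET.SubElement(root, q(RDF, "Description"), {q(RDF, "about"): uri_r})
    add_ref(rem, RDF, "type", ORE + "ResourceMap")
    add_ref(rem, ORE, "describes", uri_a)
    agg = ET.SubElement(root, q(RDF, "Description"), {q(RDF, "about"): uri_a})
    add_ref(agg, RDF, "type", ORE + "Aggregation")
    for uri in aggregated:
        add_ref(agg, ORE, "aggregates", uri)
    return ET.tostring(root, encoding="unicode")
```

ElementTree takes care of escaping the & characters in URI-M inside the rdf:resource attribute, which is easy to get wrong when assembling RDF/XML by hand.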
(8) Resource Map Extra 1: An
ore:similarTo relationship between URI-A of the Aggregation and the non-HTTP
URI variant for this URI, i.e. info:hdl/1842/1476:
http://hdl.handle.net/1842/1476 ore:similarTo info:hdl/1842/1476
My apologies to Andy for the
added pain caused by using an info URI here. But I think it can serve a purpose
in addressing the perceived loss of Google Juice caused by multiple copies of
the same thing spread across the Web, as described in "Freedom, Google-juice
and institutional mandates". Google Scholar doesn't do a bad job of merging all
those copies, using metadata-based heuristics. But how about helping Google
Scholar (and other applications) a bit more by providing this extra identifier
clue for all copies of the same thing? Personally, I think this is quite
relevant for dealing with multiple copies of a thing that has a DOI. A shared
identifier also allows RDF graphs that describe the different copies to be
merged, etc.
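The info URI in (8) can be derived mechanically from the handle proxy URI. A sketch of that mapping, producing the statement in N-Triples syntax; the function names are illustrative, and the code ignores any percent-encoding rules of the info:hdl namespace that unusual handle characters might trigger.

```python
from urllib.parse import urlsplit

HANDLE_PROXY_HOST = "hdl.handle.net"
ORE_SIMILAR_TO = "http://www.openarchives.org/ore/terms/similarTo"


def handle_info_uri(http_uri):
    """Map an http://hdl.handle.net/ URI to its info:hdl/ counterpart.

    Returns None for URIs that are not handle proxy URIs.
    """
    parts = urlsplit(http_uri)
    if parts.scheme not in ("http", "https") or parts.netloc != HANDLE_PROXY_HOST:
        return None
    handle = parts.path.lstrip("/")
    return "info:hdl/" + handle if handle else None


def similar_to_triple(uri_a):
    """N-Triples statement: URI-A ore:similarTo info:hdl/..."""
    info = handle_info_uri(uri_a)
    if info is None:
        raise ValueError("not a handle proxy URI: " + uri_a)
    return "<%s> <%s> <%s> ." % (uri_a, ORE_SIMILAR_TO, info)
```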
(9) Resource Map Extra 2: The
rdf:type of the jump-off page is info:eu-repo/semantics/humanStartPage (ouch,
again); see the relevant section of the ORE Atom Guidelines:
http://www.era.lib.ed.ac.uk/handle/1842/1476 rdf:type info:eu-repo/semantics/humanStartPage
(10) Resource Map Extra 3: The
rdf:type of the metadata resource is info:eu-repo/semantics/descriptiveMetadata
(ouch, again); see the relevant section of the ORE Atom Guidelines:
URI-M rdf:type info:eu-repo/semantics/descriptiveMetadata
(11) Resource Map Extra 4: The
metadata format of the metadata resource is OAI DC; see the relevant section of
the ORE Atom Guidelines:
URI-M dcterms:conformsTo http://www.openarchives.org/OAI/2.0/oai_dc/
(12) Resource Map Extra 5: There is a version of this DSpace
Item in Ariadne:
URI-A dcterms:hasVersion http://www.ariadne.ac.uk/issue46/rusbridge/
(13) Resource Map Extra 6: Express some metadata about the
Aggregation, such as authorship, publication time, type (journal article), etc.:
URI-A dcterms:creator "Rusbridge, Chris"
URI-A dcterms:created "2006-12-13T11:55:51Z"
URI-A rdf:type http://purl.org/eprint/type/JournalArticle
Etc.
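Taken together, Extras 2 to 6 amount to a handful of triples. A sketch that writes them out in N-Triples, a line-based RDF syntax that is easy to inspect. It follows the text in treating the creator and creation date as plain literals; the Lit marker class and the term/to_ntriples helpers are illustrative, not part of any library.

```python
URI_A = "http://hdl.handle.net/1842/1476"
URI_S = "http://www.era.lib.ed.ac.uk/handle/1842/1476"
URI_M = ("http://www.era.lib.ed.ac.uk/dspace-oai/request?verb=GetRecord"
         "&identifier=oai:www.era.lib.ed.ac.uk:1842/1476&metadataPrefix=oai_dc")

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT = "http://purl.org/dc/terms/"
EU_REPO = "info:eu-repo/semantics/"


class Lit(str):
    """Marks a value as a plain RDF literal; ordinary str values are URIs."""


def term(value):
    """Serialize one term in N-Triples syntax."""
    if isinstance(value, Lit):
        escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        return '"%s"' % escaped
    return "<%s>" % value


EXTRAS = [
    (URI_S, RDF_TYPE, EU_REPO + "humanStartPage"),                               # (9)
    (URI_M, RDF_TYPE, EU_REPO + "descriptiveMetadata"),                          # (10)
    (URI_M, DCT + "conformsTo", "http://www.openarchives.org/OAI/2.0/oai_dc/"),  # (11)
    (URI_A, DCT + "hasVersion", "http://www.ariadne.ac.uk/issue46/rusbridge/"),  # (12)
    (URI_A, DCT + "creator", Lit("Rusbridge, Chris")),                           # (13)
    (URI_A, DCT + "created", Lit("2006-12-13T11:55:51Z")),                       # (13)
    (URI_A, RDF_TYPE, "http://purl.org/eprint/type/JournalArticle"),             # (13)
]


def to_ntriples(statements):
    """One N-Triples line per (subject, predicate, object) statement."""
    return "".join("%s %s %s .\n" % (term(s), term(p), term(o))
                   for s, p, o in statements)
```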
(14) A question is raised by
(2): how is the Resource Map going to be served? Interestingly enough, for
several existing repository solutions, it might very well be possible to
leverage the OAI-PMH repositories for this purpose: add another metadata format
(ORE RDF/XML) and serve the Resource Map from the corresponding OAI-PMH
GetRecord URI.
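If a repository added ORE RDF/XML as another OAI-PMH metadata format, URI-R would simply be URI-M with a different metadataPrefix. A sketch of that rewrite; the prefix "ore" is a hypothetical value, since the text does not name one, and the repository would also have to advertise the new format through ListMetadataFormats.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical metadataPrefix under which a repository might expose ORE RDF/XML.
ORE_PREFIX = "ore"

URI_M = ("http://www.era.lib.ed.ac.uk/dspace-oai/request?verb=GetRecord"
         "&identifier=oai:www.era.lib.ed.ac.uk:1842/1476&metadataPrefix=oai_dc")


def with_metadata_prefix(getrecord_url, prefix=ORE_PREFIX):
    """Return the same GetRecord request, asking for another metadata format."""
    parts = urlsplit(getrecord_url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    if dict(params).get("verb") != "GetRecord":
        raise ValueError("not an OAI-PMH GetRecord URL: " + getrecord_url)
    params = [(k, prefix if k == "metadataPrefix" else v) for k, v in params]
    return urlunsplit(parts._replace(query=urlencode(params, safe=":/")))


URI_R = with_metadata_prefix(URI_M)
```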
(15) A question is raised by
(8)-(13): can all that information be pulled together into a nice Resource Map
on the basis of the data/metadata that the repository has available about an
item? In many cases, I would think, the answer is yes. But expressing, for
example, the hasVersion relationship in (12) on the basis of multiple
dc:identifier entries will no doubt get tricky, since a bare dc:identifier value
does not say how the identified resource relates to the item. Dirty solution to
the problem: make the Ariadne resource also part of the Aggregation and forget
about the version thing ;-)
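The difficulty shows up even in a naive heuristic. The sketch below treats every HTTP(S) dc:identifier whose host is not one of the repository's own as a candidate version; the sample record is invented for illustration, and the heuristic itself is an assumption, not a recommendation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

DC = "http://purl.org/dc/elements/1.1/"

# Invented oai_dc record for illustration; not fetched from Edinburgh.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:identifier>http://hdl.handle.net/1842/1476</dc:identifier>
  <dc:identifier>http://www.ariadne.ac.uk/issue46/rusbridge/</dc:identifier>
  <dc:identifier>Ariadne, issue 46</dc:identifier>
</oai_dc:dc>"""


def version_candidates(oai_dc_xml, own_hosts):
    """Naive heuristic: HTTP(S) dc:identifier values on a foreign host."""
    root = ET.fromstring(oai_dc_xml)
    candidates = []
    for element in root.iter("{%s}identifier" % DC):
        value = (element.text or "").strip()
        parts = urlsplit(value)
        if parts.scheme in ("http", "https") and parts.netloc not in own_hosts:
            candidates.append(value)
    return candidates
```

With only the repository host in own_hosts, the handle URI itself is also reported as a "version"; the handle proxy host has to be excluded by hand, which is exactly the kind of guesswork that makes this fragile.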
(16) Both (14) and (6c) raise
another issue: the representations that are returned when dereferencing an
OAI-PMH-based URI-R and URI-M contain OAI-PMH protocol overhead, e.g. the
responseDate and request elements. So they are more than just, e.g., DC
metadata. A possible solution to this problem, using an overhead-stripping
gateway, is described in a section of the ORE Discovery Guidelines. OCLC has
such a gateway at http://purl.org/OAIUtil?getRecordURL=PMH-URL-here.
Another solution could be found, I think, in using OAI2LOD.
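What an overhead-stripping gateway does can be sketched in a few lines: parse the GetRecord response and keep only the child of its metadata element. The sample response is abbreviated and its values are illustrative; the function is hypothetical and makes no claim about how the OCLC gateway works internally.

```python
import xml.etree.ElementTree as ET

OAI = "http://www.openarchives.org/OAI/2.0/"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("oai_dc", OAI_DC)


def strip_oai_envelope(response_xml):
    """Return only the metadata payload of an OAI-PMH GetRecord response,
    dropping responseDate, request and the record header."""
    root = ET.fromstring(response_xml)
    error = root.find("{%s}error" % OAI)
    if error is not None:
        raise ValueError("OAI-PMH error %s" % error.get("code"))
    metadata = root.find("{%s}GetRecord/{%s}record/{%s}metadata" % (OAI, OAI, OAI))
    if metadata is None or len(metadata) == 0:
        raise ValueError("response has no metadata (deleted record?)")
    payload = metadata[0]
    payload.tail = None  # drop the whitespace that followed the payload
    return ET.tostring(payload, encoding="unicode")


# Abbreviated response shaped like an OAI-PMH 2.0 GetRecord reply;
# the values are illustrative, not fetched from Edinburgh.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2009-02-11T12:00:00Z</responseDate>
  <request verb="GetRecord" metadataPrefix="oai_dc"
    identifier="oai:www.era.lib.ed.ac.uk:1842/1476">http://www.era.lib.ed.ac.uk/dspace-oai/request</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:www.era.lib.ed.ac.uk:1842/1476</identifier>
        <datestamp>2009-02-11</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:creator>Rusbridge, Chris</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""
```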
(17) Now, admittedly, those
OAI-PMH URIs are pretty long and quite ugly. But here is a Tiny URL for the
URI-M from (6c): http://tinyurl.com/bq8k2h.
And then this becomes the URI that references the DC metadata resource, without
protocol overhead: http://purl.org/OAIUtil?getRecordURL=http://tinyurl.com/bq8k2h.
Obviously, one could generate a Tiny URL for that one too.
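Beyond aesthetics, there is a practical reason to avoid pasting the raw GetRecord URI into the gateway URL: its & characters may be read as separators of the gateway's own query string. A sketch that wraps any URI with full percent-encoding, assuming the gateway decodes its getRecordURL parameter like an ordinary query string (this has not been verified against the OCLC service).

```python
from urllib.parse import quote

GATEWAY = "http://purl.org/OAIUtil"  # OCLC gateway quoted in the text


def stripped_record_url(getrecord_url):
    """Wrap a GetRecord URL (or a short alias for one) in the gateway.

    The wrapped URL is fully percent-encoded (safe="") so that its own
    '?', '&' and '=' cannot be confused with the gateway's query string.
    """
    return GATEWAY + "?getRecordURL=" + quote(getrecord_url, safe="")
```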
(18) For quite a while, I have
been looking for a term, from any vocabulary, that expresses the relationship
between a resource and another resource that holds descriptive metadata about
it. If such a term existed, it would be really nice to add another statement to
the Resource Map expressing this relationship between URI-M and URI-A.
Something like: URI-M xyz:isDescriptiveMetadataOf URI-A.
Let me know if such a relationship exists.
(19) And then, to top it all
off, it would be really nice if the DSpace jump-off page could actually provide
that URI-A in a YouTube-style copy/paste box, since we want that ORE
Aggregation URI to be spread around and to be the URI that gets referenced.
Gee, even that is mentioned in a section of the ORE Discovery Guidelines.
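A copy/paste box of this kind needs very little markup. A sketch that generates one; the label text and the select-on-click behavior are illustrative choices, not taken from the ORE Discovery Guidelines.

```python
from html import escape


def copy_box(uri_a, label="Link to this item:"):
    """HTML fragment: a read-only text field holding URI-A that selects its
    whole content on click, ready for copy/paste."""
    return ('<label>%s <input type="text" readonly size="%d" value="%s" '
            'onclick="this.select()"></label>'
            % (escape(label), len(uri_a) + 2, escape(uri_a, quote=True)))
```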