It’s a topic that has come up countless times in discussions of the Semantic Web, and it came up recently on #code4lib: should all URIs be dereferenceable, or is it worthwhile to use non-HTTP URI schemes or non-resolving HTTP URIs?
The consensus among Semantic Web developers seems to be that URIs need not be dereferenceable, which makes a certain amount of sense. If you give me the URI “http://jonathan.brinley.name/”, what would you put at the location “http://jonathan.brinley.name/”? If it’s a description of me, that description also has the URI “http://jonathan.brinley.name/”, giving us two resources with the same URI. With this data now in our system, we can make absurd statements like:
<http://jonathan.brinley.name/> <#describes> <http://jonathan.brinley.name/> .
This is all very ambiguous, since it could be saying:
- I’m describing myself
- I’m describing the document at “http://jonathan.brinley.name/”
- The document at “http://jonathan.brinley.name/” is describing me
- The document at “http://jonathan.brinley.name/” is describing itself
Thus the GIGO principle rears its ugly head. If you give two separate resources the same URI (which is supposed to be a globally unique identifier, remember), then you should expect ambiguity to follow. If you want to identify something uniquely, and that something is not on the web, you should give it a distinct URI from something that is on the web.
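To make the collapse concrete, here is a minimal Python sketch that models each statement as a plain tuple. The “#describes” predicate comes from the example above; the second URI, “http://jonathan.brinley.name/id/jonathan”, is purely hypothetical and stands in for whatever distinct identifier one might choose for the person.

```python
# Model each RDF-style statement as a (subject, predicate, object) tuple.
DESCRIBES = "#describes"


def readings(person, document):
    """Return the four possible 'describes' statements as a set of triples."""
    return {
        (person, DESCRIBES, person),      # I'm describing myself
        (person, DESCRIBES, document),    # I'm describing the document
        (document, DESCRIBES, person),    # the document is describing me
        (document, DESCRIBES, document),  # the document is describing itself
    }


DOCUMENT_URI = "http://jonathan.brinley.name/"
# Hypothetical URI for the person, assumed only for illustration.
PERSON_URI = "http://jonathan.brinley.name/id/jonathan"

# Same URI for both resources: every reading is the same triple.
shared = readings(DOCUMENT_URI, DOCUMENT_URI)
# Distinct URIs: every reading is its own statement.
distinct = readings(PERSON_URI, DOCUMENT_URI)

print(len(shared))    # 1
print(len(distinct))  # 4
```

With one shared URI, a set of triples cannot tell the four readings apart; with two URIs, each reading survives as a separate statement.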
So, that answered, we turn to the second half of the problem: is it worthwhile to use non-HTTP URI schemes or non-resolving HTTP URIs?
The recent discussion started with a mention of “info” URIs. These can be used to uniquely identify resources, but have the (potential) drawback of not being dereferenceable. As established above, non-dereferenceability is not inherently bad. If one simply wants to identify something uniquely, the “info” scheme will work, as will several other schemes.
But there is a certain utility in dereferenceability. As edsu asked: “if you were processing an xml file that included a particular namespace wouldn’t it be nice to get a document that describes that namespace without resorting to google?” This is a place where the HTTP scheme can still be useful, even if the resource itself isn’t available on-line. Nothing says a server has to respond to an HTTP GET request with either a 200 “OK” or a 404 “Not Found”. A 303 “See Other” is a perfectly reasonable response to a request for a resource when all the server can provide is a description of that resource. The server can then point to the URI where this description resides, which will be distinct from the URI of the resource it describes.
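As a rough sketch of that 303 pattern, the following uses Python’s standard http.server module. The paths “/id/jonathan” and “/doc/jonathan”, the body text, and the make_server helper are all hypothetical names invented for this illustration; a real deployment would pick its own URI layout and probably serve RDF or HTML rather than plain text.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical paths, chosen only for illustration.
THING_PATH = "/id/jonathan"         # identifies the person, not a web document
DESCRIPTION_PATH = "/doc/jonathan"  # identifies a document describing the person

DESCRIPTION_BODY = b"A document describing Jonathan.\n"


class SeeOtherHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == THING_PATH:
            # The person cannot be sent over the wire, so point the client
            # at a description that lives at a different URI.
            self.send_response(303)
            self.send_header("Location", DESCRIPTION_PATH)
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path == DESCRIPTION_PATH:
            # The description is an ordinary web document: serve it with 200.
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(DESCRIPTION_BODY)))
            self.end_headers()
            self.wfile.write(DESCRIPTION_BODY)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # suppress per-request logging to keep the sketch quiet


def make_server(port=0):
    """Bind to localhost; port 0 asks the OS for any free port."""
    return HTTPServer(("127.0.0.1", port), SeeOtherHandler)
```

Calling `make_server(8000).serve_forever()` would start it locally. A GET for “/id/jonathan” then yields a 303 whose Location header names “/doc/jonathan”, so the person and the description of the person keep separate URIs while the first still resolves to something useful.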