How can we trust the rev="canonical" URL? Whose burden is it to prove that the URLs are correct? And what do we do with misconfigured rev="canonical" targets?
1. Hit the original URL and parse the HTML.
2. Extract the rev="canonical" URL and check that it 301-redirects back to the original.
3. (optional) If the 301 redirect is missing, or is some other 3xx type, fall back to the original URL.
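The verification steps above could be sketched roughly like this (a minimal illustration, not a production implementation; the `fetch` callback and all names are hypothetical, injected so the network layer stays swappable):

```python
from html.parser import HTMLParser

class RevCanonicalParser(HTMLParser):
    """Collects the href of the first <link rev="canonical"> or <a rev="canonical">."""
    def __init__(self):
        super().__init__()
        self.short_url = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.short_url is None and tag in ("link", "a") and attrs.get("rev") == "canonical":
            self.short_url = attrs.get("href")

def find_rev_canonical(html):
    parser = RevCanonicalParser()
    parser.feed(html)
    return parser.short_url

def verify(original_url, fetch):
    """fetch(url) -> (status_code, headers, body). Returns the verified
    short URL, or None if the advertised target is useless or misconfigured."""
    _, _, body = fetch(original_url)           # step 1: fetch and parse
    short = find_rev_canonical(body)
    if short is None or short == original_url:
        return None                            # nothing useful advertised
    status, headers, _ = fetch(short)          # step 2: check the redirect
    if status == 301 and headers.get("Location") == original_url:
        return short                           # verified round trip
    return None                                # step 3 territory: fall back
```

Note how much machinery this already is compared with one POST to a shortener's API.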
What do we do when the rev="canonical" URL is the same URL we just parsed, as Ars Technica does now? Do we say fine, let's use that long URL, or do we fall back to a third-party URL shortener? (Marko points out that they're actually using rel="canonical", not rev.)
What do you do when you can't resolve the domain, or something else goes wrong on our oh-so-stable interwebs? Does the HTML need to be valid, or do we just use a regular expression to find the rev="canonical" part?
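The regex route is tempting because it survives invalid HTML, but it has its own sharp edges. A hypothetical sketch of that trade-off (the pattern is illustrative and deliberately simple):

```python
import re

# Loose fallback: finds rev="canonical" even in broken HTML, at the cost
# of false positives (matches inside comments, scripts, etc.) and the
# assumption that the rev attribute appears before href in the tag.
REV_CANONICAL_RE = re.compile(
    r'<(?:link|a)\b[^>]*\brev=["\']canonical["\'][^>]*\bhref=["\']([^"\']+)["\']',
    re.IGNORECASE,
)

def find_rev_canonical_loose(html):
    match = REV_CANONICAL_RE.search(html)
    return match.group(1) if match else None
```

A real parser handles attribute order, entities, and comments for you; the regex handles tag soup. Neither answers the question of what to do when the fetch itself fails.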
The second question is: do we really expect services to accept this extra burden?
It means that an operation that once took a single call to bit.ly suddenly takes orders of magnitude more CPU and network resources, as pages need to be fetched, parsed, and checked for validity. While this might be feasible for smaller services, I highly doubt Twitter wants to implement it any time soon.
There might be a cheat Twitter and other services could use. If we're so afraid that we'll lose the links, it seems they should be kept in a database under the control of the service itself.
This doesn't fully solve the problem of long-term URL maintenance, but at least the mappings are under the control of the same provider who stores the original context (e.g. tweets), enabling them to give you nice exports and faster expansion, with one less (perceived) liability.
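The service-side cheat amounts to nothing more than a lookup table the service owns. A minimal sketch, assuming an in-memory SQLite store and made-up names (a real service would add code generation, indexes, and persistence):

```python
import sqlite3

# Hypothetical service-owned short-URL store: the service keeps the
# code -> long URL mapping itself instead of outsourcing it to bit.ly.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE short_urls (
    code     TEXT PRIMARY KEY,  -- e.g. the path segment of a short link
    long_url TEXT NOT NULL)""")

def shorten(code, long_url):
    conn.execute("INSERT OR REPLACE INTO short_urls VALUES (?, ?)",
                 (code, long_url))
    conn.commit()

def expand(code):
    row = conn.execute("SELECT long_url FROM short_urls WHERE code = ?",
                       (code,)).fetchone()
    return row[0] if row else None
```

Because the same provider stores both the tweets and this table, expansion is a local query, and a bulk export is a single SELECT.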