Here is what happens:
- Google's robots crawl the web and index all calendar files (i.e. iCal), and of course they check the robots.txt file to make sure they are allowed to do this.
- Unfortunately, the Google Calendar AJAX application also checks robots.txt. Maybe it uses the copy of the file indexed on Google's servers, in which case this is a side effect of the previous fact.
- Unfortunately again, Google's own robots.txt file doesn't allow crawlers to index your public calendars.
Our iCal link is just an HTTP forward to the address of the file on Google's server. And yes, because of facts 2 and 3, you cannot add our iCal URL to your Google Calendar account! And not only Google: you cannot use it in any other web application that respects robots.txt files. And of course you won't have your public calendar indexed in any search engine except Google! It reminds me of how Microsoft tried to build its website so that Netscape couldn't display it properly.
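To make the check concrete, here is a minimal sketch of what a robots.txt-respecting client does before fetching a URL, using Python's standard `urllib.robotparser`. The robots.txt rules, the `ExampleBot` user agent, and the `example.com` URLs are all made up for illustration; I am not quoting the real file on Google's server, and this is not necessarily how Google Calendar implements its check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for this sketch);
# the real rules on Google's server may look different.
ROBOTS_TXT = """\
User-agent: *
Disallow: /calendar/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # Parse the rules from text instead of downloading them.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs: one under the disallowed path, one outside it.
ICAL_URL = "http://www.example.com/calendar/ical/public/basic.ics"
OTHER_URL = "http://www.example.com/about.html"

print(is_allowed(ROBOTS_TXT, "ExampleBot", ICAL_URL))   # False
print(is_allowed(ROBOTS_TXT, "ExampleBot", OTHER_URL))  # True
```

Any client that runs a check like this against the server hosting the calendar file will refuse the iCal URL, no matter who asked for it.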
By the way, if you want to add the GUADEC 2006 calendar to your Google Calendar account, just search for "GUADEC 2006" there and add it directly. And send me a note if you need write access to it.
Update: Problems:
- An organizer application (feed reader, calendar, etc.) SHOULD NOT use robots.txt files, as it's NOT a crawler; it's just a user agent which does exactly what a user tells it to (like a browser).
- Google MUST allow other web crawlers to index public calendars, and NOT create a monopoly.
How about just removing the robots.txt?
It explicitly tells computers not to read any files in that directory, and you can't fault them for being conservative about respecting your wishes.
Note that our iCal file IS on Google's web server, and I can NOT remove robots.txt on the google.com server!
Maybe a Greasemonkey user script could make the AJAX application skip the robots.txt check, but that's not the way.
It might not be crawlable, but you can certainly add any Google Calendar public calendar directly if you know the URL -- what exactly isn't working?