Wednesday, May 31, 2006

Your Google public Calendar is NOT public!

I use Google Calendar in daily life, and I also maintain the GUADEC 2006 iCal file and keep it synced with the schedule of The GNOME Conference. Stop! Don't try to add that iCal link to your Google Calendar account. It doesn't work, and you will get an error message.

Here is what happens:

  1. Google's robots crawl the web and index all calendar files (i.e. iCal), and of course they check the robots.txt file to make sure they are allowed to do this.

  2. Unfortunately, the Google Calendar AJAX application checks robots.txt too. Maybe it uses the copy of the file indexed on Google's servers, so this is a side effect of the previous fact.

  3. Unfortunately again, Google's own robots.txt file doesn't allow crawlers to index your public calendars.

Our iCal link is just an HTTP forward to the address of the file on Google's server. And yes, because of facts 2 and 3, you cannot add our iCal URL to your Google Calendar account! It's not only Google: you cannot use it in any other web application that respects robots.txt files. And of course you won't have your public calendar indexed in any search engine except Google! It reminds me of how Microsoft tried to build its website so that Netscape couldn't display it properly.
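To make facts 2 and 3 concrete, here is a minimal Python sketch of how any client that respects robots.txt ends up refusing the file. The robots.txt content, the URL, and the agent names are all made up for illustration; this is not Google's actual file or code, only the general mechanism.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, standing in for the one on the calendar host.
ROBOTS_TXT = """\
User-agent: *
Disallow: /calendar/
"""

# Hypothetical address that an iCal link might forward to.
ICAL_URL = "http://calendar.example.com/calendar/ical/guadec2006/basic.ics"


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Both agent names are invented; the "*" rule applies to each of them.
    for agent in ("SomeCalendarApp", "OtherSearchBot"):
        verdict = "allowed" if may_fetch(ROBOTS_TXT, agent, ICAL_URL) else "blocked"
        print(f"{agent}: {verdict}")
```

Any application that runs a check like this before downloading will refuse the calendar file, even when a user explicitly asked for that exact URL.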

By the way, if you want to add the GUADEC 2006 calendar to your Google Calendar account, just search for "GUADEC 2006" there and add it directly. And send me a note if you need write access to it.

Update: Problems:

  1. An organizer application (feed reader, calendar, etc.) SHOULD NOT use robots.txt files, as it's NOT a crawler; it's just a user agent that does exactly what a user tells it (like a browser).

  2. Google MUST allow other web crawlers to index public calendars, and NOT create a monopoly.


  1. How about just removing the robots.txt?

    It explicitly tells computers not to read any files in that directory, and you can't fault them for being conservative about respecting your wishes.

  2. Note that our iCal file IS on Google's web server, and I can NOT remove robots.txt on that server!

    Maybe a Greasemonkey user script could make the AJAX application not check robots.txt, but that's not the way.

  3. It might not be crawlable, but you can certainly add any Google Calendar public calendar directly if you know the URL -- what exactly isn't working?