History of Robots.txt

robot

Photo Credit http://www.flickr.com/photos/tamaleaver/

 

If you’ve never heard of a robots.txt file before, it’s simply a text file placed on a web server containing a set of rules that control how web spiders crawl a site.

Developed in 1994 by Martijn Koster, robots.txt files can now be found on most websites, although they’re rarely viewed by humans.

But did you know there’s a lighter side to these simple files?

For example, Malcolm Coles recently made a discovery when analysing the Daily Mail’s robots.txt file. Hidden inside was this secret message:

# August 12th, MailOnline are looking for a talented SEO Manager so if you found this you’re the kind of techie we need!

According to Malcolm, as of August 25th his post had received 5,500 views, so I suspect the position at the Daily Mail was quickly filled!
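
Messages like this live in comment lines, which start with # and are ignored by crawlers. If you fancy hunting for them yourself, here is a minimal sketch in Python. The function name extract_comments and the two rule lines in the sample are made up for illustration; only the comment text comes from the Daily Mail example above.

```python
def extract_comments(robots_txt: str) -> list[str]:
    """Return the text of every comment in a robots.txt body, without the '#'."""
    comments = []
    for line in robots_txt.splitlines():
        # Everything after the first '#' on a line is treated as a comment.
        _, marker, comment = line.partition("#")
        if marker and comment.strip():
            comments.append(comment.strip())
    return comments


# Sample text: the comment is quoted from the article, the rules are hypothetical.
sample = """\
# August 12th, MailOnline are looking for a talented SEO Manager so if you found this you’re the kind of techie we need!
User-agent: *
Disallow: /search  # hypothetical rule, for illustration only
"""

for comment in extract_comments(sample):
    print(comment)
```

Pointing the same function at the text of any live robots.txt file should surface whatever its authors have tucked away in the comments.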

 


Google Zombies

zombie

Photo Credit http://www.flickr.com/photos/kwl/

In October 2008, Google slipped a hidden message into the rules of their robots.txt file for Halloween:

User-agent : zombies
Disallow : /brains
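
To see what that rule would actually mean to a well-behaved crawler, here is a rough sketch using Python’s standard urllib.robotparser module. The crawler name SomeOtherBot is invented for the comparison, and the rules are typed in by hand rather than fetched from Google.

```python
from urllib.robotparser import RobotFileParser

# The two Halloween lines, written in the usual "Field: value" form.
rules = [
    "User-agent: zombies",
    "Disallow: /brains",
]

parser = RobotFileParser()
parser.parse(rules)

# A crawler calling itself "zombies" is told to keep away from /brains...
zombie_allowed = parser.can_fetch("zombies", "/brains")
# ...while any other crawler (hypothetical name) has no rule against it.
other_allowed = parser.can_fetch("SomeOtherBot", "/brains")

print("zombies may fetch /brains:", zombie_allowed)
print("SomeOtherBot may fetch /brains:", other_allowed)
```

Of course, this only holds for zombies polite enough to read robots.txt in the first place.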

The change was covered by Google’s web spam expert Matt Cutts on his personal blog:

http://www.mattcutts.com/blog/google-protects-itself-from-zombies/

 

 

Robot Reforms?

whitehouse

Photo Credit http://www.flickr.com/photos/dcjohn/

Now moving on to politics, did you know that when the Obama administration moved into the White House, they pruned the www.whitehouse.gov robots.txt file from nearly 2,400 lines down to just:

User-agent: *
Disallow: /includes

Today the file contains just the following:

User-agent: *
Crawl-delay: 10
Sitemap: http://www.whitehouse.gov/feed/media/video-audio
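
For the curious, here is a small sketch of how a crawler might read those three lines, again with Python’s urllib.robotparser (the site_maps() call assumes Python 3.8 or later). Crawl-delay is a non-standard directive, usually read as the number of seconds to wait between requests, and not every crawler honours it.

```python
from urllib.robotparser import RobotFileParser

# The three lines from the file above, with no blank lines between them
# (a blank line would end the User-agent group).
lines = [
    "User-agent: *",
    "Crawl-delay: 10",
    "Sitemap: http://www.whitehouse.gov/feed/media/video-audio",
]

parser = RobotFileParser()
parser.parse(lines)

# Delay requested of all crawlers, and any sitemap URLs listed in the file.
delay = parser.crawl_delay("*")
sitemaps = parser.site_maps()

print("Crawl delay:", delay)
print("Sitemaps:", sitemaps)
```

With no Disallow lines at all, the file places no part of the site off limits.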

For comparison, you can view an archive of the original White House robots.txt.

 

 

Asimov’s 3 Laws of Robotics

robots2

Photo Credit http://www.flickr.com/photos/gladius/

A common gag in robots.txt files is to reference Asimov’s 3 Laws of Robotics. Both Yelp.com and Last.fm have worked the laws into their files:

Yelp.com

#
# 1. A robot may not injure a human being or, through inaction, allow a
# human being to come to harm.
#
# 2. A robot must obey orders given it by human beings except where such
# orders would conflict with the First Law.
#
# 3. A robot must protect its own existence as long as such protection
# does not conflict with the First or Second Law.

 

Last.fm

Disallow: /harming/humans
Disallow: /ignoring/human/orders
Disallow: /harm/to/self

 

Hidden Whisky

whisky

Photo Credit http://www.flickr.com/photos/bignavijp/

Another find by Malcolm Coles, this time on the Whyte and Mackay website! W&M decided to run an online promotion in October 2010, giving away 250 bottles of 30-year-old whisky worth £150.

The special bottles were hidden in stores across the UK, but the first person to read the robots.txt file and send an email also won a bottle (don’t bother, it’s already been won!).

You can see the robots.txt with the competition on Malcolm’s blog:

http://www.malcolmcoles.co.uk/blog/whisky-hidden-in-robots-txt-file/

 

 

Bender and Gort

bender

Photo credit http://www.flickr.com/photos/josek/

Finally, we come to the robots.txt file for www.reddit.com. Hidden in the text are instructions to two popular robots: Futurama’s own Bender (shown above) and Gort from the 1951 film The Day the Earth Stood Still.

(photo credit http://www.flickr.com/photos/narcosislabs/)

gortreddit

Let us know in the comments if you find any other funny or odd robots.txt files!