Yahoo-Overture does not respect robots.txt

Today I received the following message in my mailbox:

an improper scan has caused a ban on your site
date: Tue Feb 24 18:30:20 2004
ip: 66.77.73.32
host: shop-gw.sac.overture.com
agent: Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot overture dot com; http://www.alltheweb.com/help/webmaster/crawler

I regularly receive messages like this one, usually triggered when bad robots or script kiddies access my spam trap. But Yahoo and Overture are well-respected companies, and I assumed they would respect my robots.txt file, in which I explicitly deny access to the /private folder:

User-agent: *
Disallow: /cgi-bin
Disallow: /dummy/dummy.html
Disallow: /errors
Disallow: /fimcap
Disallow: /js
Disallow: /mailtemplates
Disallow: /mt-static
Disallow: /private
Disallow: /spam
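To double-check that these rules really cover the paths involved, they can be fed to a robots.txt parser. The sketch below uses Python's standard `urllib.robotparser`; the variable names and the list of test paths are illustrative assumptions. Because no record names this crawler, the parser falls back to the `User-agent: *` record, and `Disallow: /private` works as a simple path prefix, so it also covers `/private/` and everything below it.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules quoted above, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /dummy/dummy.html
Disallow: /errors
Disallow: /fimcap
Disallow: /js
Disallow: /mailtemplates
Disallow: /mt-static
Disallow: /private
Disallow: /spam
"""

# Product token from the crawler's user-agent string.
AGENT = "Yahoo-VerticalCrawler-FormerWebCrawler/3.9"

parser = RobotFileParser()
# Feed the rules directly instead of fetching robots.txt over HTTP.
parser.parse(ROBOTS_TXT.splitlines())

for path in ["/robots.txt", "/private/", "/private/welcome.html"]:
    # The "User-agent: *" record applies, since no record names this crawler.
    print(path, parser.can_fetch(AGENT, path))
```

Only `/robots.txt` comes back as allowed; both `/private/` paths are reported as disallowed.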

So I checked my access log and found that they had indeed violated my robots.txt file!

66.77.73.32 - - [24/Feb/2004:17:20:24 -0500] "GET /robots.txt HTTP/1.0" 200 758 "-" "Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot overture dot com; http://www.alltheweb.com/help/webmaster/crawler"
66.77.73.32 - - [24/Feb/2004:17:57:22 -0500] "GET /private/ HTTP/1.0" 200 4815 "-" "Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot overture dot com; http://www.alltheweb.com/help/webmaster/crawler"
66.77.73.32 - - [24/Feb/2004:18:30:20 -0500] "GET /private/welcome.html HTTP/1.0" 200 351 "-" "Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot overture dot com; http://www.alltheweb.com/help/webmaster/crawler"
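The same check can be run over an access log to spot robots.txt violations automatically. This is a minimal sketch, not a tool used here: it assumes the log is in Apache's combined log format (as the lines above appear to be), and the regex `LOG_LINE` and the function `disallowed_hits` are hypothetical names. It parses each request line and asks Python's `urllib.robotparser` whether the logged user agent was allowed to fetch that path.

```python
import re
from urllib.robotparser import RobotFileParser

# Assumed Apache "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def disallowed_hits(log_lines, robots_lines):
    """Hypothetical helper: yield (time, path, agent) for each logged
    request that the given robots.txt rules disallow."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in combined log format
        if not parser.can_fetch(match["agent"], match["path"]):
            yield match["time"], match["path"], match["agent"]


UA = ("Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot overture dot com; "
      "http://www.alltheweb.com/help/webmaster/crawler")
LOG = [
    f'66.77.73.32 - - [24/Feb/2004:17:20:24 -0500] "GET /robots.txt HTTP/1.0" 200 758 "-" "{UA}"',
    f'66.77.73.32 - - [24/Feb/2004:17:57:22 -0500] "GET /private/ HTTP/1.0" 200 4815 "-" "{UA}"',
    f'66.77.73.32 - - [24/Feb/2004:18:30:20 -0500] "GET /private/welcome.html HTTP/1.0" 200 351 "-" "{UA}"',
]
ROBOTS = [
    "User-agent: *",
    "Disallow: /cgi-bin",
    "Disallow: /dummy/dummy.html",
    "Disallow: /errors",
    "Disallow: /fimcap",
    "Disallow: /js",
    "Disallow: /mailtemplates",
    "Disallow: /mt-static",
    "Disallow: /private",
    "Disallow: /spam",
]

for when, path, _agent in disallowed_hits(LOG, ROBOTS):
    print(when, path)
```

Run against the three log lines, it should flag the two `/private/` requests and let the `/robots.txt` fetch pass.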

Notice that the page mentioned in the User Agent string states that Yahoo-Overture does support the robots exclusion protocol!

Jeroen Sangers @jeroensangers
