Yahoo-Overture does not respect robots.txt

Today I received the following message in my mailbox:

an improper scan has caused a ban on your site
date: Tue Feb 24 18:30:20 2004
ip: 66.77.73.32
host: shop-gw.sac.overture.com
agent: Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot overture dot com; http://www.alltheweb.com/help/webmaster/crawler

I regularly receive messages like this, usually because bad robots or script kiddies have accessed my spam trap. But Yahoo and Overture are well-respected companies, and I would have assumed that they respect my robots.txt file, in which I explicitly deny access to the /private folder:

User-agent: *
Disallow: /cgi-bin
Disallow: /dummy/dummy.html
Disallow: /errors
Disallow: /fimcap
Disallow: /js
Disallow: /mailtemplates
Disallow: /mt-static
Disallow: /private
Disallow: /spam
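
As a sanity check that these rules really do exclude the crawler, here is a minimal sketch using Python's standard urllib.robotparser module. The rules and the agent name are copied from this post; the is_allowed helper and the /index.html path are only illustrative assumptions, not part of my actual setup.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules quoted in this post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /dummy/dummy.html
Disallow: /errors
Disallow: /fimcap
Disallow: /js
Disallow: /mailtemplates
Disallow: /mt-static
Disallow: /private
Disallow: /spam
"""

# Product token taken from the user-agent string in the ban message.
AGENT = "Yahoo-VerticalCrawler-FormerWebCrawler/3.9"


def is_allowed(path, agent=AGENT):
    """Hypothetical helper: True if the rules above permit `agent` to fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/private/", "/private/welcome.html", "/index.html"):
        print(path, "allowed" if is_allowed(path) else "disallowed")
```

Since the only record is for "User-agent: *", it applies to every robot that does not have a more specific record, this crawler included.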

So I looked in my access log and found that the crawler first fetched robots.txt and then, about 37 minutes later, went into /private anyway:

66.77.73.32 - - [24/Feb/2004:17:20:24 -0500] "GET /robots.txt HTTP/1.0" 200 758 "-" "Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot overture dot com; http://www.alltheweb.com/help/webmaster/crawler"
66.77.73.32 - - [24/Feb/2004:17:57:22 -0500] "GET /private/ HTTP/1.0" 200 4815 "-" "Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot overture dot com; http://www.alltheweb.com/help/webmaster/crawler"
66.77.73.32 - - [24/Feb/2004:18:30:20 -0500] "GET /private/welcome.html HTTP/1.0" 200 351 "-" "Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot overture dot com; http://www.alltheweb.com/help/webmaster/crawler"
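
For anyone who wants to look for the same pattern in their own logs, here is a rough sketch in Python. It assumes the common Apache combined log format with plain ASCII quotes and hyphens, and it uses a simple prefix match on the disallowed paths rather than a full robots.txt parser. The names LOG_LINE, DISALLOWED and find_violations are made up for this example, and the sample lines are the three entries above with the user-agent string shortened.

```python
import re

# Assumed layout: Apache combined log format.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Path prefixes disallowed by the robots.txt in this post.
DISALLOWED = (
    "/cgi-bin", "/dummy/dummy.html", "/errors", "/fimcap", "/js",
    "/mailtemplates", "/mt-static", "/private", "/spam",
)

SAMPLE_LOG = [
    '66.77.73.32 - - [24/Feb/2004:17:20:24 -0500] "GET /robots.txt HTTP/1.0" 200 758 "-" "Yahoo-VerticalCrawler-FormerWebCrawler/3.9"',
    '66.77.73.32 - - [24/Feb/2004:17:57:22 -0500] "GET /private/ HTTP/1.0" 200 4815 "-" "Yahoo-VerticalCrawler-FormerWebCrawler/3.9"',
    '66.77.73.32 - - [24/Feb/2004:18:30:20 -0500] "GET /private/welcome.html HTTP/1.0" 200 351 "-" "Yahoo-VerticalCrawler-FormerWebCrawler/3.9"',
]


def find_violations(lines):
    """Return (ip, time, path) for disallowed requests made after that IP fetched robots.txt."""
    fetched_robots = set()
    violations = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        ip, path = match["ip"], match["path"]
        if path == "/robots.txt":
            fetched_robots.add(ip)
        elif ip in fetched_robots and path.startswith(DISALLOWED):
            violations.append((ip, match["time"], path))
    return violations


if __name__ == "__main__":
    for ip, time, path in find_violations(SAMPLE_LOG):
        print(f"{ip} requested {path} at {time} after fetching robots.txt")
```

Only clients that fetched robots.txt earlier in the same log are reported, so robots that never ask for the file at all would need a separate check.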

Note that the page linked in the user-agent string states that Yahoo-Overture does support the robots exclusion protocol!
