Date: 2014-05-16
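1. Spiders. There are all kinds of spiders, like fish swimming in the sea: some big, some small.
2. Spiders that probe for HTTP proxies. They leave log entries like this one: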
60.173.14.85 - - [03/Sep/2013:09:59:25 +0800] "GET http://www.qq.com/ HTTP/1.1" 200 612 "-" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)"
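At first I was puzzled: how could a record like this end up in the log?
After a lot of reading, I found that a request line with a full URL like this only appears when the client is using the server as an HTTP proxy. Let's run an experiment:
import urllib  # Python 2: urllib.urlopen accepts a proxies argument

# Send the request for www.qq.com through www.reco.cn:80 as an HTTP proxy
proxies = {'http': 'http://www.reco.cn:80'}
fp = urllib.urlopen('http://www.qq.com', proxies=proxies)
fp.read()
The server's access log then shows: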
203.110.174.151 - - [03/Sep/2013:11:19:57 +0800] "GET http://www.qq.com HTTP/1.0" 200 65530 "-" "Python-urllib/1.17"
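To pick such probes out of a log automatically, one option is to check whether the request target is a full URL rather than a path. The following is only a minimal sketch: it assumes the log uses the common "combined" format seen in the entries above, and the names `LOG_RE` and `is_proxy_probe` are made up for this illustration.

```python
import re

# Assumed layout (combined log format):
# ip - user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def is_proxy_probe(line):
    """Return True if the request target is an absolute http(s) URL."""
    m = LOG_RE.match(line.strip())
    if not m:
        return False  # not in the assumed format
    parts = m.group("request").split()
    # A normal request line looks like: GET /path HTTP/1.1
    # A proxy-style request looks like: GET http://host/path HTTP/1.1
    return len(parts) >= 2 and parts[1].lower().startswith(("http://", "https://"))
```

Fed the two entries above, it flags both, while an ordinary `GET /index.html HTTP/1.1` request is left alone.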
3. Strange empty records, like these:
203.110.174.151 - - [03/Sep/2013:11:47:00 +0800] "-" 400 0 "-" "-"
203.110.174.151 - - [03/Sep/2013:11:47:00 +0800] "-" 400 0 "-" "-"
203.110.174.151 - - [03/Sep/2013:11:47:00 +0800] "-" 400 0 "-" "-"
203.110.174.151 - - [03/Sep/2013:11:47:00 +0800] "-" 400 0 "-" "-"
203.110.174.151 - - [03/Sep/2013:11:47:00 +0800] "-" 400 0 "-" "-"
203.110.174.151 - - [03/Sep/2013:11:47:00 +0800] "-" 400 0 "-" "-"
203.110.174.151 - - [03/Sep/2013:11:47:00 +0800] "-" 400 0 "-" "-"
203.110.174.151 - - [03/Sep/2013:11:47:00 +0800] "-" 400 0 "-" "-"
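Where do these come from? After some searching, it turns out to be Chrome's doing: when Chrome connects to a site it opens several extra concurrent connections (a connection pool, in effect). Sometimes not all of them get used, and the unused ones are closed without ever sending a request, which produces entries like these.
So there is no need to suspect a bug in the server.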
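The same situation can be imitated by hand: open a few TCP connections, send nothing, and close them. The sketch below assumes that this is all the unused Chrome connections do; the function name `open_idle_connections` is made up for this illustration. Whether a server records such a connection as `"-" 400 0` depends on the server software and its version.

```python
import socket

def open_idle_connections(host, port, count=3, timeout=5.0):
    """Open `count` TCP connections, send nothing, then close them all."""
    conns = []
    try:
        for _ in range(count):
            # Connect only; no request line is ever written
            conns.append(socket.create_connection((host, port), timeout=timeout))
    finally:
        for c in conns:
            c.close()
    return len(conns)
```

Run it against a test server you control, for example `open_idle_connections("127.0.0.1", 80)`, and then look at the access log.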