Date: 2014-05-17

Scraping images and animations from the Swarovski website with HtmlUnit and HttpClient (Part 1)

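Let's start with the Swarovski site itself: http://www.swarovski-crystallized.com/. This page sits four menu levels above the pages we want to scrape, so it takes four clicks to reach a target page. The page has 5 menus, and each of them has to be followed down through 4 levels. We call this page CatchMenu1 for short. In testing, analyzing the site with 4 threads scraping concurrently took a little over 5 hours.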



The third-level menu page, which we call CatchMenu3 for short. Note that this page is also paginated at the bottom:



And finally we reach the target page:




That is all there is to the requirement, so let's get to work.
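Before going into the real classes, here is a rough sketch of the overall plan described above: each top-level menu is handed to a fixed pool of worker threads, and each worker walks its menu down to the fourth level and collects the target pages. This is only an illustration under stated assumptions. `CrawlPlanSketch`, `childrenOf`, `collectTargets` and `crawl` are hypothetical names, and `childrenOf` fabricates two child paths per menu instead of loading a page with HtmlUnit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CrawlPlanSketch {
    // The target pages sit on the fourth menu level.
    static final int MAX_LEVEL = 4;

    // Hypothetical stand-in for "open the page and read its submenu links".
    // Here every menu above the last level simply gets two made-up children.
    static List<String> childrenOf(String menuPath, int level) {
        List<String> children = new ArrayList<>();
        if (level < MAX_LEVEL) {
            children.add(menuPath + "/a");
            children.add(menuPath + "/b");
        }
        return children;
    }

    // Walks one menu depth-first and returns the level-4 pages below it.
    static List<String> collectTargets(String menuPath, int level) {
        List<String> targets = new ArrayList<>();
        if (level == MAX_LEVEL) {
            targets.add(menuPath);
            return targets;
        }
        for (String child : childrenOf(menuPath, level)) {
            targets.addAll(collectTargets(child, level + 1));
        }
        return targets;
    }

    // Submits one task per top-level menu to a fixed-size thread pool and
    // gathers the results in submission order.
    static List<String> crawl(List<String> topMenus, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String menu : topMenus) {
                Callable<List<String>> task = () -> collectTargets(menu, 1);
                futures.add(pool.submit(task));
            }
            List<String> all = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                all.addAll(future.get());
            }
            return all;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("crawl failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `CrawlPlanSketch.crawl(menus, 4)` with five top-level menu names mirrors the "5 menus, 4 threads" setup. In the real crawler the work per menu is dominated by page loads, which is why a small pool is enough to keep things moving.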

1. Create a menu model class:

import java.io.Serializable;
import java.util.Date;

public class CatchMenuDO implements Serializable
{
    private Long id;
    private Long parentId;               // ID of this menu's parent menu
    private String name;                 // name of this menu
    private String name_url;             // page this menu points to
    private Integer level;               // menu level: 1 is the first level, 2 the second, and so on
    private Date gmCreate;
    private Date gmModify;
    private Boolean fetched=false;       // whether this menu has already been fetched and analyzed
    private Boolean isLastChild=false;   // whether this is a leaf menu, i.e. a menu on the last level
    private String parentName;           // name of the parent menu

    // ... (rest omitted)

}
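To make the role of `parentId`, `level` and `fetched` a little more concrete, here is a small sketch of how such records can be indexed by parent and filtered for work that is still pending. It assumes a trimmed-down stand-in class, `MenuNode`, because the accessors of `CatchMenuDO` are omitted above. `MenuTreeSketch`, `childrenByParent` and `pending` are hypothetical names and not part of the original code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MenuTreeSketch {
    // Hypothetical, simplified stand-in for CatchMenuDO.
    static class MenuNode {
        final long id;
        final Long parentId;   // null for a first-level menu
        final String name;
        final int level;       // 1 = first level, 2 = second level, ...
        boolean fetched;       // set to true once the page has been analyzed

        MenuNode(long id, Long parentId, String name, int level) {
            this.id = id;
            this.parentId = parentId;
            this.name = name;
            this.level = level;
        }
    }

    // Groups menus by their parent ID so the children of a menu can be looked up directly.
    static Map<Long, List<MenuNode>> childrenByParent(List<MenuNode> menus) {
        Map<Long, List<MenuNode>> index = new LinkedHashMap<>();
        for (MenuNode menu : menus) {
            if (menu.parentId != null) {
                index.computeIfAbsent(menu.parentId, key -> new ArrayList<>()).add(menu);
            }
        }
        return index;
    }

    // Returns the menus on the given level that have not been fetched yet.
    static List<MenuNode> pending(List<MenuNode> menus, int level) {
        List<MenuNode> result = new ArrayList<>();
        for (MenuNode menu : menus) {
            if (menu.level == level && !menu.fetched) {
                result.add(menu);
            }
        }
        return result;
    }
}
```

Keeping a `fetched` flag on every record is what lets a long-running crawl be stopped and resumed: on restart, only the menus returned by something like `pending` need to be visited again.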

2. Create the DAO class that talks to the database to load menus and save menu and product information:

import java.util.List;

import org.springframework.orm.ibatis.support.SqlMapClientDaoSupport;

public class SwaroSkiDAOImpl extends SqlMapClientDaoSupport implements SwaroSkiDAO
{
    // Inserts a product; always reports 1 to the caller.
    public int addProduct(Product product)
    {
        getSqlMapClientTemplate().insert("SwaroSkiDAO.addProduct", product);
        return 1;
    }

    // The following inserts return whatever key the mapped insert statement yields.
    public Long addProductColor(ProductColor color)
    {
        Long result = (Long) getSqlMapClientTemplate().insert("SwaroSkiDAO.addProductColor", color);
        return result;
    }

    public Long addProductImageSwf(ProductImageSwf imageswf)
    {
        Long result = (Long) getSqlMapClientTemplate().insert("SwaroSkiDAO.addProductImageSwf", imageswf);
        return result;
    }

    public Long addProductSize(ProductSize size)
    {
        Long result = (Long) getSqlMapClientTemplate().insert("SwaroSkiDAO.addProductSize", size);
        return result;
    }

    // Loads the menus that match the level and parent IDs set on the query model.
    @SuppressWarnings("unchecked")
    public List<CatchMenuDO> getCatchMenuList(CatchMenuQueryModel menuModel)
    {
        List<CatchMenuDO> list = (List<CatchMenuDO>) getSqlMapClientTemplate().queryForList("SwaroSkiDAO.getCatchMenu", menuModel);
        return list;
    }

}

3. The database SQL:

<select id="SwaroSkiDAO.getCatchMenu" resultMap="CatchMenuDO" parameterClass="xxxx.xxxx.xxxx.CatchMenuQueryModel">
    <!-- Fetches all submenus on the given level under the given parent menus -->
    SELECT ID, parent_ID, name, name_url,
           menu_level, gm_create, gm_modify, fetched,
           is_last_child, parent_name
    FROM page_catch_menu
    WHERE 1=1
    <isNotNull property="level" prepend=" and ">
        menu_level = #level#
    </isNotNull>
    <isNotNull property="parentIdList" prepend=" and ">
        parent_ID IN
        <iterate property="parentIdList" open="(" close=")" conjunction=",">
            #parentIdList[]#
        </iterate>
    </isNotNull>
</select>
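For readers less familiar with iBATIS dynamic tags, the statement above can be read roughly as the following plain Java, which appends each condition only when the corresponding property is set. This is a sketch of the idea, not what iBATIS generates internally. `CatchMenuSqlSketch` and `buildSql` are hypothetical names, and skipping the `IN` clause for an empty list is a choice made here rather than something the mapping states.

```java
import java.util.ArrayList;
import java.util.List;

class CatchMenuSqlSketch {
    // Builds the query text with JDBC-style placeholders and collects the
    // bind values, in order, into params.
    static String buildSql(Integer level, List<Long> parentIdList, List<Object> params) {
        StringBuilder sql = new StringBuilder(
            "SELECT ID, parent_ID, name, name_url, menu_level, gm_create, gm_modify, "
            + "fetched, is_last_child, parent_name FROM page_catch_menu WHERE 1=1");

        // Corresponds to <isNotNull property="level">.
        if (level != null) {
            sql.append(" AND menu_level = ?");
            params.add(level);
        }

        // Corresponds to <isNotNull property="parentIdList"> with the nested <iterate>.
        // Assumption of this sketch: an empty list is treated like "not set".
        if (parentIdList != null && !parentIdList.isEmpty()) {
            sql.append(" AND parent_ID IN (");
            for (int i = 0; i < parentIdList.size(); i++) {
                if (i > 0) {
                    sql.append(",");
                }
                sql.append("?");
                params.add(parentIdList.get(i));
            }
            sql.append(")");
        }
        return sql.toString();
    }
}
```

The `WHERE 1=1` prefix is what makes this composable: every optional condition can start with `AND` without checking whether it is the first one.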


5. Now the real work finally begins:

public class CatchManagerImpl implements ICatchManager

{

    private Swaro