Date: 2014-05-17

HtmlUnit (1): Fix the file handle leak bug

MainPage
http://htmlunit.sourceforge.net/

document
http://www.w3.org/TR/html401/interact/forms.html#adef-tabindex

We use htmlunit 2.6. After adopting this open-source project, we ran into a file handle leak.
Our system throws a 'too many open files' exception, and we are sure it is caused by TCP connections stuck in the CLOSE_WAIT state. The number of CLOSE_WAIT connections starts growing as soon as jboss starts and never levels off.

The log from the server showed many connections like these:
TCP yo-in-f190.ie100.net:http (CLOSE_WAIT)
TCP iad04s01-in-f99.ie100.net:http (CLOSE_WAIT)
TCP a72-246-208-9.deploy.akamaitechnologies.com:http (CLOSE_WAIT)
TCP a72-246-113-163.deploy.akamaitechnologies.com:http (CLOSE_WAIT)

We first changed some server settings.
1. We lowered TCP_KEEPALIVE_TIME to a shorter value, 1800 seconds.
On my test Linux server, I added the following lines to /etc/sysctl.conf and restarted the network.
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 2
net.ipv4.tcp_keepalive_intvl = 2
2. We raised the open file limit (ulimit) to 8000 so the process can hold more open files.
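As a rough worked example of what the three keepalive settings above mean together: the kernel waits tcp_keepalive_time seconds of idleness before sending the first probe, then sends up to tcp_keepalive_probes probes, tcp_keepalive_intvl seconds apart, before it drops an unresponsive connection. The sketch below only does that arithmetic. The class and method names are made up for illustration, and keepalive applies only to sockets that have SO_KEEPALIVE enabled.

```java
// Illustrative helper (hypothetical name), not part of htmlunit or the JDK.
final class KeepaliveEstimate {

    private KeepaliveEstimate() {
    }

    /**
     * Approximate number of seconds an idle connection to a vanished peer
     * survives: idle time before the first probe, plus one interval for
     * each unanswered probe.
     */
    static long secondsUntilDropped(long keepaliveTime, int probes, long interval) {
        return keepaliveTime + (long) probes * interval;
    }

    public static void main(String[] args) {
        // Values from the sysctl lines above: 600 + 2 * 2
        System.out.println(secondsUntilDropped(600, 2, 2)); // prints 604
    }
}
```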

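But that did not work well; it could not fix the problem. CLOSE_WAIT means the remote side has already closed the connection while the local application has not yet closed its own socket, so kernel settings alone cannot release these connections. After that, we concluded that the problem was in htmlunit itself.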

We then modified the htmlunit 2.6 source code. Our changes follow:
1. Configuration file
src/main/java/com/gargoylesoftware/htmlunit/http_connection_pool.properties, with these settings:
DEFAULT_MAX_CONNECTIONS_PER_HOST=50
#Timeout in milliseconds
CONNECTION_TIMEOUT=300000
SO_TIMEOUT=300000
MAX_TOTAl_CONNECTIONS=500
#RECEIVE_BUFFER_SIZE=65535
#SEND_BUFFER_SIZE=65535
DEFAULT_MAX_CONNECTIONS_PER_HOST=60000
IDLE_TIMEOUT=30000
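A minimal sketch of how settings like these could be read, assuming the file is loaded with java.util.Properties (the factory's imports suggest this, but the loading code itself is not shown here). The class and helper names are hypothetical. Two details are worth checking against the listing above: when a key appears twice, Properties keeps the last value, and key lookups are case-sensitive.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper for illustration; not part of htmlunit.
final class PoolConfigSketch {

    private PoolConfigSketch() {
    }

    /** Parses properties text; a real loader would read the classpath resource instead. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("could not parse properties", e);
        }
        return props;
    }

    /** Returns the integer value for key, or fallback when it is missing or not a number. */
    static int intProperty(Properties props, String key, int fallback) {
        String raw = props.getProperty(key);
        if (raw == null) {
            return fallback;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        Properties props = parse(
                "DEFAULT_MAX_CONNECTIONS_PER_HOST=50\n"
                + "MAX_TOTAl_CONNECTIONS=500\n"
                + "DEFAULT_MAX_CONNECTIONS_PER_HOST=60000\n");
        // The later duplicate wins.
        System.out.println(intProperty(props, "DEFAULT_MAX_CONNECTIONS_PER_HOST", -1)); // prints 60000
        // An all-uppercase lookup does not match the key as spelled in the file.
        System.out.println(intProperty(props, "MAX_TOTAL_CONNECTIONS", -1)); // prints -1
    }
}
```

With the file as listed, the per-host limit in effect would therefore be 60000, not 50, and the code reading the file must use the key spelling MAX_TOTAl_CONNECTIONS exactly as written.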

2. Web connection class
src/main/java/com/gargoylesoftware/htmlunit/HttpWebConnection.java
We modified the method that creates the HttpClient so that our whole system shares a single MultiThreadedHttpConnectionManager instance.
protected HttpClient createHttpClient() {
    // Original code created a new connection manager for every client:
    // final MultiThreadedHttpConnectionManager connectionManager = new MultiThreadedHttpConnectionManager();
    final MultiThreadedHttpConnectionManager connectionManager =
            com.gargoylesoftware.htmlunit.MultiThreadedHttpConnectionManagerFactory.getInstance();
    HttpClient client = new HttpClient(connectionManager);
    HostConfiguration hostConf = client.getHostConfiguration();
    // Send "Connection: close" with every request so connections are not kept alive
    List<Header> headers = new ArrayList<Header>();
    headers.add(new Header("Connection", "close"));
    hostConf.getParams().setParameter("http.default-headers", headers);
    return client;
}
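The point of the change above is that every HttpClient shares one connection manager. Below is a minimal sketch of one thread-safe way to hand out such a shared instance, using the initialization-on-demand holder idiom. This is an assumption for illustration, not necessarily how our factory does it; PoolManager is a hypothetical stand-in for MultiThreadedHttpConnectionManager so that the sketch compiles without the HttpClient library.

```java
// Sketch of a lazily created, thread-safe shared instance (holder idiom).
final class SharedManagerSketch {

    /** Hypothetical stand-in for MultiThreadedHttpConnectionManager. */
    static final class PoolManager {
    }

    private SharedManagerSketch() {
    }

    // The JVM initializes Holder only on first access, and class
    // initialization is thread-safe, so no explicit locking is needed.
    private static final class Holder {
        static final PoolManager INSTANCE = new PoolManager();
    }

    /** Always returns the same PoolManager instance. */
    static PoolManager getInstance() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        System.out.println(getInstance() == getInstance()); // prints true
    }
}
```

A plain static field filled in on first call would also give a single instance, but it needs synchronization if several threads can call getInstance() at the same time.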

3. Factory class that creates the MultiThreadedHttpConnectionManager
src/main/java/com/gargoylesoftware/htmlunit/MultiThreadedHttpConnectionManagerFactory.java:

package com.gargoylesoftware.htmlunit;

import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.commons.httpclient.params.HttpConnectionManagerParams;
import org.apache.commons.httpclient.util.IdleConnectionTimeoutThread;

public class MultiThreadedHttpConnectionManagerFactory
{
    private static MultiThreadedHttpConnectionManager instance;

    public static MultiThreadedHttpConnectionManager getInstance()
    {
        InputStream is = null;
        HttpConnectionManagerParams param = null;
  &nb