Using Apache Commons HttpClient to download HTTP data

1. HttpClient

HttpClient used to be a standalone Apache Commons project, but it is now part of the Apache HttpComponents project. The official HttpClient tutorial should give you enough information on how to use it.

At the time of writing, we are using version 4.1.

Usage snippet


package com.magicmonster.sample;

import org.apache.commons.io.IOUtils;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.URI;

public class HttpClientSnippet {
    public static void main(String[] args) throws Exception {
        String url = "http://magicmonster.com";
        URI uri = new URI(url);
        HttpGet httpget = new HttpGet(uri);

        HttpClient httpclient = new DefaultHttpClient();
        try {
            HttpResponse response = httpclient.execute(httpget);
            // check the status line before reading the body
            String reasonPhrase = response.getStatusLine().getReasonPhrase();
            int statusCode = response.getStatusLine().getStatusCode();

            System.out.println(String.format("statusCode: %d", statusCode));
            System.out.println(String.format("reasonPhrase: %s", reasonPhrase));

            HttpEntity entity = response.getEntity();
            if (entity == null) {
                // some responses (e.g. 204 No Content) have no body
                System.out.println("no response body");
                return;
            }
            InputStream content = entity.getContent();

            ByteArrayOutputStream baos = new ByteArrayOutputStream(1024 * 1024);

            // IOUtils comes from Apache Commons IO, a separate library
            try {
                System.out.println("start download");
                IOUtils.copy(content, baos);
            } finally {
                // closing the content stream releases the connection back to the connection manager
                content.close();
            }
            System.out.println("end download");
            byte[] bytes = baos.toByteArray();
            System.out.println(String.format("got %d bytes", bytes.length));
            // assumes the page is UTF-8; check the Content-Type header for the actual charset
            System.out.println("HTML as string:" + new String(bytes, "UTF-8"));
        } finally {
            // shut down the connection manager and free its resources
            httpclient.getConnectionManager().shutdown();
        }
    }
}
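IOUtils.copy is the only reason the snippet needs Apache Commons IO. If you would rather avoid that extra dependency, a plain JDK buffer loop does the same job. The sketch below is an illustrative stand-in, not the Commons IO implementation: the class name StreamCopySketch, the copy helper and the 8 KB buffer size are hypothetical choices, and a ByteArrayInputStream stands in for entity.getContent() so it runs without network access. Like IOUtils.copy, it does not close either stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StreamCopySketch {
    // Copies every byte from in to out through a fixed-size buffer and returns the byte count.
    // Neither stream is closed here; the caller stays responsible for that (hypothetical helper).
    static long copy(InputStream in, OutputStream out) {
        byte[] buffer = new byte[8192];
        long total = 0;
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        // stand-in for entity.getContent(); no network access needed
        InputStream content = new ByteArrayInputStream("<html>hello</html>".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        long copied = copy(content, baos);
        System.out.println(String.format("got %d bytes", copied));
        System.out.println("HTML as string:" + new String(baos.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Running main prints `got 18 bytes` followed by the HTML. In the real snippet you would pass entity.getContent() and keep the existing finally block that closes it.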
      

2. Maven dependencies

To include the HttpClient library in your project, add the following Maven dependency:


<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpclient</artifactId>
  <version>4.2.3</version>
</dependency>
    

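This is the latest version as of 10 Mar 2013. The snippet above also uses IOUtils from Apache Commons IO (groupId and artifactId both commons-io), which is a separate library and needs its own dependency.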

3. gzip compression

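Turning this on in the client is simple: use a ContentEncodingHttpClient instead of a DefaultHttpClient (note that from version 4.2, ContentEncodingHttpClient is deprecated in favour of DecompressingHttpClient). For example, in the snippet above, replace the declaration of the httpclient variable with the following: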


import org.apache.http.impl.client.ContentEncodingHttpClient;
...
HttpClient httpclient = new ContentEncodingHttpClient();
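ContentEncodingHttpClient advertises gzip and deflate support in an Accept-Encoding request header and transparently decompresses an encoded response, so entity.getContent() still yields the plain HTML. To make that round trip concrete, here is a JDK-only sketch using java.util.zip. It is not HttpClient's code, and GzipSketch, gzip and gunzip are hypothetical names: gzip plays the part of the server and gunzip the part of the client.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSketch {
    // Server side: compress a response body as it would be sent with Content-Encoding: gzip
    static byte[] gzip(byte[] body) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(baos)) {
            gz.write(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // the stream is closed at this point, so the gzip trailer has been written
        return baos.toByteArray();
    }

    // Client side: decompress the body before handing it to the application
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("<p>hello world</p>\n");
        }
        byte[] html = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(html);
        byte[] restored = gunzip(compressed);
        System.out.println(String.format("original %d bytes, gzipped %d bytes", html.length, compressed.length));
        System.out.println("round trip ok: " + Arrays.equals(html, restored));
    }
}
```

Repetitive text such as HTML markup usually compresses well, which is where the bandwidth saving comes from.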
    

4. Building URLs

To build up a URL with request parameters, use the fluent URIBuilder (added in HttpClient 4.2). Note that the path needs a leading slash.

import org.apache.http.client.utils.URIBuilder;
...
        URI uri = new URIBuilder()
                .setScheme("http")
                .setHost("www.example.com")
                .setPath("/search")
                .setPort(8080)
                .setParameter("foo", "bar")
                .setParameter("query", "this is a test")
                .build();
        System.out.println(uri);
    

The above will output:

http://www.example.com:8080/search?foo=bar&query=this+is+a+test
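The + signs in the output come from form-style encoding of the query parameters. A rough JDK-only equivalent is sketched below, assuming UTF-8 and using java.net.URLEncoder, which also encodes spaces as +. QueryStringSketch, formEncode and buildUri are hypothetical names, and this is not how URIBuilder is implemented internally. Because buildUri simply concatenates host, port and path, it also shows why the path needs a leading slash.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryStringSketch {
    // Form-encodes one string as UTF-8; URLEncoder turns spaces into '+'
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // every JVM is required to support UTF-8, so this should never happen
            throw new IllegalStateException(e);
        }
    }

    // Joins encoded name=value pairs with '&', preserving the map's iteration order
    static String formEncode(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    // Hypothetical helper: the path is appended verbatim, so it must start with '/'
    static URI buildUri(String host, int port, String path, Map<String, String> params) {
        return URI.create("http://" + host + ":" + port + path + "?" + formEncode(params));
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("foo", "bar");
        params.put("query", "this is a test");
        // prints http://www.example.com:8080/search?foo=bar&query=this+is+a+test
        System.out.println(buildUri("www.example.com", 8080, "/search", params));
    }
}
```

URIBuilder remains preferable in real code; it can also start from an existing URI string and set a fragment or user info.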