Using Apache Commons HttpClient to download HTTP data

The Apache HttpClient library makes it easy to access and download data over HTTP.

Published: Saturday, 7 May 2011
Last modified: Monday, 26 November 2012

HttpClient

HttpClient used to be a standalone project but is now part of the Apache HttpComponents project. The HttpClient tutorial should give you enough information on how to use it.

At the time of writing we are using version 4.1.

Usage snippet

package com.magicmonster.sample;

import org.apache.commons.io.IOUtils;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.URI;

public class HttpClientSnippet {
    public static void main(String[] args) throws Exception {
        String url = "http://magicmonster.com";
        URI uri = new URI(url);
        HttpGet httpget = new HttpGet(uri);

        HttpClient httpclient = new DefaultHttpClient();

        HttpResponse response = httpclient.execute(httpget);
        // check the response status line (status code and reason phrase)
        String reasonPhrase = response.getStatusLine().getReasonPhrase();
        int statusCode = response.getStatusLine().getStatusCode();

        System.out.println(String.format("statusCode: %d", statusCode));
        System.out.println(String.format("reasonPhrase: %s", reasonPhrase));

        HttpEntity entity = response.getEntity();
        InputStream content = entity.getContent();

        ByteArrayOutputStream baos = new ByteArrayOutputStream(1024 * 1024);

        // IOUtils comes from Apache Commons IO, a separate library from HttpClient
        try {
            System.out.println("start download");
            IOUtils.copy(content, baos);
        } finally {
            // close http network connection
            content.close();
        }
        System.out.println("end download");
        byte[] bytes = baos.toByteArray();
        System.out.println(String.format("got %d bytes", bytes.length));
        // note: new String(bytes) decodes with the platform default charset
        System.out.println("HTML as string:" + new String(bytes));
    }
}
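
The snippet above turns the downloaded bytes into a String using the platform default charset, which may not match what the server sent. The following is a minimal sketch, using only the JDK, of picking the charset from a Content-Type header value instead. The class name CharsetSketch and the helper charsetOf are made up for illustration and are not part of HttpClient; in the snippet above the header value could be read from entity.getContentType(). HttpClient also ships EntityUtils.toString(entity), which may be the simpler choice in practice.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class CharsetSketch {
    /**
     * Hypothetical helper: returns the charset named in a Content-Type value
     * such as "text/html; charset=ISO-8859-1", or the fallback when the
     * parameter is missing or names a charset this JVM does not support.
     */
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim();
                // the value may be quoted, e.g. charset="utf-8"
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // illegal or unsupported charset name
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // stand-in for the bytes downloaded in the snippet above
        byte[] bytes = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        Charset cs = charsetOf("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8);
        System.out.println("decoded with " + cs + ": " + new String(bytes, cs));
    }
}
```

Falling back to a fixed charset (UTF-8 here) is an assumption of this sketch; choose whatever default suits the sites you download from.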

Maven dependencies

To include the HttpClient library in your project, use the following Maven dependency:

<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpclient</artifactId>
  <version>4.2.3</version>
</dependency>

This was the latest version as of 10 Mar 2013.

gzip compression

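To turn this on in the client, use a ContentEncodingHttpClient instead of the DefaultHttpClient. For example, in the snippet above, replace the line that creates the httpclient variable with the following. Note that from version 4.2 ContentEncodingHttpClient is deprecated in favour of DecompressingHttpClient.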

HttpClient httpclient = new ContentEncodingHttpClient();
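
Conceptually, a gzip-aware client asks the server for compressed content and then unwraps the response body before handing it to your code. The following is a rough sketch of that unwrapping step using only the JDK; it is not HttpClient's actual implementation, and the class name GzipSketch and its helper methods are made up for illustration. The gzip method stands in for what a server would do.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipSketch {
    /** Compresses bytes the way a server might when it sends Content-Encoding: gzip. */
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(plain);
        }
        return out.toByteArray();
    }

    /** Decompresses a gzip body, roughly what a gzip-aware client does for you. */
    static byte[] gunzip(byte[] compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] html = "<html><body>hello hello hello hello</body></html>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] onTheWire = gzip(html);
        byte[] received = gunzip(onTheWire);
        System.out.println(String.format("original %d bytes, compressed %d bytes",
                html.length, onTheWire.length));
        System.out.println(new String(received, StandardCharsets.UTF_8));
    }
}
```

With a gzip-aware client none of this appears in your own code: the InputStream returned by entity.getContent() should already yield the decompressed bytes.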

Building URLs

To build a URL with request parameters, use the fluent URIBuilder. Note that the path needs a leading slash.

import org.apache.http.client.utils.URIBuilder;
...

URI uri = new URIBuilder()
            .setScheme("http")
            .setHost("www.example.com")
            .setPath("/search")
            .setPort(8080)
            .setParameter("foo", "bar")
            .setParameter("query", "this is a test")
            .build();
System.out.println(uri);

The above will output:

http://www.example.com:8080/search?foo=bar&query=this+is+a+test
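
The query string in that output is form-encoded: spaces become + and parameters are joined with &. As a rough illustration of that encoding step only, here is a sketch using the JDK's URLEncoder rather than URIBuilder. The class name QueryStringSketch and the helper toQuery are made up for illustration, and the sketch assumes UTF-8.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryStringSketch {
    /** Hypothetical helper: joins form-encoded name=value pairs with '&'. */
    static String toQuery(Map<String, String> params) throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), "UTF-8"));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // LinkedHashMap keeps the parameters in insertion order
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("foo", "bar");
        params.put("query", "this is a test");
        // prints foo=bar&query=this+is+a+test
        System.out.println(toQuery(params));
    }
}
```

URIBuilder is still the better choice inside an HttpClient project, since it also assembles the scheme, host, port and path.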