Using Apache Commons HttpClient to download HTTP data

HttpClient used to be a standalone Apache Commons project but is now part of the Apache HttpComponents project. The HttpComponents tutorial should give you enough information on how to use it.

At the time of writing we are using version 4.1.

    package com.magicmonster.sample;

    import org.apache.commons.io.IOUtils;
    import org.apache.http.HttpEntity;
    import org.apache.http.HttpResponse;
    import org.apache.http.client.HttpClient;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.DefaultHttpClient;

    import java.io.ByteArrayOutputStream;
    import java.io.InputStream;
    import java.net.URI;

    public class HttpClientSnippet {
        public static void main(String[] args) throws Exception {
            String url = "http://magicmonster.com";
            URI uri = new URI(url);
            HttpGet httpget = new HttpGet(uri);
            HttpClient httpclient = new DefaultHttpClient();
            HttpResponse response = httpclient.execute(httpget);

            // check the response status line
            String reasonPhrase = response.getStatusLine().getReasonPhrase();
            int statusCode = response.getStatusLine().getStatusCode();
            System.out.println(String.format("statusCode: %d", statusCode));
            System.out.println(String.format("reasonPhrase: %s", reasonPhrase));

            // the entity holds the response body
            HttpEntity entity = response.getEntity();
            InputStream content = entity.getContent();
            ByteArrayOutputStream baos = new ByteArrayOutputStream(1024 * 1024);

            // apache IO util: copy the whole body into memory
            try {
                System.out.println("start download");
                IOUtils.copy(content, baos);
            } finally {
                // close http network connection
                content.close();
            }
            System.out.println("end download");

            byte[] bytes = baos.toByteArray();
            System.out.println(String.format("got %d bytes", bytes.length));
            // new String(bytes) decodes with the platform default charset
            System.out.println("HTML as string:" + new String(bytes));
        }
    }
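
The download step above relies on IOUtils.copy from Apache Commons IO. As a rough illustration of what that step does, here is a minimal sketch using only java.io. The class name StreamCopySketch and the helper copy are hypothetical, and the in-memory stream is a stand-in for entity.getContent(); this is not the Commons IO implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class StreamCopySketch {
    // Hypothetical helper: copies everything from in to out and returns the byte count.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        int n;
        // read() returns -1 once the stream is exhausted
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for entity.getContent(); a real body would arrive over the network.
        InputStream content = new ByteArrayInputStream("<html>hello</html>".getBytes("UTF-8"));
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try {
            copy(content, baos);
        } finally {
            // always close the stream, even if the copy fails
            content.close();
        }
        System.out.println(String.format("got %d bytes", baos.size()));
    }
}
```

With a real HttpEntity, closing the content stream in the finally block is what releases the underlying connection, which is why the snippet above closes it even when the copy throws.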

To include the httpclient library in your project, use the following Maven dependency:

    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpclient</artifactId>
        <version>4.2.3</version>
    </dependency>

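This was the latest version as of 10 Mar 2013. The snippet above also uses IOUtils from Apache Commons IO, which is a separate artifact (groupId commons-io, artifactId commons-io) and needs its own dependency entry.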

To turn on support for compressed (gzip or deflate) responses in the client, use a ContentEncodingHttpClient instead of the DefaultHttpClient. For example, in the snippet above, replace the declaration of the httpclient variable with the following:

HttpClient httpclient = new ContentEncodingHttpClient();
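
ContentEncodingHttpClient deals with compressed response bodies so that the calling code reads plain bytes. To show the kind of work involved, here is a minimal sketch of gzip decoding using only java.util.zip. The class GzipDecodeSketch and its helpers gzip and gunzip are hypothetical names, and the compressed bytes are produced locally as a stand-in for a server response sent with Content-Encoding: gzip. It illustrates the idea only, not the library's internals.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipDecodeSketch {
    // Hypothetical helper: plays the role of a server compressing a body.
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(out);
        try {
            gz.write(plain);
        } finally {
            gz.close(); // closing writes the gzip trailer
        }
        return out.toByteArray();
    }

    // Hypothetical helper: what a decoding client does before handing over the body.
    static byte[] gunzip(byte[] compressed) throws IOException {
        InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } finally {
            in.close();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "<html>hello hello hello</html>".getBytes("UTF-8");
        byte[] onTheWire = gzip(body);
        byte[] decoded = gunzip(onTheWire);
        System.out.println("decoded: " + new String(decoded, "UTF-8"));
    }
}
```

With the content-decoding client in place, the download snippet does not need any of this: the InputStream returned by entity.getContent() already yields the uncompressed bytes.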

To build a URL with request parameters, use the fluent URIBuilder. Note that the path needs a leading slash.

    import org.apache.http.client.utils.URIBuilder;
    ...
    URI uri = new URIBuilder()
            .setScheme("http")
            .setHost("www.example.com")
            .setPath("/search")
            .setPort(8080)
            .setParameter("foo", "bar")
            .setParameter("query", "this is a test")
            .build();
    System.out.println(uri);

The above will output:

http://www.example.com:8080/search?foo=bar&query=this+is+a+test
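
The spaces in the query value come out as plus signs because the parameters are form-encoded. The same encoding can be reproduced with the JDK alone, which may help when checking what a server will receive. This is a minimal sketch; the class QueryEncodingSketch and the helper encodeQuery are hypothetical, and it is not how URIBuilder is implemented.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryEncodingSketch {
    // Hypothetical helper: joins parameters as name=value pairs, form-encoding both sides.
    static String encodeQuery(Map<String, String> params) throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(URLEncoder.encode(entry.getKey(), "UTF-8"))
              .append('=')
              .append(URLEncoder.encode(entry.getValue(), "UTF-8"));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // LinkedHashMap keeps the parameters in insertion order
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("foo", "bar");
        params.put("query", "this is a test");
        // prints http://www.example.com:8080/search?foo=bar&query=this+is+a+test
        System.out.println("http://www.example.com:8080/search?" + encodeQuery(params));
    }
}
```

URIBuilder remains the more convenient choice in real code, since it also assembles the scheme, host, port and path.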