9

I'm trying to parse a byte[] in Java that holds a raw HTTP response. The question "Is there any simple http response parser for Java?" asks exactly what I'm asking, but the accepted answer doesn't help me. I have looked at http://hc.apache.org/httpcomponents-core-ga/httpcore/apidocs/org/apache/http/io/HttpMessageParser.html, but I do not understand how to use it for this.

6
  • What mechanism is providing you with this byte array? What method are you using to actually communicate with the HTTP server? Commented Oct 24, 2014 at 14:31
  • The data comes from WARC files collected with a web crawler. I know there's a library that parses the whole WARC file, but I'm using this Hadoop mapper, github.com/ept/warc-hadoop, which uses its own WARCRecord format. There are multiple routes around this, but I thought parsing an HTTP response should be doable. Commented Oct 24, 2014 at 14:35
  • The docs you linked say "This library currently doesn't perform any parsing of the data inside records, such as the HTTP headers or the HTML body. You can simply read the server's response as an array of bytes. Additional parsing functionality may be added in future versions." -- Does that mean that the byte array can just be used to create a String that shows the textual HTTP response? Commented Oct 24, 2014 at 14:40
  • Yes, exactly. You'd get something like HTTP/1.1 301 Moved Permanently Alternate-Protocol: 80:quic,p=0.01 Cache-Control: public, max-age=2592000 Content-Length: 218 Content-Type: text/html; charset=UTF-8 Date: Fri, 24 Oct 2014 14:43:20 GMT Expires: Sun, 23 Nov 2014 14:43:20 GMT Location: http://www.google.nl/ Server: gws X-Frame-Options: SAMEORIGIN X-XSS-Protection: 1; mode=block <HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8"> <TITLE>301 Moved</TITLE></HEAD><BODY> <H1>301 Moved</H1> Commented Oct 24, 2014 at 14:44
  • Yes, that's it. Thanks for your help. I'll try and do this or find some other route. Not to take this out on anyone, but REALLY? asset-3.soup.io/asset/2905/6018_3568_450.jpeg. Commented Oct 24, 2014 at 15:05
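
As the comments above establish, the byte array is just the textual response followed by the body. Below is a minimal sketch of splitting such an array by hand, using only the JDK. The class name RawHttpResponse and its fields are hypothetical and not part of warc-hadoop or any library; the sketch assumes CRLF line endings, ignores folded header lines, and keeps only the last value of a repeated header.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only.
final class RawHttpResponse {
    final String statusLine;
    final Map<String, String> headers = new LinkedHashMap<>();
    final byte[] body;

    RawHttpResponse(byte[] raw) {
        int end = indexOfBlankLine(raw);
        if (end < 0) {
            throw new IllegalArgumentException("no CRLF CRLF header terminator found");
        }
        // ISO-8859-1 maps every byte to one char, so decoding the head never fails.
        String head = new String(raw, 0, end, StandardCharsets.ISO_8859_1);
        String[] lines = head.split("\r\n");
        statusLine = lines[0];
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                // Header names are case-insensitive, so store them lower-cased.
                String name = lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT);
                headers.put(name, lines[i].substring(colon + 1).trim());
            }
        }
        // Everything after the blank line is the body, kept as raw bytes.
        body = Arrays.copyOfRange(raw, end + 4, raw.length);
    }

    // Returns the index of the first CRLF CRLF sequence, or -1 if there is none.
    static int indexOfBlankLine(byte[] b) {
        for (int i = 0; i + 3 < b.length; i++) {
            if (b[i] == '\r' && b[i + 1] == '\n' && b[i + 2] == '\r' && b[i + 3] == '\n') {
                return i;
            }
        }
        return -1;
    }
}
```

The body is left undecoded on purpose: its character set depends on the Content-Type header, and it may still be chunked or compressed.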

2 Answers

11

I hope this gets you started:

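import java.io.ByteArrayInputStream;
import org.apache.http.Consts;
import org.apache.http.HttpResponse;
import org.apache.http.impl.io.DefaultHttpResponseParser;
import org.apache.http.impl.io.HttpTransportMetricsImpl;
import org.apache.http.impl.io.SessionInputBufferImpl;

String s = "HTTP/1.1 200 OK\r\n" +
        "Content-Length: 100\r\n" +
        "Content-Type: text/plain\r\n" +
        "Server: some-server\r\n" +
        "\r\n";
// Wrap the raw bytes in a session input buffer with a 2048-byte internal buffer.
SessionInputBufferImpl sessionInputBuffer = new SessionInputBufferImpl(new HttpTransportMetricsImpl(), 2048);
sessionInputBuffer.bind(new ByteArrayInputStream(s.getBytes(Consts.ASCII)));
// parse() reads the status line and headers only; it throws IOException or HttpException.
DefaultHttpResponseParser responseParser = new DefaultHttpResponseParser(sessionInputBuffer);
HttpResponse response = responseParser.parse();
System.out.println(response);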

This code produces the following output:

HTTP/1.1 200 OK [Content-Length: 100, Content-Type: text/plain, Server: some-server]

2 Comments

Thanks! This does get me started.
FYI these classes are from Apache HttpCore
0

Check this out: https://github.com/ipinyol/proxy-base

This is a simple, highly configurable HTTP proxy. The method readHeader of the class org.mars.proxybase.ProxyThread parses the HTTP headers from a DataInputStream (reading byte by byte) and returns an object of type Header that holds the header information.

Also, as you probably know, an HTTP response either defines a Content-Length in the header or sends chunked data that you must read chunk by chunk. The methods readContent and readContentByChunk of the same class read the content. You can explore the code yourself and modify it as needed.
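
To make the two cases concrete, here is a minimal JDK-only sketch of both ways of reading a body. It is not the code from proxy-base; the class BodyDecoder and its method names are hypothetical. It assumes the stream is positioned just after the blank line that ends the headers, drops chunk extensions, and ignores any trailer headers after the last chunk.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration only.
final class BodyDecoder {

    // Reads exactly contentLength bytes, as announced by a Content-Length header.
    static byte[] readFixed(InputStream in, int contentLength) throws IOException {
        byte[] out = new byte[contentLength];
        int off = 0;
        while (off < contentLength) {
            int n = in.read(out, off, contentLength - off);
            if (n < 0) {
                throw new EOFException("body shorter than announced length");
            }
            off += n;
        }
        return out;
    }

    // Decodes a body sent with Transfer-Encoding: chunked.
    static byte[] readChunked(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';'); // chunk extensions are dropped
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            if (hex.isEmpty()) {
                throw new EOFException("missing chunk size");
            }
            int size = Integer.parseInt(hex, 16); // chunk sizes are hexadecimal
            if (size == 0) {
                break; // a zero-sized chunk marks the end of the body
            }
            out.write(readFixed(in, size));
            readLine(in); // consume the CRLF that follows the chunk data
        }
        return out.toByteArray();
    }

    // Reads up to the next LF and returns the line without its CR/LF.
    private static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            if (c != '\r') {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }
}
```

For a byte[] already in memory, wrapping the body part in a ByteArrayInputStream and calling BodyDecoder.readChunked on it should be enough; which method to call depends on whether the headers contain Transfer-Encoding: chunked or a Content-Length.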

1 Comment

Thanks. I hope there's a less work-intensive way, but I might try this if there's nothing else.
