2

If I have the following binary: <<"GET http://www.google.com HTTP/1.1">>, how can I split it up so that I can retrieve only the host (http://www.google.com) ?

I started with something like:

get_host(<<$G, Rest/binary>>) -> get_host(Rest);
get_host(<<$E, Rest/binary>>) -> get_host(Rest);
get_host(<<$T, Rest/binary>>) -> get_host(Rest);

but I am not sure how to go on from here. I was thinking of reversing Rest and starting over from the end of the binary.

3
  • Well, why try? Is it a stream rather than a fixed binary (that is, you don't know when the GET will show up)? Otherwise, just split the binary. Commented Mar 25, 2011 at 10:58
  • 1
    wagerlabs.com/parsing-text-and-binary-files-with-erlang?c=1 Commented Mar 25, 2011 at 11:06
  • @Jonke: it's actually the binary coming out of a browser's request; the above binary is only a small part of all the headers that a browser sends. Commented Mar 25, 2011 at 11:14

3 Answers

10

It seems you're trying to implement a minimal parser for HTTP 1.1. Here is one solution that follows the HTTP 1.1 specification and parses the first line of an HTTP request. Without knowing your specific situation, I would in most cases recommend a generic HTTP parser over a simplified "split binary" approach or similar.

1> erlang:decode_packet(http,<<"GET http://www.google.com HTTP/1.1\n">>,[]).  
{ok,{http_request,'GET',
              {absoluteURI,http,"www.google.com",undefined,"/"},
              {1,1}},
<<>>}
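
To pull just the host out of that result, the returned tuple can be pattern matched. The following is a minimal sketch, not a complete request parser: the module name `host_from_request` and the function `host/1` are hypothetical, and it assumes the request line is newline-terminated and carries an absolute URI, as in the shell session above.

```erlang
-module(host_from_request).
-export([host/1]).

%% Hypothetical helper: returns {ok, Host} when the request line holds an
%% absolute URI, and {error, Reason} otherwise.
host(Bin) when is_binary(Bin) ->
    case erlang:decode_packet(http, Bin, []) of
        {ok, {http_request, _Method,
              {absoluteURI, _Scheme, Host, _Port, _Path}, _Version}, _Rest} ->
            {ok, Host};
        {ok, _Other, _Rest} ->
            %% Parsed fine, but not a request with an absolute URI
            %% (for example "GET /index.html HTTP/1.1").
            {error, no_absolute_uri};
        {more, _Length} ->
            %% No complete line yet, e.g. the trailing newline is missing.
            {error, incomplete};
        {error, Reason} ->
            {error, Reason}
    end.
```

Note that the host comes back as a string (a list), without the scheme, and that the binary from the question would need a trailing newline before `decode_packet` treats it as a complete line.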

3

I would recommend erlang:decode_packet for this, but to show how it can be done by hand, here is a pair of functions that strip the leading "GET " and then return everything up to the first space (they crash if there is no space).

get_host(<<"GET ", Rest/binary>>) ->
    get_host2(Rest, <<>>).

get_host2(<<" ", _/binary>>, Acc) ->
    Acc;
get_host2(<<C, Rest/binary>>, Acc) ->
    get_host2(Rest, <<Acc/binary, C>>).

Basically, I put each byte that is not a space into my "accumulator", and when I find the space I return my accumulator. This is a common trick that is more often seen with lists. (With lists you will want to put new items at the front of the list and reverse the list at the end, to avoid your O(N) algorithm turning into O(N²), but that is not needed for binaries.)
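
The same idea can also be written without a hand-rolled loop. This is a sketch that swaps the accumulator for `binary:split/2` from the standard `binary` module; the module name `get_host_split` is hypothetical, and returning tagged tuples instead of crashing is a choice made for this example, not part of the code above.

```erlang
-module(get_host_split).
-export([get_host/1]).

%% Strip the leading "GET ", then split once at the first space.
get_host(<<"GET ", Rest/binary>>) ->
    case binary:split(Rest, <<" ">>) of
        [Uri, _Tail] -> {ok, Uri};            % everything before the first space
        [_NoSpace]   -> {error, no_space}     % no space found, nothing to split
    end;
get_host(_Other) ->
    {error, not_a_get}.
```

Called with the binary from the question, this returns `{ok, <<"http://www.google.com">>}`.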


0

The simple answer (but probably not what you are really asking for):

B = <<"GET http://www.google.com HTTP/1.1">>.
{_, H} = split_binary(B, 4).  % drop the 4-byte "GET " prefix
split_binary(H, 21).          % the first 21 bytes are "http://www.google.com"
