
My goal is to create an HTTP request (headers and body) manually. It has to look like this:

Some-Header1: some value1
Some-Header2: some value2
Some-Header3: some value3

-------------MyBoundary
Content-Disposition: form-data; name="file_content_0"; filename="123.pdf"
Content-Length: 93
Content-Type: application/pdf
Content-Transfer-Encoding: binary

  ==== here is the binary data of 123.pdf ====
  ==== here is the binary data of 123.pdf ====
  ==== here is the binary data of 123.pdf ====
  ==== here is the binary data of 123.pdf ====

-------------MyBoundary--

I've found out that this is the only way to send a file to a web service through its API: I sniffed the traffic of a Ruby script that does it, and the request turned out to look like what I've shown above.

So the headers such as "Some-Header1" and the others are plain-text headers. Notice that there is also "-------------MyBoundary--" after "==== here is the binary data of 123.pdf ====".

But "==== here is the binary data of 123.pdf ====" is binary data.

The question is, how do I chain (combine) the plain text data with the binary data?

P.S. I've been trying to achieve this with libraries such as python-requests and failed. I'm not going to use them again at this point; for now I only need to know how to combine the plain text and the binary data.

UPDATE:

How can I easily embed binary data in a string?

import textwrap

body_headers = textwrap.dedent(
    """
    -------------MyBoundary
    Content-Disposition: form-data; name="file_content_0"; filename="a.c"
    Content-Length: 1234
    Content-Type: image/jpeg
    Content-Transfer-Encoding: binary

                    %b ??? -> to indicate that binary data will be placed here

    -------------MyBoundary--


    """
) % binary_data  # ???

UPDATE2:

text1 = textwrap.dedent(
    """
    -------------MyBoundary
    Content-Disposition: form-data; name="file_content_0"; filename="a.pdf"
    Content-Length: 1234
    Content-Type: image/jpeg
    Content-Transfer-Encoding: binary

    replace_me

    -------------MyBoundary--


    """
)

with open("test1.pdf", "rb") as file_handler:
    binary_data = file_handler.read()

print (isinstance(binary_data, str)) # True
print (isinstance("replace_me", str)) # True

print text1.replace("replace_me", binary_data) # --> [Decode error - output not utf-8]

print text1.replace("replace_me", binary_data).encode("utf-8") # exception

Error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 195: ordinal not in range(128)

And this also gives me an exception:

print unicode(text1.replace("replace_me", binary_data), "utf-8")
# UnicodeDecodeError: 'utf8' codec can't decode byte 0xc4 in position 195: invalid continuation byte
  • @hlt, I just told you what problem I'd run into. Code? str1 = headers; str_total = str1 + open(file_name, "rb"). --> doesn't compile. Commented Aug 20, 2014 at 7:12
  • Thanks. (Please add this into the post; code in comments is a pain to find.) Commented Aug 20, 2014 at 7:13

1 Answer


To load binary data from a file, you would do

with open(file_name, 'rb') as the_file:
    binary_data = the_file.read()

Now, there are two scenarios, depending on your Python version:

Python 2 - unicode and str

binary_data will be a str, so concatenation should work fine unless your other string is unicode, in which case you should probably encode it first (almost no networking function requires unicode in Python 2):

normal_str = unicode_str.encode(encoding)

where encoding usually is something like "utf-8", "utf-16" or "latin-1", but it may be more exotic.

Python 3 - str and bytes

binary_data will be a bytes object, which you cannot simply concatenate with a regular str. If whatever you use to send the data requires bytes, encode the str the same way as in Python 2. If it requires str (which is unlikely for networking), you must decode the bytes using their encoding (this is nearly impossible to guess, so check what encoding your file actually uses):

normal_str = byte_str.decode(encoding)

again passing the encoding as an argument. Hint: "latin-1" is a safe choice because it maps every byte value to a character and back unchanged, while others, like "utf-8", may fail on actual binary data (as opposed to encoded text) [HT to @SergeBallesta].
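A small Python 3 check of that hint, using made-up sample data: it decodes every possible byte value with latin-1, confirms that encoding back restores the original bytes, and shows utf-8 rejecting the same input.

```python
# Every possible byte value, as arbitrary binary data (e.g. a PDF) may contain.
binary_data = bytes(range(256))

# latin-1 maps byte 0xNN to code point U+00NN, so the round trip is lossless.
as_text = binary_data.decode("latin-1")
assert as_text.encode("latin-1") == binary_data

# utf-8 refuses sequences that are not valid utf-8 (0x80 cannot start a character).
try:
    binary_data.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    print("utf-8 failed at position", exc.start, "-", exc.reason)
```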

To avoid this kind of trouble in Python 3, you may wish to define your headers as bytes from the beginning, using something = b"whatever" instead of something = "whatever" (note the added b), and to open any other files whose contents go into the request in binary mode as well. Then simply concatenating the pieces with + should not be a problem.
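A minimal Python 3 sketch of the all-bytes approach for a single file part. The helper name and its parameters are hypothetical, not from any library. It assumes the boundary value is the one declared in the request's Content-Type header, so each delimiter line is "--" followed by that value, as multipart bodies require. The extra part headers from the question (Content-Length, Content-Transfer-Encoding) could be added the same way if the service expects them.

```python
CRLF = b"\r\n"


def build_multipart_body(boundary: bytes, field_name: bytes, filename: bytes,
                         content_type: bytes, file_bytes: bytes) -> bytes:
    """Build a one-part multipart/form-data body using only bytes (hypothetical helper)."""
    delimiter = b"--" + boundary
    part_headers = (
        delimiter + CRLF
        + b'Content-Disposition: form-data; name="' + field_name
        + b'"; filename="' + filename + b'"' + CRLF
        + b"Content-Type: " + content_type + CRLF
        + CRLF  # blank line separates the part headers from the binary content
    )
    closing = CRLF + delimiter + b"--" + CRLF
    # Every piece is bytes, so + concatenates them without any encoding step.
    return part_headers + file_bytes + closing


# Example with made-up binary content; in practice read it with open(path, "rb").
body = build_multipart_body(
    boundary=b"-----------MyBoundary",
    field_name=b"file_content_0",
    filename=b"123.pdf",
    content_type=b"application/pdf",
    file_bytes=b"%PDF-1.4\n\x00\xc4\xff",
)
```

The outer request would then declare `Content-Type: multipart/form-data; boundary=` followed by the same boundary value.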

Sending the HTTP request

To send this kind of raw data to the server, you have different options:

  • If you want more control than urllib (or urllib2) and requests give you, you can do low-level networking with raw sockets via the socket module and send any data you like (the example in its documentation shows how to do this)
  • You could pass the data (everything between and including the ---(snip)--MyBoundary lines) as the request body of a POST request (assuming your HTTP request is a POST, which the question does not specify) using urllib or requests
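A rough Python 3 sketch of the raw-socket option, assuming a plain-HTTP (not HTTPS) server and a POST request; the helper names are hypothetical. It prepends an ASCII request line and headers, encoded to bytes, to a body that is already bytes, and sends everything with sendall.

```python
import socket


def build_raw_post(host: str, path: str, content_type: str, body: bytes) -> bytes:
    """Prefix a bytes body with an HTTP/1.1 request line and headers."""
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"  # length of the body in bytes
        "Connection: close\r\n"             # ask the server to close when done
        "\r\n"                              # blank line ends the headers
    ).encode("ascii")
    return head + body


def send_raw_request(host: str, port: int, request: bytes) -> bytes:
    """Send the request over plain TCP and return the raw response bytes."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

For HTTPS the socket would first need wrapping, e.g. with ssl.create_default_context().wrap_socket(sock, server_hostname=host). With requests, the same body bytes could instead be passed as `data=` to requests.post along with a Content-Type header carrying the boundary; requests sends a bytes `data` argument unchanged.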

Efficiency

If you opt for raw sockets and send very large files, you may want to read the file in chunks (using the_file.read(number_of_bytes)) and write each chunk directly to the socket (using the_socket.sendall(read_binary_data), which, unlike send, keeps sending until the whole chunk is written). [HT to @Teudimundo]
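A sketch of that chunked approach in Python 3; the function name and default chunk size are arbitrary choices. It uses sendall rather than send, because send may write only part of a chunk and just return the count. On Python 3.5+, socket.sendfile() is a built-in alternative that does the same job.

```python
import socket


def stream_file(sock: socket.socket, path: str, chunk_size: int = 64 * 1024) -> int:
    """Copy a file to a connected socket chunk by chunk; return the bytes sent."""
    total = 0
    with open(path, "rb") as the_file:
        while True:
            chunk = the_file.read(chunk_size)
            if not chunk:  # end of file
                break
            sock.sendall(chunk)  # retries until the whole chunk is written
            total += len(chunk)
    return total
```

Only one chunk is held in memory at a time, so memory use stays bounded regardless of file size.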

Re: Update

Regarding the update (which really should be a new question...): At the time of writing (Python 3.4) there was no format string syntax for bytes, neither new ("{}") nor old ("%s"). Since Python 3.5 (PEP 461), bytes support old-style %-formatting, including %b to insert bytes, but there is still no bytes.format(). On older versions, use decode on the bytes object to turn it into a string and use format strings on that (or turn the string into bytes with encode and use normal concatenation instead). Also note that textwrap.dedent does not accept bytes (it raises a TypeError) because it is written for str only, so dedent the text first and encode the result.
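Two possible ways around this in Python 3, with placeholder content: %-formatting on bytes with %b (Python 3.5+), or dedenting the template as str, encoding it, and concatenating bytes. The "\n" to "\r\n" replacement assumes the server expects CRLF line endings, as HTTP and multipart bodies use.

```python
import textwrap

binary_data = b"\x00\xc4\xff binary payload"

# Option 1 (Python 3.5+): bytes support %-formatting, and %b inserts bytes as-is.
template = b"--BOUNDARY\r\nContent-Type: application/pdf\r\n\r\n%b\r\n--BOUNDARY--\r\n"
body1 = template % binary_data

# Option 2: dedent the text while it is still str, encode it, then concatenate bytes.
header_text = textwrap.dedent("""\
    --BOUNDARY
    Content-Type: application/pdf

    """)
body2 = (header_text.replace("\n", "\r\n").encode("ascii")
         + binary_data
         + b"\r\n--BOUNDARY--\r\n")
```

Both options produce the same bytes here; the second also works on Python versions before 3.5.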


Comments

I was writing the same kind of answer, but I would add that, from a memory-usage point of view, it is better to read the binary data in chunks and write them directly to the socket instead of adding all of it to a string.
I have both Python 2 and 3 and I've been using python-requests. Will it work if I send a POST like this: session.request(....., data = binary_data_with_headers)?
I don't usually use requests, but I think you could do something with session.post(....., data=the_data) if your request is a POST request (make sure it is not encoded somehow). I'm not certain what to do with GET (and other) requests. Personally, I would use raw sockets for this kind of thing.
Updated. However, consider asking such follow-up questions as separate questions instead of editing your question (it makes questions and answers easier to follow for others etc.)
Nice answer, but I wouldn't dare convert binary bytes to unicode with a multi-byte encoding like utf-8! It looks OK using latin1 with Python 3.4 but smells like relying on an undocumented feature. I tried bytes(range(256)).decode() and it breaks with UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 128: invalid start byte
