Concatenate a plain text string and binary data

Question

My goal is to create an HTTP request (headers and body) manually. It has to look like this:

Some-Header1: some value1
Some-Header2: some value2
Some-Header3: some value3

-------------MyBoundary
Content-Disposition: form-data; name="file_content_0"; filename="123.pdf"
Content-Length: 93
Content-Type: application/pdf
Content-Transfer-Encoding: binary

  ==== here is the binary data of 123.pdf ====
  ==== here is the binary data of 123.pdf ====
  ==== here is the binary data of 123.pdf ====
  ==== here is the binary data of 123.pdf ====

-------------MyBoundary--

I've found that this is the only way to send a file to a web service through its API: I sniffed the traffic of a Ruby script doing that, and it turned out to look like what I've shown above.

So headers such as "Some-Header1" and the others are plain text. Notice that there is also "-------------MyBoundary--" after "==== here is the binary data of 123.pdf ====".

But "==== here is the binary data of 123.pdf ====" is binary data.

The question is, how do I chain (combine) the plain text data with the binary data?

P.S. I've been trying to achieve this with standard libraries such as python-requests and failed. I'm not considering using them again at this point. For now I only need to know how to combine the plain text and the binary data.

UPDATE:

How can I easily embed binary data in a string?

import textwrap

body_headers = textwrap.dedent(
    """
    -------------MyBoundary
    Content-Disposition: form-data; name="file_content_0"; filename="a.c"
    Content-Length: 1234
    Content-Type: image/jpeg
    Content-Transfer-Encoding: binary

                    %b ??? -> to indicate that a binary data will be placed here

    -------------MyBoundary--


    """
) % binary_data  # ???

UPDATE2:

text1 = textwrap.dedent(
    """
    -------------MyBoundary
    Content-Disposition: form-data; name="file_content_0"; filename="a.pdf"
    Content-Length: 1234
    Content-Type: image/jpeg
    Content-Transfer-Encoding: binary

    replace_me

    -------------MyBoundary--


    """
)

with open("test1.pdf", "rb") as file_handle:
    binary_data = file_handle.read()

print (isinstance(binary_data, str)) # True
print (isinstance("replace_me", str)) # True

print text1.replace("replace_me", binary_data) # --> [Decode error - output not utf-8]

print text1.replace("replace_me", binary_data).encode("utf-8") # exception

Error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 195: ordinal not in range(128)

And this also gives me an exception:

print unicode(text1.replace("replace_me", binary_data), "utf-8")
# UnicodeDecodeError: 'utf8' codec can't decode byte 0xc4 in position 195: invalid continuation byte

@hlt, I just told you what problem I'd run into. Code? str1 = headers; str_total = str1 + open(file_name, "rb"). --> doesn't compile. – Incerteza, commented Aug 20, 2014 at 7:12
Thanks. (Please add this to the post; code in comments is a pain to find.) – hlt, commented Aug 20, 2014 at 7:13

hlt · Accepted Answer · 2014-08-20 09:12:34Z

5

To load binary data from a file, you would do

with open(file_name, 'rb') as the_file:
    binary_data = the_file.read()

Now, there are two scenarios, depending on your Python version:

Python 2 - `unicode` and `str`

binary_data will be a str, so concatenation should work fine unless your other string is unicode, in which case you should probably encode it first (almost no networking function requires unicode in Python 2):

normal_str = unicode_str.encode(encoding)

where encoding usually is something like "utf-8", "utf-16" or "latin-1", but it may be more exotic.

Python 3 - `str` and `bytes`

binary_data will be a bytes object, which you cannot simply concatenate with a default str. If whatever you use to send the data requires bytes, follow the same approach as with Python 2 and encode your text. If it requires str (which is unlikely for networking purposes), you must decode the bytes with a suitable encoding (this is nearly impossible to guess, so you should check what encoding your file uses) using

normal_str = byte_str.decode(encoding)

again passing the encoding as an argument. Hint: "latin-1" should be fine, as it maps every byte value to a character and so preserves the bytes, while others, like "utf-8", may fail on actual binary data (that is, data that is not encoded text). [HT to @SergeBallesta]
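The difference between the two encodings can be checked directly. The following is a minimal sketch for Python 3; the sample data (all 256 byte values) is only a stand-in for real file content.

```python
# Every possible byte value, standing in for arbitrary binary file content.
binary_data = bytes(range(256))

# latin-1 maps each byte 0x00-0xFF to the code point with the same number,
# so decoding never fails and encoding again restores the original bytes.
as_text = binary_data.decode("latin-1")
round_tripped = as_text.encode("latin-1")

# utf-8 rejects byte sequences that are not valid UTF-8.
try:
    binary_data.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

The latin-1 text is only a container for the bytes here; it is not a meaningful interpretation of the file.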

To avoid this kind of trouble in Python 3, you may wish to define your headers as bytes from the beginning, using something = b"whatever" instead of something = "whatever" (note the added b), and open any other input files that feed into the headers in binary mode as well. Then simply concatenating the pieces with + should not be a problem.
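As a rough sketch of that bytes-only approach in Python 3: the helper name build_body and the sample data below are hypothetical, and the CRLF line endings are an assumption based on how MIME multipart bodies are normally written, not something shown in the question.

```python
CRLF = b"\r\n"  # assumed line ending; the question does not show which one is used


def build_body(delimiter, filename, content_type, binary_data):
    """Return the part headers, the file content and the closing delimiter as bytes."""
    # The header lines are ASCII text, so they are encoded to bytes up front.
    header_lines = [
        'Content-Disposition: form-data; name="file_content_0"; filename="{}"'.format(filename),
        "Content-Length: {}".format(len(binary_data)),
        "Content-Type: {}".format(content_type),
        "Content-Transfer-Encoding: binary",
    ]
    head = CRLF.join(line.encode("ascii") for line in header_lines)
    # From here on everything is bytes, so + works without any decoding.
    return (
        delimiter + CRLF
        + head + CRLF + CRLF
        + binary_data + CRLF
        + delimiter + b"--" + CRLF
    )


# Stand-in for the_file.read(); real data would come from a file opened with 'rb'.
binary_data = b"%PDF-1.4\n\xc4\xe5\xf2\x00\xff"
body = build_body(b"-------------MyBoundary", "123.pdf", "application/pdf", binary_data)
```

The binary content is never decoded, so bytes such as 0xc4 pass through untouched.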

Sending the HTTP request

To send this kind of raw data to the server, you have different options:

If you want more control than urllib (or urllib2) and requests give you, you can do low-level networking with the socket module and send any data you like (the example in the docs shows how to implement this)
You could pass the data (everything between and including the two MyBoundary lines) as the request data of a POST request (if your HTTP request is one, which is not specified in the question) using urllib or requests
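The second option might look roughly like this with Python 3's urllib.request. The URL is a placeholder, the body is a shortened stand-in, and the boundary value in the Content-Type header assumes that the delimiter line in the question already includes the two leading dashes that multipart bodies put in front of the boundary. The request is only constructed here, not sent.

```python
import urllib.request

delimiter = b"-------------MyBoundary"
boundary = delimiter[2:].decode("ascii")  # assumption: strip the two leading dashes

# Shortened stand-in for the full bytes body (part headers + file content).
body = (
    delimiter + b"\r\n"
    + b"Content-Type: application/pdf\r\n\r\n"
    + b"\x00\xc4\xff" + b"\r\n"
    + delimiter + b"--\r\n"
)

request = urllib.request.Request(
    "http://example.invalid/upload",  # hypothetical endpoint
    data=body,                        # bytes are passed through unchanged
    headers={"Content-Type": "multipart/form-data; boundary=" + boundary},
    method="POST",
)

# urllib.request.urlopen(request) would send it; left out so the sketch
# does not depend on a reachable server.
```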

Efficiency

If you opt for raw sockets and send very large files, you may want to read the file in chunks (using the_file.read(number_of_bytes)) and write each chunk directly to the socket (using the_socket.sendall(chunk); plain send may transmit only part of the data, so you would have to check its return value). [HT to @Teudimundo]
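A minimal sketch of the chunked approach, assuming Python 3. The function name send_file_in_chunks is hypothetical, and the demo uses socket.socketpair() and an in-memory file purely so that it runs without a server.

```python
import io
import socket


def send_file_in_chunks(sock, file_obj, chunk_size=64 * 1024):
    """Stream file_obj to sock one chunk at a time; return the number of bytes sent."""
    total = 0
    while True:
        chunk = file_obj.read(chunk_size)
        if not chunk:  # empty bytes means end of file
            break
        sock.sendall(chunk)  # sendall keeps sending until the whole chunk is out
        total += len(chunk)
    return total


# Demo with a connected socket pair instead of a real HTTP server.
sender, receiver = socket.socketpair()
payload = bytes(range(256)) * 40  # 10240 bytes of stand-in binary data
sent = send_file_in_chunks(sender, io.BytesIO(payload), chunk_size=1024)
sender.close()

received = bytearray()
while True:
    data = receiver.recv(4096)
    if not data:  # the sender closed the connection
        break
    received.extend(data)
receiver.close()
```

Only one chunk is held in memory at a time, which is the point of the approach. In a real request the Content-Length header would have to be known before the body is sent, for example from os.path.getsize.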

Re: Update

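Regarding the update (which really should be a new question...): str format strings (neither new ("{}"), nor old ("%s")) cannot embed raw bytes in Python 3; formatting a bytes object into a str inserts its repr (b'...') rather than the data. Python 3.5 and later do support %-formatting on bytes objects themselves (for example b"%b" % binary_data), but bytes has no format method. On older versions, you need to use decode on the bytes object to turn it into a string and use format strings properly (or turn the string into bytes with encode and use normal concatenation instead). Also note that textwrap.dedent does not work on bytes, because it is implemented with str regular expression patterns, which cannot be applied to bytes objects.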
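On Python 3.5 or later, where bytes objects accept %-style formatting, the placeholder idea from the update can be sketched as follows. The template is written as adjacent bytes literals rather than through textwrap.dedent, and the CRLF line endings and sample data are assumptions.

```python
# Stand-in for the_file.read(); note the non-ASCII byte 0xc4 from the question's error.
binary_data = b"\x00\xc4\xff"

# %d and %b are filled in by bytes %-formatting (Python 3.5+).
template = (
    b"-------------MyBoundary\r\n"
    b'Content-Disposition: form-data; name="file_content_0"; filename="a.pdf"\r\n'
    b"Content-Length: %d\r\n"
    b"Content-Type: application/pdf\r\n"
    b"Content-Transfer-Encoding: binary\r\n"
    b"\r\n"
    b"%b\r\n"
    b"-------------MyBoundary--\r\n"
)

body = template % (len(binary_data), binary_data)
```

The result is a bytes object, so no codec is involved at any point.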

edited Aug 20, 2014 at 9:12

answered Aug 20, 2014 at 7:35

hlt


8 Comments

Teudimundo Over a year ago

I was writing the same kind of answer, but I would add that, from a memory usage point of view, instead of adding all the binary data to a string it is better to read the binary data in chunks and write them directly to the socket.

Incerteza Over a year ago

I have both Python 2 and 3 and I've been using python-requests. Will it work if I send a POST like this: session.request(....., data = binary_data_with_headers)?

hlt Over a year ago

I don't usually use requests, but I think you could do something with session.post(....., data=the_data) if your request is a POST request (make sure it is not encoded somehow). I'm not certain about what to do with GET (and other) requests. Personally, I would use raw sockets for this kind of thing.

hlt Over a year ago

Updated. However, consider asking such follow-up questions as separate questions instead of editing your question (it makes questions and answers easier to follow for others etc.)

Serge Ballesta Over a year ago

Nice answer, but I wouldn't dare convert binary bytes to unicode with a multi-byte encoding such as utf-8! It looks OK using latin1 with Python 3.4 but smells like using an undocumented feature. I tried bytes(range(256)).decode() and it breaks with UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 128: invalid start byte
