I'm trying to do something and I don't know whether it's a sound approach.

Problem:

I have a web client and a web server. The server (written in Python with Flask) processes a PDF file to extract some data; the client just sends the PDF file and waits for the response. The thing is that the client may have several PDF files to process, and I want to send all of them from the client to the server in a single request.

What I have planned to do:

I was thinking of converting the Blob of each PDF to a string and sending a POST request with a JSON body like this:

BODY:
{
    "content": [
        {"name": "pdf_name_1.pdf", "data": "some blob data converted to string"},
        {"name": "pdf_name_2.pdf", "data": "some blob data converted to string"},
        {"name": "pdf_name_3.pdf", "data": "some blob data converted to string"},
        ...
    ]
}

Then, on the server, I was thinking of converting the data back into a blob (bytes) so I can write the PDF to disk and start processing it.

My question:

Is there any way in Python to convert the string representation of the PDF back to bytes so I can write the PDF to disk?

Thanks a lot. If someone comes up with another way to send a bunch of PDFs in a single request, please let me know.

P.S. I'm using Python 3.5 and Flask for the web server.

1 Answer

In such cases, it's preferable to send the file data with the files keyword of requests.post, like so:

import requests


def send_pdf_data(filename_list):
    # Map each form field name to a (filename, file object, content type) tuple.
    files = {}
    for index, filename in enumerate(filename_list, start=1):
        # str.format instead of an f-string, so this also runs on Python 3.5.
        field = "pdf_name_{}.pdf".format(index)
        files[field] = (filename, open(filename, 'rb'), 'application/pdf')

    data = {}
    # *Put whatever you want in data dict*

    try:
        requests.post("http://yourserveradders", data=data, files=files)
    finally:
        # Close every file handle once the request has been sent.
        for _, file_obj, _ in files.values():
            file_obj.close()


def main():
    filename_list = ["pdf_name_1.pdf", "pdf_name_2.pdf"]
    send_pdf_data(filename_list)


if __name__ == '__main__':
    main()
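
For completeness, here is a minimal sketch of what the Flask side of such a multipart upload might look like. The route name /upload and the UPLOAD_DIR config key are assumptions made up for this example, not something from the question; request.files and secure_filename are the usual Flask/Werkzeug tools for this.

```python
import os

from flask import Flask, jsonify, request
from werkzeug.utils import secure_filename

app = Flask(__name__)
app.config["UPLOAD_DIR"] = "uploads"  # hypothetical target directory


@app.route("/upload", methods=["POST"])  # hypothetical route name
def upload():
    upload_dir = app.config["UPLOAD_DIR"]
    os.makedirs(upload_dir, exist_ok=True)
    saved = []
    # request.files holds one FileStorage object per uploaded file part.
    for storage in request.files.values():
        # secure_filename strips path separators from the client-supplied name.
        name = secure_filename(storage.filename)
        storage.save(os.path.join(upload_dir, name))
        saved.append(name)
    return jsonify(saved=saved)
```

Each PDF ends up on disk as raw bytes, so no string conversion is needed with this approach.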

However, if you really want to pass the data as JSON, you should use the base64 module, as @Mark Ransom mentioned.

You can implement it like this:

import requests
import base64


def encode(data: bytes) -> str:
    """
    Return base-64 encoded value of binary data as a string.
    """
    # b64encode returns bytes, which JSON cannot serialize, so decode to str.
    return base64.b64encode(data).decode('ascii')


def decode(data: str) -> bytes:
    """
    Return decoded value of a base-64 encoded string.
    """
    return base64.b64decode(data.encode())


def get_pdf_data(filename):
    """
    Open pdf file in binary mode,
    return a string encoded in base-64.
    """
    with open(filename, 'rb') as file:
        return encode(file.read())


def send_pdf_data(filename_list, encoded_pdf_data):
    data = {}
    # *Put whatever you want in data dict*
    # Create content list.
    content = [{"name": filename, "data": pdf_data}
               for (filename, pdf_data) in zip(filename_list, encoded_pdf_data)]
    data["content"] = content

    # json= serializes the dict and sets the Content-Type header to application/json.
    requests.post("http://yourserveradders", json=data)


def main():
    filename_list = ["pdf_name_1.pdf", "pdf_name_2.pdf"]
    pdf_blob_data = [get_pdf_data(filename) for filename
                     in filename_list]
    send_pdf_data(filename_list, pdf_blob_data)


if __name__ == '__main__':
    main()
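
On the server, going from the string back to bytes is a single base64.b64decode call. Below is a rough sketch of that step, assuming the JSON body has the "content" layout from the question; the function name save_pdfs_from_json and the out_dir parameter are made up for illustration.

```python
import base64
import json
import os


def save_pdfs_from_json(body, out_dir):
    """Decode every entry in body["content"] and write it into out_dir.

    body is the JSON text of the request; returns the list of paths written.
    """
    payload = json.loads(body)
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for item in payload["content"]:
        # Keep only the last path component of the client-supplied name.
        name = os.path.basename(item["name"])
        pdf_bytes = base64.b64decode(item["data"])  # str -> original bytes
        path = os.path.join(out_dir, name)
        with open(path, "wb") as f:  # binary mode, since these are raw bytes
            f.write(pdf_bytes)
        written.append(path)
    return written
```

In a Flask view, you could pass it the request body as text, or adapt it to take the dict returned by request.get_json() instead of calling json.loads.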

2 Comments

Thanks a lot. I think it's better to send JSON with the strings than to send all the files (each file can be larger than 1 MB, and there can be more than 1000 files).
Another suggestion: you could send several requests at the same time, each one carrying some of the PDF files as JSON. That could speed up your code considerably. If you're interested, have a look here: docs.python.org/3/library/multiprocessing.html
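
A rough sketch of that idea, using the thread-based pool from the multiprocessing package (threads are usually enough when the work is waiting on the network). The names chunked and send_in_batches, the send_batch callable, and the default batch size and worker count are all hypothetical choices for this example.

```python
from multiprocessing.pool import ThreadPool


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def send_in_batches(filenames, send_batch, batch_size=50, workers=4):
    """Call send_batch(batch) for every batch, several batches at a time.

    send_batch is a hypothetical callable supplied by the caller, for
    example a function that builds the JSON body and calls requests.post.
    """
    batches = chunked(filenames, batch_size)
    with ThreadPool(workers) as pool:
        # map() blocks until every batch is done and keeps the input order.
        return pool.map(send_batch, batches)
```

Smaller batches also keep each request body well below whatever size limit the server enforces.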
