PDF data stream with python

Question

Context: My code fetches a set of coordinates from some PNG documents and later redacts certain fields (it uses these coordinates to draw rectangles over certain areas).

I want my final output to be a PDF with each redacted image as a page. I can achieve this with the fpdf package without any problem.

However, I intend to send this PDF file as a base64-encoded email attachment. Is there any way to get the base64 string from the fpdf output?

On top of that, can I pass an image as a binary string to the fpdf image method?

See the redacted_pdf method below (I added some comments to make it clearer).

Code:

import io
import os

from fpdf import FPDF
from PIL import Image, ImageDraw


class Redaction:
    def __init__(self, png_image_list, df_coordinates):
        self.png_image_list = png_image_list  # PNG pages as bytes
        self.df_coordinates = df_coordinates  # one row of corner coordinates per redaction

    def _redact_images(self):
        redacted_images_bin = []
        for page_num, page_data in enumerate(self.png_image_list):
            im_page = Image.open(io.BytesIO(page_data))
            draw = ImageDraw.Draw(im_page)
            # page_number in the coordinates table is 1-based
            df_filtered = self.df_coordinates[self.df_coordinates['page_number'] == page_num + 1]
            for index, row in df_filtered.iterrows():
                # scale each corner by the image width/height
                x0 = row['x0'] * im_page.size[0]
                y0 = row['y0'] * im_page.size[1]
                x1 = row['x1'] * im_page.size[0]
                y1 = row['y1'] * im_page.size[1]
                x2 = row['x2'] * im_page.size[0]
                y2 = row['y2'] * im_page.size[1]
                x3 = row['x3'] * im_page.size[0]
                y3 = row['y3'] * im_page.size[1]
                coords = [x0, y0, x1, y1, x2, y2, x3, y3]
                draw.polygon(coords, outline='blue', fill='yellow')
            redacted_images_bin.append(im_page)

        return redacted_images_bin

    def redacted_pdf(self):
        redacted_images = self._redact_images()
        pdf = FPDF()
        pdf.set_auto_page_break(0)
        for index, img_redacted in enumerate(redacted_images):
            img_redacted.save(f"image_{index}.png")
            pdf.add_page()
            pdf.image(f"image_{index}.png", w=210, h=297)
            os.remove(f"image_{index}.png")  # I would like to avoid file handling!
        pdf.output("doc.pdf", "F")  # I would like to avoid file handling!
        # return pdf  # this is what I want, to return the pdf as base64 or binary

If you save it to a file, you can read it back as bytes and use the standard base64 module, i.e. base64.b64encode(). You can also use the standard io.BytesIO() to create a file in memory, so you don't have to save a file on the hard drive. In the same way, you can use io.BytesIO() to create the image file in memory and use it instead of a file on disk. Many functions can read/write io.BytesIO() (a file-like object) instead of a filename.
– furas, Commented Feb 17, 2022 at 20:02
On Stack Overflow you can find questions that show how to use io.BytesIO and base64 to send an image (or other file) from Flask to HTML/JavaScript.
– furas, Commented Feb 17, 2022 at 20:04
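
A minimal sketch of the idea in these comments, using only the standard library. The payload is placeholder data standing in for real PNG or PDF content, and the helper name file_like_to_base64 is made up for illustration.

```python
import base64
import io


def file_like_to_base64(buffer: io.BytesIO) -> str:
    """Return the whole content of an in-memory file as a base64 string."""
    # getvalue() returns everything written so far, regardless of the
    # current read/write position of the buffer.
    return base64.b64encode(buffer.getvalue()).decode("ascii")


# Write to an in-memory "file" instead of a file on disk.
buffer = io.BytesIO()
buffer.write(b"placeholder bytes standing in for PNG or PDF data")

encoded = file_like_to_base64(buffer)

# Decoding gives back exactly the bytes that were written.
assert base64.b64decode(encoded) == buffer.getvalue()
```

Any function that accepts a file-like object can be given such a buffer in place of an open file.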

furas · Accepted Answer · 2022-02-18 13:28:51Z

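In the documentation I found that you can get the PDF as a string using

pdf_string = pdf.output(dest='S')

so you can use the standard base64 module. In Python 3 the original fpdf returns a str with one character per byte, so it has to be converted back with latin-1 (utf-8 would change every byte above 0x7F and corrupt compressed content).

import fpdf
import base64

pdf = fpdf.FPDF()

# ... add some elements ...

pdf_string = pdf.output(dest='S')
pdf_bytes  = pdf_string.encode('latin-1')  # one character -> one byte

base64_bytes  = base64.b64encode(pdf_bytes)
base64_string = base64_bytes.decode('utf-8')

print(base64_string)

Result (from a run that used .encode('utf-8'), so the bytes of the compressed stream differ from what latin-1 gives):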

JVBERi0xLjMKMyAwIG9iago8PC9UeXBlIC9QYWdlCi9QYXJlbnQgMSAwIFIKL1Jlc291cmNlcyAyIDAgUgovQ29udGVudHMgNCAwIFI+PgplbmRvYmoKNCAwIG9iago8PC9GaWx0ZXIgL0ZsYXRlRGVjb2RlIC9MZW5ndGggMTk+PgpzdHJlYW0KeMKcM1LDsMOiMsOQMzVXKMOnAgALw7wCEgplbmRzdHJlYW0KZW5kb2JqCjEgMCBvYmoKPDwvVHlwZSAvUGFnZXMKL0tpZHMgWzMgMCBSIF0KL0NvdW50IDEKL01lZGlhQm94IFswIDAgNTk1LjI4IDg0MS44OV0KPj4KZW5kb2JqCjIgMCBvYmoKPDwKL1Byb2NTZXQgWy9QREYgL1RleHQgL0ltYWdlQiAvSW1hZ2VDIC9JbWFnZUldCi9Gb250IDw8Cj4+Ci9YT2JqZWN0IDw8Cj4+Cj4+CmVuZG9iago1IDAgb2JqCjw8Ci9Qcm9kdWNlciAoUHlGUERGIDEuNy4yIGh0dHA6Ly9weWZwZGYuZ29vZ2xlY29kZS5jb20vKQovQ3JlYXRpb25EYXRlIChEOjIwMjIwMjE3MjExMDE3KQo+PgplbmRvYmoKNiAwIG9iago8PAovVHlwZSAvQ2F0YWxvZwovUGFnZXMgMSAwIFIKL09wZW5BY3Rpb24gWzMgMCBSIC9GaXRIIG51bGxdCi9QYWdlTGF5b3V0IC9PbmVDb2x1bW4KPj4KZW5kb2JqCnhyZWYKMCA3CjAwMDAwMDAwMDAgNjU1MzUgZiAKMDAwMDAwMDE3NSAwMDAwMCBuIAowMDAwMDAwMjYyIDAwMDAwIG4gCjAwMDAwMDAwMDkgMDAwMDAgbiAKMDAwMDAwMDA4NyAwMDAwMCBuIAowMDAwMDAwMzU2IDAwMDAwIG4gCjAwMDAwMDA0NjUgMDAwMDAgbiAKdHJhaWxlcgo8PAovU2l6ZSA3Ci9Sb290IDYgMCBSCi9JbmZvIDUgMCBSCj4+CnN0YXJ0eHJlZgo1NjgKJSVFT0YK
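
Why the encoding matters when turning the string from output(dest='S') back into bytes can be shown with a small standard-library sketch. The six bytes below are illustrative only (they start with the usual zlib header 0x78 0x9c); they are not a complete PDF stream.

```python
# Illustrative binary data containing bytes above 0x7F.
raw = bytes([0x78, 0x9C, 0x33, 0x52, 0xF0, 0xE2])

# A string with one character per byte, like the one the original fpdf returns.
as_text = raw.decode("latin-1")

good = as_text.encode("latin-1")  # maps each character back to one byte
bad = as_text.encode("utf-8")     # characters above 0x7F become two bytes each

assert good == raw
assert bad != raw
assert len(bad) > len(raw)
```

A PDF whose compressed streams were altered this way can still open, but its page content may be missing.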

As for image(): it needs a filename (or URL) and can't work with a string or io.BytesIO().

Alternatively, you could get the source code and try to change it.

There is even a request on GitHub: Support for StringIO objects as images

EDIT:

I found that there is a fork, fpdf2, which can use a Pillow image (PIL.Image.Image) in image() - see fpdf2 Image

And in the source code I found that image() can also work with io.BytesIO()

Example code for fpdf2 (output() gives bytes instead of a string)

import fpdf
import base64
from PIL import Image
import io

#print(fpdf.__version__)

pdf = fpdf.FPDF()

pdf.add_page()

pdf.image('lenna.png')

pdf.image('https://upload.wikimedia.org/wikipedia/en/7/7d/Lenna_%28test_image%29.png')

# open file object (closed automatically after use)
with open('lenna.png', 'rb') as f:
    pdf.image(f)

# Pillow image
img = Image.open('lenna.png')
pdf.image(img)

# in-memory file
with open('lenna.png', 'rb') as f:
    b = io.BytesIO(f.read())
pdf.image(b)

# save in file
pdf.output('output.pdf')

# get as bytes
pdf_bytes = pdf.output()

#print(pdf_bytes)

base64_bytes  = base64.b64encode(pdf_bytes)
base64_string = base64_bytes.decode('utf-8')

print(base64_string)

Wikipedia: Lenna
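
Once the PDF is available as bytes, it can be attached to an email without a manual base64 step. The sketch below uses only the standard library; the addresses, subject, and the placeholder pdf_bytes are assumptions for illustration and stand in for the real value returned by pdf.output().

```python
from email.message import EmailMessage

# Placeholder standing in for the bytes returned by pdf.output() in fpdf2.
pdf_bytes = b"%PDF-1.3\n% placeholder, not a real document\n"

msg = EmailMessage()
msg["Subject"] = "Redacted document"
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"
msg.set_content("The redacted PDF is attached.")

# For binary data, add_attachment() applies base64 transfer encoding itself.
# bytes() is a harmless conversion in case the data arrives as a bytearray.
msg.add_attachment(
    bytes(pdf_bytes),
    maintype="application",
    subtype="pdf",
    filename="doc.pdf",
)

attachment = next(msg.iter_attachments())
assert attachment["Content-Transfer-Encoding"] == "base64"
assert attachment.get_content() == pdf_bytes
```

The explicit base64 string from the example above is still useful when the receiving side (for example a JSON-based mail API) expects the attachment as text.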

Test for writing in fpdf2

import fpdf   

pdf = fpdf.FPDF()

pdf.add_page()
pdf.image('https://upload.wikimedia.org/wikipedia/en/7/7d/Lenna_%28test_image%29.png')

# --- test 1 ---

pdf.output('output-test-1.pdf')

# --- test 2 ---

pdf_bytes = pdf.output()

with open('output-test-2.pdf', 'wb') as f:  # it will close automatically
    f.write(pdf_bytes)

# --- test 3 ---

pdf_bytes = pdf.output()

f = open('output-test-3.pdf', 'wb')
f.write(pdf_bytes)
f.close()  # don't forget to close when you write

edited Feb 18, 2022 at 13:28

answered Feb 17, 2022 at 20:12

furas

5 Comments

Andoni Over a year ago

For some reason, doing this: pdf_string = pdf.output(dest='S'); pdf_bytes = pdf_string.encode('utf-8') does not return a valid PDF binary (I wrote it to a PDF file and opened it, and it was not the actual PDF; it was a blank PDF!)

furas Over a year ago

First, you could use print() to see what you get in the variables. You could also open the PDF in a text viewer/editor to see what is in the file. But if it opens without any errors, then it is a correct PDF but without content. In the first example I don't add any pages/text/images, so it creates an empty PDF.

furas Over a year ago

BTW: after installing fpdf2 I don't have access to the original fpdf (because both use the same module names), so I can't test this problem. Besides, fpdf2 seems more useful, so I would forget fpdf.

furas Over a year ago

I added code which writes the PDF using three methods, but I tested it only with fpdf2.

Andoni Over a year ago

I will have a look at fpdf2 (I did all my tests with fpdf). Really appreciate your help here. As soon as I test it I will write here what I saw. Thank you!
