Converting multiple HTML files into a PDF using pdfkit in Python

Question

I am trying to convert multiple HTML files into a single PDF using pdfkit. This is my code:

import time

import pdfkit
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.linkedin.com/in/jaypratappandey/')
time.sleep(40)  # give the profile page time to finish loading
soup = BeautifulSoup(driver.page_source, 'lxml')

# "with" closes (and flushes) each file before pdfkit reads it
with open('tophtmlfile.html', 'w') as top:
    for name in soup.select('.pv-top-card-section__body'):
        top.write(str(name))

with open('htmlfile.html', 'w') as f:
    for item in soup.select('.pv-oc.ember-view'):
        f.write(str(item))

# two input files in a single call: this is the line that fails
pdfkit.from_file(['tophtmlfile.html', 'htmlfile.html'], 'jayprofile.pdf')

driver.quit()

This code gives the following error:

Traceback (most recent call last):
  File "lkdndata.py", line 23, in <module>
    pdfkit.from_file(['tophtmlfile.html', 'htmlfile.html'], 'ankurprofile.pdf')
  File "/usr/local/lib/python3.5/dist-packages/pdfkit/api.py", line 49, in from_file
    return r.to_pdf(output_path)
  File "/usr/local/lib/python3.5/dist-packages/pdfkit/pdfkit.py", line 156, in to_pdf
    raise IOError('wkhtmltopdf reported an error:\n' + stderr)
OSError: wkhtmltopdf reported an error:
Error: This version of wkhtmltopdf is build against an unpatched version of QT, and does not support more then one input document.
Exit with code 1, due to unknown error.

Azaria Gebremichael · Accepted Answer · 2021-12-09 09:10:52Z

The solution I found was to first merge the HTML files into one and then convert the merged file with pdfkit. In your case, save tophtmlfile.html and htmlfile.html together in the same directory and set path to that directory.

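import os

import pdfkit

# path to folder containing html files
path = "/home/ec2-user/data-science-processes/src/results/"

def multiple_html_to_pdf(path, merged_path='merged.html', output_pdf='Report.pdf'):
    """ converts multiple html files to a single pdf
    args: path to directory containing html files
    """
    bodies = []
    # sorted() gives a predictable (alphabetical) merge order
    for file in sorted(os.listdir(path)):
        # skip the merged output in case it was written into the same folder
        if file.endswith(".html") and file != os.path.basename(merged_path):
            print(file)
            # os.path.join works whether or not path ends with a slash
            with open(os.path.join(path, file), 'r') as f:
                bodies.append(f.read())
    # build the document once instead of calling str.replace per file,
    # which would also hit any '</body></html>' inside an appended file
    merged_html = '<html><head></head><body>' + ''.join(bodies) + '</body></html>'
    # save merged html, then convert that same file
    with open(merged_path, 'w') as f:
        f.write(merged_html)
    pdfkit.from_file(merged_path, output_pdf)

multiple_html_to_pdf(path)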
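The merge order in the function above follows the directory listing. For the question's two files, where the top card should come first, a variant that takes an explicit list of paths may fit better. This is a minimal sketch: the helper names merge_html, merge_html_files and html_files_to_pdf are hypothetical, and it uses pdfkit.from_string so that wkhtmltopdf still receives a single input document and no merged.html has to be written.

```python
def merge_html(fragments):
    """Wrap HTML fragments, in the given order, in a single HTML document."""
    return '<html><head></head><body>' + ''.join(fragments) + '</body></html>'


def merge_html_files(paths):
    """Read each file in paths (in order) and merge their contents."""
    fragments = []
    for p in paths:
        with open(p) as f:
            fragments.append(f.read())
    return merge_html(fragments)


def html_files_to_pdf(paths, output_pdf):
    """Convert several HTML files into one PDF through a single pdfkit input."""
    import pdfkit  # imported lazily so the merge helpers work without pdfkit
    pdfkit.from_string(merge_html_files(paths), output_pdf)


# Usage for the question's files (top card first):
# html_files_to_pdf(['tophtmlfile.html', 'htmlfile.html'], 'jayprofile.pdf')
```

Because only one document reaches wkhtmltopdf, this should sidestep the multiple-input error from the question in the same way the answer's merged file does.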

2 Comments

grahamie Over a year ago

I use this, thank you. How do you get a page break between each of the HTML files when adding them to the PDF? Currently it's just pushing them all together.
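One way to get page breaks, sketched under assumptions: wrap every merged fragment except the last in a div carrying the CSS property page-break-after: always, which is commonly used to force page breaks when wkhtmltopdf paginates HTML. The function name merge_with_page_breaks is hypothetical, and the inputs are assumed to be HTML fragments (body content) rather than complete documents.

```python
def merge_with_page_breaks(fragments):
    """Join HTML fragments into one document with a page break between each.

    Every fragment except the last is wrapped in a div with
    page-break-after: always, so the next fragment starts on a new page.
    """
    pieces = []
    for i, fragment in enumerate(fragments):
        if i < len(fragments) - 1:
            pieces.append('<div style="page-break-after: always;">' + fragment + '</div>')
        else:
            # no break after the last fragment, to avoid a trailing blank page
            pieces.append('<div>' + fragment + '</div>')
    return '<html><head></head><body>' + ''.join(pieces) + '</body></html>'
```

The returned string can be passed to pdfkit.from_string(html, 'Report.pdf'), or written to merged.html and converted with pdfkit.from_file as in the answer.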

Daniel Bardi Over a year ago

template_html = '<html><head></head><body>{BODY}</body></html>'
empty_html = template_html.replace('{BODY}', html)

This is a bit cleaner.

Open Trap · Accepted Answer · 2017-12-13 14:27:08Z

I had the same error. It is most likely caused by your Qt installation: your wkhtmltopdf binary was built against a Qt version without the patches it needs, so features such as multiple input documents are not available. Try running

wkhtmltopdf

in your terminal and check whether the output contains "Reduced Functionality".

If it does, my assumption is correct, and your safest bet is to compile wkhtmltopdf from source so that it is built against the patched Qt.
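The check described in this answer can be scripted. This is a minimal sketch, assuming wkhtmltopdf is on PATH and that, as the answer says, running it without arguments prints a "Reduced Functionality" notice when it is built against unpatched Qt; the helper names are hypothetical.

```python
import shutil
import subprocess


def reports_reduced_functionality(help_text):
    """Return True if wkhtmltopdf's output mentions 'Reduced Functionality'."""
    return "reduced functionality" in help_text.lower()


def wkhtmltopdf_help(binary="wkhtmltopdf"):
    """Run wkhtmltopdf without arguments and return everything it printed.

    Returns None if the binary is not found on PATH.
    """
    if shutil.which(binary) is None:
        return None
    # Without arguments wkhtmltopdf prints its usage text; the exit code is
    # ignored here because only the printed text matters for this check.
    result = subprocess.run(
        [binary],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so nothing is missed
        universal_newlines=True,
        timeout=30,
    )
    return result.stdout


# Usage:
# text = wkhtmltopdf_help()
# if text is not None and reports_reduced_functionality(text):
#     print("unpatched Qt build: only one input document is supported")
```

Keeping the text check separate from the subprocess call makes it easy to run the check on output copied from another machine.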
