3

I try converting multiple html file into pdf using pdfkik. This is my code:

from bs4 import BeautifulSoup
from selenium import webdriver
import pdfkit

driver=webdriver.Chrome()
driver.get('https://www.linkedin.com/in/jaypratappandey/')
time.sleep(40)
soup= BeautifulSoup(driver.page_source, 'lxml')
data=[]
f=open('htmlfile.html', 'w')
top=open('tophtmlfile.html', 'w')

for name in soup.select('.pv-top-card-section__body'):
    top.write("%s" % name)

for item in soup.select('.pv-oc.ember-view'):
    f.write("%s" % item)


pdfkit.from_file(['tophtmlfile.html', 'htmlfile.html'], 'jayprofile.pdf')

driver.quit()

This code give the following error:

Traceback (most recent call last):
  File "lkdndata.py", line 23, in <module>
    pdfkit.from_file(['tophtmlfile.html', 'htmlfile.html'], 'ankurprofile.pdf')
  File "/usr/local/lib/python3.5/dist-packages/pdfkit/api.py", line 49, in from_file
    return r.to_pdf(output_path)
  File "/usr/local/lib/python3.5/dist-packages/pdfkit/pdfkit.py", line 156, in to_pdf
    raise IOError('wkhtmltopdf reported an error:\n' + stderr)
OSError: wkhtmltopdf reported an error:
Error: This version of wkhtmltopdf is build against an unpatched version of QT, and does not support more then one input document.
Exit with code 1, due to unknown error.

2 Answers 2

1

The solution i found was to first merge the html files into one and then go on to convert it using pdfkit. so in your case would be to save the tophtml and html files together in same dir and replace the path to that dir.

import pdfkit
import os

# path to folder containing html files
path = "/home/ec2-user/data-science-processes/src/results/"

def multiple_html_to_pdf(path):
    """ converts multiple html files to a single pdf
    args: path to directory containing html files
    """
    empty_html = '<html><head></head><body></body></html>'
    for file in os.listdir(path):
        if file.endswith(".html"):
            print(file)
            # append html files
            with open(path + file, 'r') as f:
                html = f.read()
                empty_html = empty_html.replace('</body></html>', html + '</body></html>')
    # save merged html
    with open('merged.html', 'w') as f:
        f.write(empty_html)
    pdfkit.from_file('/home/ec2-user/data-science-processes/report/merged.html','Report.pdf')

multiple_html_to_pdf(path)
Sign up to request clarification or add additional context in comments.

2 Comments

I use this - Thank you. How do you get a Page break between each of the HTMLS when adding to PDF ? Currently its just pushing them all together
template_html = '<html><head></head><body>{BODY}</body></html>' empty_html = template_html.replace('{BODY}', html) This is a bit cleaner
0

I had the same error. The error you are probably getting is due to the inconsistency of your qt installation and non availability of compatible qt version. Try running

wkhtmltopdf

on your terminal and see whether you can find "Reduced Functionality".

If yes then my assumption is correct and then your safest bet would be to compile it from source.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.