python - Converting multiple HTML files to PDF using pdfkit in Python

Reposted. Author: 行者123. Updated: 2023-12-05 06:37:38

I am trying to convert multiple HTML files to a PDF using pdfkit. Here is my code:

import time

from bs4 import BeautifulSoup
from selenium import webdriver
import pdfkit

driver = webdriver.Chrome()
driver.get('https://www.linkedin.com/in/jaypratappandey/')
# wait for the page to finish loading
time.sleep(40)
soup = BeautifulSoup(driver.page_source, 'lxml')
data = []

# 'with' closes (and flushes) both files before pdfkit reads them
with open('htmlfile.html', 'w') as f, open('tophtmlfile.html', 'w') as top:
    for name in soup.select('.pv-top-card-section__body'):
        top.write("%s" % name)

    for item in soup.select('.pv-oc.ember-view'):
        f.write("%s" % item)

# passing a list of input files is what triggers the error below
pdfkit.from_file(['tophtmlfile.html', 'htmlfile.html'], 'jayprofile.pdf')

driver.quit()

This code produces the following error:

Traceback (most recent call last):
  File "lkdndata.py", line 23, in <module>
    pdfkit.from_file(['tophtmlfile.html', 'htmlfile.html'], 'ankurprofile.pdf')
  File "/usr/local/lib/python3.5/dist-packages/pdfkit/api.py", line 49, in from_file
    return r.to_pdf(output_path)
  File "/usr/local/lib/python3.5/dist-packages/pdfkit/pdfkit.py", line 156, in to_pdf
    raise IOError('wkhtmltopdf reported an error:\n' + stderr)
OSError: wkhtmltopdf reported an error:
Error: This version of wkhtmltopdf is build against an unpatched version of QT, and does not support more then one input document.
Exit with code 1, due to unknown error.

Best answer

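The solution I found is to first merge the HTML files into a single file and then convert that one file with pdfkit. In your case, save the tophtml and html files together in the same directory and replace the path below with the path to that directory.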

import os
import pdfkit

# path to folder containing html files
path = "/home/ec2-user/data-science-processes/src/results/"

def multiple_html_to_pdf(path):
    """ converts multiple html files to a single pdf
    args: path to directory containing html files
    """
    merged_path = 'merged.html'
    body = ''
    # sorted() fixes the page order; os.listdir() returns files in arbitrary order
    for file in sorted(os.listdir(path)):
        if file.endswith(".html"):
            print(file)
            # append html files
            with open(os.path.join(path, file), 'r') as f:
                body += f.read()
    # save merged html
    with open(merged_path, 'w') as f:
        f.write('<html><head></head><body>' + body + '</body></html>')
    # convert the file that was just written, not a different hard-coded path
    pdfkit.from_file(merged_path, 'Report.pdf')

multiple_html_to_pdf(path)
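
If the order of the pages matters (the top card first, then the rest of the profile), the same merge idea can be applied to an explicit list of files instead of a whole directory. The following is only a minimal sketch under some assumptions: the helper names merge_html_fragments and fragments_to_pdf are made up for illustration, the input files are assumed to be UTF-8 HTML fragments like the ones written in the question, and pdfkit.from_string is used so that wkhtmltopdf receives a single input document without an intermediate merged.html.

```python
from pathlib import Path


def merge_html_fragments(paths, encoding="utf-8"):
    """Return one HTML document whose body holds each file's content, in the order given.

    Hypothetical helper: assumes every file is an HTML fragment, not a full document.
    """
    body = "\n".join(Path(p).read_text(encoding=encoding) for p in paths)
    return "<html><head></head><body>\n" + body + "\n</body></html>"


def fragments_to_pdf(paths, output_pdf):
    """Merge the files and pass the result to pdfkit as a single input document."""
    # Imported here so the merge step can be used and tested without pdfkit installed.
    import pdfkit

    pdfkit.from_string(merge_html_fragments(paths), output_pdf)


# Example call, using the file names from the question:
# fragments_to_pdf(["tophtmlfile.html", "htmlfile.html"], "jayprofile.pdf")
```

Because only one document is handed to wkhtmltopdf, this should avoid the "more then one input document" error on builds without patched Qt, although I have not tested it against every wkhtmltopdf version.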

Regarding "python - Converting multiple HTML files to PDF using pdfkit in Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47328475/
