7
import mammoth

f = open(r"D:\filename.docx", 'rb')
document = mammoth.convert_to_html(f)

When I run this code, no .html file is produced. When I did manage to convert the document to .html, the images inserted into the Word file were missing from the HTML. How can I get the images from the .docx into the .html file?

  • You haven’t told us what error you’re experiencing. Commented Oct 30, 2017 at 5:49
  • I didn't get an error, but no .html file was produced by the conversion. Commented Oct 30, 2017 at 12:18

3 Answers

12

Try this:

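import mammoth

# Open the .docx in binary mode and convert it to HTML
with open("path_to_file.docx", 'rb') as f:
    document = mammoth.convert_to_html(f)

# document.value holds the HTML as a string; write it out as UTF-8
with open('filename.html', 'wb') as b:
    b.write(document.value.encode('utf8'))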

1 Comment

I am unable to convert a .docx file that contains a chart. What should I do?
4

This may be a late answer, but in case someone is still looking for a way to keep Word tables and images intact after converting to HTML, the code below should help.

import os
import win32com.client as win32

# Open MS Word
word = win32.gencache.EnsureDispatch('Word.Application')

# Raw string so that "\f" is not read as a form-feed escape
wordFilePath = r"C:\filename.docx"

doc = word.Documents.Open(wordFilePath)
# Swap the extension for .html
html_path = os.path.splitext(wordFilePath)[0] + '.html'

# wdFormatFilteredHTML has value 10
# Save the doc as HTML
doc.SaveAs(html_path, 10)

doc.Close()
# Quit Word only if no other documents are still open
if word.Documents.Count == 0:
    word.Quit()

4 Comments

Hi snehil singh, is it possible to convert to HTML while keeping any images embedded in the HTML file itself instead of linked from an external folder?
@sugarakis Did you try the code above? It will keep your images in the HTML file. Please post a question describing your problem in detail and send me the link.
Hi snehil singh, the above code creates the HTML with all its formatting and images, but it also creates a separate folder with the same name as the DOCX file, where the images are stored. If I rename the HTML file or delete the extra folder, the images in the HTML file are no longer shown. I would like a solution where the HTML file can display its images by itself. Thank you.
Hi @sugarakis, you can use a Python library such as BeautifulSoup to rewrite the image references in the HTML file so that the images are embedded directly in it. If you need complete code, please post a question and send me the link.
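For anyone following this comment thread, here is a minimal sketch of the embedding idea, under some assumptions. It swaps in a standard-library regular expression for BeautifulSoup to stay dependency-free, and it assumes the `<img>` tags use quoted `src` values that point at files relative to the HTML file. The name `inline_images` is a hypothetical helper, not part of any library.

```python
import base64
import mimetypes
import os
import re
from urllib.parse import unquote

# Matches the src attribute of an <img> tag. This is a simplification that
# assumes quoted attribute values; a real HTML parser is more robust.
IMG_SRC = re.compile(
    r'(<img\b[^>]*?\bsrc\s*=\s*)(["\'])(.*?)\2',
    re.IGNORECASE | re.DOTALL,
)


def inline_images(html, base_dir):
    """Return html with local <img> sources replaced by base64 data URIs."""

    def replace(match):
        prefix, quote, src = match.groups()
        # Leave remote or already-inlined images untouched.
        if src.startswith(("http:", "https:", "data:")):
            return match.group(0)
        path = os.path.join(base_dir, unquote(src))
        # Leave references to missing files untouched as well.
        if not os.path.isfile(path):
            return match.group(0)
        mime = mimetypes.guess_type(path)[0] or "application/octet-stream"
        with open(path, "rb") as image_file:
            encoded = base64.b64encode(image_file.read()).decode("ascii")
        return "{0}{1}data:{2};base64,{3}{1}".format(prefix, quote, mime, encoded)

    return IMG_SRC.sub(replace, html)
```

To use it, read the saved .html file into a string, call `inline_images(html, folder_containing_the_html)`, and write the result back. The encoding of the file Word saves may not be UTF-8, so check its `<meta>` charset before reading it as text.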
2

I suggest trying the following code:

    import mammoth

    with open("document.docx", "rb") as docx_file:
        result = mammoth.convert_to_html(docx_file)
        html = result.value

1 Comment

But I am unable to get the images from the .docx file into the .html file. Can you please help me with that?
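Regarding images: as far as I know, mammoth already inlines images as base64 data URIs by default, so it is worth checking the generated HTML for `<img src="data:...">` before changing anything. If you want explicit control, the sketch below passes a custom image handler through the `convert_image` argument, following the pattern in mammoth's README. The function names `embed_image` and `docx_to_html` and both file paths are placeholders of mine, not part of mammoth.

```python
import base64


def embed_image(image):
    """Turn one image from the .docx into attributes for an <img> element."""
    # image.open() yields a file-like object with the raw image bytes
    with image.open() as image_bytes:
        encoded = base64.b64encode(image_bytes.read()).decode("ascii")
    return {"src": "data:{0};base64,{1}".format(image.content_type, encoded)}


def docx_to_html(docx_path, html_path):
    """Convert docx_path to an HTML file and return any conversion messages."""
    # Imported here so embed_image can be exercised without mammoth installed
    import mammoth

    with open(docx_path, "rb") as docx_file:
        result = mammoth.convert_to_html(
            docx_file,
            convert_image=mammoth.images.img_element(embed_image),
        )
    # result.value is the generated HTML as a string
    with open(html_path, "w", encoding="utf-8") as html_file:
        html_file.write(result.value)
    # result.messages lists any warnings raised during conversion
    return result.messages
```

Calling `docx_to_html("document.docx", "document.html")` should write a single HTML file with the images embedded. Printing the returned messages can help show why an element, such as a chart, was not converted.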
