6

I would like to run a script on a folder full of Word documents that reads through each document and pulls out the images and their captions (the text right below each image). From the research I've done, I think pywin32 might be a viable solution. I know how to use pywin32 to find strings and pull them out, but I need help with the images part. How can I read through a docx file and trigger some action whenever an image is found? Thank you for any help! I am using Python 2.7.

4 Answers

4

A .docx file is an ordinary ZIP archive, so you can unzip it to extract the images. Embedded pictures are stored under word/media/ inside the archive.
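As a rough illustration of the unzip approach, here is a minimal sketch that uses only the standard library. The function name extract_images and the output folder layout are made up for this example. It assumes the pictures are embedded in the usual word/media/ location, so linked (non-embedded) images would be missed.

```python
import os
import zipfile


def extract_images(docx_path, out_dir):
    """Copy every file under word/media/ in a .docx into out_dir.

    Returns the list of paths that were written. The helper name and
    its behaviour are illustrative, not part of any library.
    """
    saved = []
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            # Embedded pictures normally live under word/media/.
            if name.startswith("word/media/") and not name.endswith("/"):
                target = os.path.join(out_dir, os.path.basename(name))
                with open(target, "wb") as handle:
                    handle.write(archive.read(name))
                saved.append(target)
    return saved
```

To cover a whole folder, you could loop over os.listdir(folder), call extract_images for each name ending in .docx, and give each document its own output directory so that files named image1.png from different documents do not overwrite each other.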


3

You may find some inspiration in this post: "How can I search a word in a Word 2007 .docx file?"


2

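You can use the Python module docx2txt to extract both the text and the images from docx files. Calling docx2txt.process(path, image_dir) returns the document text and writes the images into image_dir.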


-2
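import docx  # provided by the python-docx package

document = docx.Document(filepath)  # filepath: path to a .docx file
for image in document.inline_shapes:
    # width and height are lengths in English Metric Units (EMU)
    print(image.width, image.height)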

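Try this. It prints the width and height of each inline picture. Note that inline_shapes exposes the dimensions only, not the image data or the caption.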
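Since the snippet above reports dimensions only, one way to pair each picture with the text below it is to read the document XML directly. The following is a sketch under stated assumptions, not a definitive implementation. The function name images_with_captions is invented for this example. It treats the first non-empty paragraph after a picture as its caption, which is a heuristic. It handles embedded pictures referenced through a:blip elements only, and pictures nested inside text boxes or tables may be reported more than once.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by WordprocessingML packages.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
R = "{http://schemas.openxmlformats.org/officeDocument/2006/relationships}"
REL = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def images_with_captions(docx_path):
    """Return (image_target, caption) pairs for one .docx file.

    image_target is the relationship target as stored in the package,
    typically relative to word/ (for example media/image1.png).
    The caption is guessed: the next paragraph that contains text.
    """
    with zipfile.ZipFile(docx_path) as archive:
        body = ET.fromstring(archive.read("word/document.xml"))
        rels = ET.fromstring(archive.read("word/_rels/document.xml.rels"))

    # Map relationship ids (rId1, rId2, ...) to their targets.
    targets = {}
    for rel in rels.iter(REL + "Relationship"):
        targets[rel.get("Id")] = rel.get("Target")

    paragraphs = list(body.iter(W + "p"))
    results = []
    for index, para in enumerate(paragraphs):
        for blip in para.iter(A + "blip"):
            target = targets.get(blip.get(R + "embed"))
            if target is None:
                continue  # linked picture or missing relationship
            caption = ""
            # Heuristic: the caption is the next paragraph with text.
            for later in paragraphs[index + 1:]:
                text = "".join(t.text or "" for t in later.iter(W + "t"))
                if text.strip():
                    caption = text.strip()
                    break
            results.append((target, caption))
    return results
```

Each loop iteration that finds a blip is effectively the "event" for an image, so any per-image processing can go where the pair is appended to results.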
