I have been trying to write a tool to look at Word documents in a specified path for broken links.

I gave up on having it search a folder for now, thinking I should get it working on a single document first. With my limited skills, some reading, and a few suggestions from Copilot (breaking the problem down into individual tasks), I put this together:

import docx
import os
import requests

doc = docx.Document('C:/Users/demo.docx')
allText = []

def find_hyperlinks(doc):
    hyperlinks = []
    rels = doc.part.rels
    for rel in rels:
        if "hyperlink" in rels[rel].target_ref:
            hyperlinks.append(rels[rel].target_ref)
    return hyperlinks

def find_broken_links_in_docx(doc):
    broken_links = []
    hyperlinks = find_hyperlinks(doc)
    for url in hyperlinks:
        try:
            response = requests.head(url, allow_redirects=True)
            if response.status_code >= 400:
                broken_links.append(url)
        except requests.RequestException:
            broken_links.append(url)
    return broken_links

def write_report(report, output_file):
    with open(output_file, 'w') as f:
        for file_path, links in report.items():
            f.write(f"File: {file_path}\n")
            for link in links:
                f.write(f"  Broken link: {link}\n")
            f.write("\n")

if __name__ == "__main__":
    
    output_file = "C:/Results/broken_links_report.txt"
    report = find_broken_links_in_docx(doc)
    write_report(report, output_file)
    print(f"Report written to {output_file}")

Here is the error:

 File "c:\Users\Scripts\playground\openinganddocx.py", line 41, in <module>
    write_report(report, output_file)
  File "c:\Users\Scripts\playground\openinganddocx.py", line 31, in write_report
    for file_path, links in report.items():
AttributeError: 'list' object has no attribute 'items'

For reference:

  • Line 31 f.write(f"File: {file_path}\n")

  • Line 41 print(f"Report written to {output_file}")

  • Can you include the entire traceback? There's critical error information missing Commented Mar 6 at 21:35
  • report is a list which does not support .items(). I'm assuming the error message says AttributeError: 'list' object has no attribute 'items' Commented Mar 6 at 21:38
  • It does! Copy paste error on my part. I am not good at this at all. I am pretty sure you are right. I am 100% sure a list would be generated by the find broken hyperlinks to add to the output file. But I am not sure how to capture that. I tried searching and asking copilot, and I am at a loss. I am trying to punch way above my weight class here. Commented Mar 6 at 21:45
  • We all start somewhere. It looks like what you really want is a dictionary of type {str: list}. Tackle the problem with that in mind. The keys to your dict are filepaths, the values are the urls per file. Don't worry about writing the file just yet, parse the links out and see if you can fill out your data structure correctly. I'll add an answer, but I won't solve the whole problem Commented Mar 6 at 21:49

1 Answer

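The issue is that you are treating a list (report) like a dict, which it is not: find_broken_links_in_docx returns a list of URLs, but write_report calls report.items(). That's why you are getting AttributeError: 'list' object has no attribute 'items'.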
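
To see the difference in isolation, here is a minimal sketch with made-up values (the path and URL are placeholders, not taken from your document):

```python
# Hypothetical sample data; the path and URL are placeholders.
report_as_list = ["https://example.com/missing"]
report_as_dict = {"C:/Users/demo.docx": ["https://example.com/missing"]}

# A list has no .items() method, so this raises the same AttributeError.
try:
    report_as_list.items()
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)

# A dict yields (key, value) pairs, which is the shape write_report loops over.
pairs = list(report_as_dict.items())
for file_path, links in pairs:
    print(file_path, links)
```

The second loop is the same shape as the one in write_report, so handing that function a dict keyed by file path should be enough to get past the error.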

What you want is a dictionary that has the structure {'filepath': [<urls>]}. So start there:

import docx

def find_hyperlinks(doc_path: str):
    doc = docx.Document(doc_path)
    hyperlinks = []
    rels = doc.part.rels
    for rel in rels:
        # check the relationship type, not the URL text, to spot hyperlinks
        if "hyperlink" in rels[rel].reltype:
            hyperlinks.append(rels[rel].target_ref)

    # here is an example of how I might return that value
    return {doc_path: hyperlinks}

links_dict = find_hyperlinks('C:/Users/demo.docx')

# from here, prune the hyperlinks that work
# (link_works is a placeholder for a check you write yourself;
# it should return True when the URL responds)
broken_links = {}

for doc_path, links in links_dict.items():
    broken = []
    for link in links:
        if link_works(link):
            continue

        broken.append(link)
    broken_links[doc_path] = broken

# etc

This isn't perfect, but it will get you on the path to success.
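
For the link check itself, one possible sketch is below. The names link_works and find_broken_links are my own choices, and I have used the standard library's urllib.request instead of requests so it runs without extra installs; requests.head with a timeout, as in your original code, should work just as well. Treat it as a starting point under those assumptions, not a finished checker.

```python
import urllib.error
import urllib.request


def link_works(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to url gets a non-error response."""
    try:
        # Building the Request raises ValueError for strings that are not URLs.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        # urlopen raises HTTPError (a URLError subclass) for 4xx/5xx replies;
        # timeouts and connection failures surface as OSError subclasses.
        return False


def find_broken_links(links_dict, checker=link_works):
    """Map each document path to the links that the checker rejects."""
    broken_links = {}
    for doc_path, links in links_dict.items():
        broken_links[doc_path] = [link for link in links if not checker(link)]
    return broken_links
```

Passing the checker in as an argument is a deliberate choice: you can try find_broken_links with a fake function that never touches the network, then switch to the real one. Some servers reject HEAD requests, so a link reported as broken here may still open in a browser.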

2 Comments

Well thanks to you I got it to give me a report. However, the text in the file is "File: <function items at 0x00000110B01CB880>" I suspect it is how I defined items here: ``` def items(): items=broken links_dict = {items:[]} broken_links = {} ```
the output line is : ``` output_file = "C:/Results/broken_links_report.txt" report = find_hyperlinks(doc_path) write_report(report, output_file) print(f"Report written to {output_file}") ``` Everything else is the same. Again, thank you for getting me this far!
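
That "File: <function items at 0x...>" text is what Python prints when a function object, not a string, ends up as the dictionary key. The snippet in the comment is collapsed onto one line, so the following is only a guess at the cause, assuming links_dict was built with the function items itself as its key:

```python
def items():
    return []


# Assumed reconstruction: the function object itself is used as the key.
links_dict = {items: []}
for file_path, links in links_dict.items():
    line = f"File: {file_path}"
    print(line)  # the key is formatted as a function object, not a path

# Using the document path string as the key gives the intended line.
doc_path = "C:/Users/demo.docx"
fixed_dict = {doc_path: []}
fixed_lines = [f"File: {file_path}" for file_path in fixed_dict]
print(fixed_lines[0])
```

If that matches what happened, dropping the items function and keying the dict by doc_path, as find_hyperlinks already does in the answer, should put the file name back in the report.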
