Linking HTML files in a folder with one another using Python

Question

I have a folder full of html files as follows:

aaa.html
bbb.html
ccc.html
....
......
.........
zzz.html

All of these HTML files are generated by a Python script, so they all follow the same template.

Now I want to link these HTML files to one another. The template already contains placeholders for the links:

<nav>
    <ul class="pager">
        <li class="previous"><a href="#">Previous</a></li>
        <li class="next"><a href="#">Next</a></li>
    </ul>
</nav>

I want to fill these placeholders with the filenames in the folder. For example, bbb.html will have

<nav>
    <ul class="pager">
        <li class="previous"><a href="aaa.html">Previous</a></li>
        <li class="next"><a href="ccc.html">Next</a></li>
    </ul>
</nav>

and the ccc.html file will contain:

<nav>
    <ul class="pager">
        <li class="previous"><a href="bbb.html">Previous</a></li>
        <li class="next"><a href="ddd.html">Next</a></li>
    </ul>
</nav>

And so on for the rest of the files. Can this be done using Python? I don't even know where to start. Any hints or suggestions would be really helpful.

Is the order of the HTML files truly alphabetical? If you have AAA.html and aaa.html, which comes first?
– philshem, Commented Apr 7, 2017 at 8:20
You can use os.walk to list the files in that directory, sort them with a custom sorting function that you use for the template in web scraping, then iterate over that list and read each file with Beautiful Soup to change those two placeholders to the previous and next elements of the list.
– Tomasz Plaskota, Commented Apr 7, 2017 at 8:23
@philshem The order really doesn't matter. It is just that each file has to be linked with two others, so any order would do.
– kingmakerking, Commented Apr 7, 2017 at 8:27

TrakJohnson · Accepted Answer · 2017-04-07 08:48:35Z

You can use the Beautiful Soup library (bs4) to modify the HTML:

from bs4 import BeautifulSoup

file_names = ['aaa.html', 'bbb.html', ... , 'zzz.html']  # the ... is a placeholder: list every file
# the first and last files are skipped (not sure what to do with them?)

for ind in range(1, len(file_names) - 1):
    file_name = file_names[ind]
    with open(file_name, 'r+') as f:
        soup = BeautifulSoup(f.read(), 'html.parser')
        # we assume there is only one "previous" and one "next" item;
        # the href belongs to the <a> inside each <li>, not to the <li> itself
        soup.find(class_='previous').a['href'] = file_names[ind - 1]
        soup.find(class_='next').a['href'] = file_names[ind + 1]
        # erase contents and replace with new html
        f.seek(0)
        f.truncate()
        # prettify() without an encoding returns str, which a text-mode file needs
        f.write(soup.prettify())  # to get readable HTML

If the filenames aren't as consistent as in your example, and you want to generate the list from the files in the directory, you can use os.walk or glob.glob.
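As a minimal sketch of the glob.glob route, the list could be built as below. The function name list_html_files and the case-insensitive sort are assumptions made for illustration, not part of the original answer; any consistent order would do, as noted in the comments on the question.

```python
import glob
import os


def list_html_files(folder):
    """Return the names of the .html files in `folder` in a stable, sorted order.

    Hypothetical helper: the name and the sort key are illustrative choices.
    """
    paths = glob.glob(os.path.join(folder, "*.html"))
    # Keep only the base names, since the pages link to each other with relative hrefs.
    names = [os.path.basename(p) for p in paths]
    # Sort explicitly, because glob.glob does not promise any particular order.
    # The second key element breaks ties between names that differ only in case.
    return sorted(names, key=lambda n: (n.lower(), n))
```

The result can be assigned to file_names in place of the hand-written list, for example file_names = list_html_files('.') when the script is run from inside the folder.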

edited Apr 7, 2017 at 8:48

answered Apr 7, 2017 at 8:27

Community · Accepted Answer · 2017-05-23 12:17:50Z

You can replace elements from your template by looping over the file list, with list wrapping. Here's an example for aaa.html using aaa,bbb,ccc:

#f = ['aaa.html','bbb.html','ccc.html']
f = sorted(['aaa.html','bbb.html','ccc.html'])  # explicit sorting

t = """<nav>
    <ul class="pager">
        <li class="previous"><a href="#">Previous</a></li>
        <li class="next"><a href="#">Next</a></li>
    </ul>
</nav>"""  # sample aaa.html file

i = f.index('aaa.html')  # position of the file being processed
# the modulo wraps the index around, so the first and last files also get two neighbours
t = t.replace('<li class="previous"><a href="#">Previous', '<li class="previous"><a href="' + f[(i - 1) % len(f)] + '">Previous')
t = t.replace('<li class="next"><a href="#">Next', '<li class="next"><a href="' + f[(i + 1) % len(f)] + '">Next')

print(t)

To do the list wrapping I take the index modulo the length of the list (after zzz comes aaa).

This gives the following output for aaa.html:

<nav>
    <ul class="pager">
        <li class="previous"><a href="ccc.html">Previous</a></li>
        <li class="next"><a href="bbb.html">Next</a></li>
    </ul>
</nav>

To complete the code, you'd have to loop over the *.html files (see glob.glob).
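A rough sketch of that completed loop follows, assuming every file is UTF-8 encoded and contains the placeholder markup exactly as it appears in the template. The function name link_pages and the two placeholder constants are hypothetical names introduced here for illustration.

```python
import glob
import os

# Assumed to match the template's placeholder markup character for character.
PREV_PLACEHOLDER = '<li class="previous"><a href="#">'
NEXT_PLACEHOLDER = '<li class="next"><a href="#">'


def link_pages(folder):
    """Fill the pager placeholders of every .html file in `folder`, in sorted order."""
    paths = sorted(glob.glob(os.path.join(folder, "*.html")))
    names = [os.path.basename(p) for p in paths]
    for i, path in enumerate(paths):
        prev_name = names[(i - 1) % len(names)]  # wraps: the first file links back to the last
        next_name = names[(i + 1) % len(names)]  # wraps: the last file links on to the first
        with open(path, encoding="utf-8") as fh:
            html = fh.read()
        html = html.replace(PREV_PLACEHOLDER, '<li class="previous"><a href="%s">' % prev_name)
        html = html.replace(NEXT_PLACEHOLDER, '<li class="next"><a href="%s">' % next_name)
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(html)
```

Calling link_pages('.') from inside the folder rewrites the files in place, so it is worth trying it on a copy first. Plain string replacement only works while the placeholders are untouched; running it a second time changes nothing because the "#" hrefs are already gone.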
