How to modify an HTML tree in Python?

Question

Suppose a variable holds the following fragment of HTML code:

<p>
    <span class="code"> string 1 </ span>
    <span class="code"> string 2 </ span>
    <span class="code"> string 3 </ span>
</ p>
<p>
    <span class="any"> Some text </ span>
</ p>

I need to modify the contents of all tags with the class code by passing that content through some function, such as foo, which returns the modified content. Ultimately, I should get a new piece of HTML like this:

<p>
    <span class="code"> modify string 1 </ span>
    <span class="code"> modify string 2 </ span>
    <span class="code"> modify string 3 </ span>
</ p>
<p>
    <span class="any"> Some text </ span>
</ p>

I have been told that searching for specific HTML nodes is easy with the Python library BeautifulSoup4. How do I modify the content and save the new version to a new file? I guess that to find the tags I need soup.find_all('span', class_=re.compile("code")), but this function returns a list (a copy) of the matching objects, and modifying them does not change the contents of soup. How do I solve this problem?

Blender · Accepted Answer · 2014-01-05 18:59:59Z


Closing tags with a space after the slash, such as </ span>, are invalid HTML, and not even a web browser's lenient parser will parse them properly.
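If the markup can't be fixed where it comes from, one rough workaround for this particular defect is to strip the whitespace after </ with a regular expression before parsing. This is only a sketch: it assumes the stray space is the only problem in the markup and that no such sequence appears inside comments, scripts or attribute values; fix_end_tags is a hypothetical helper name.

```python
import re

# A closing tag with whitespace after the slash, e.g. "</ span>".
# The tag name is captured so it can be written back without the space.
STRAY_SPACE_END_TAG = re.compile(r"</\s+([A-Za-z][A-Za-z0-9]*)\s*>")


def fix_end_tags(html: str) -> str:
    """Rewrite '</ tag>' as '</tag>'; correct end tags are left unchanged."""
    return STRAY_SPACE_END_TAG.sub(r"</\1>", html)


broken = '<p>\n    <span class="code"> string 1 </ span>\n</ p>'
print(fix_end_tags(broken))
```

Because the pattern requires at least one whitespace character after the slash, well-formed end tags such as </span> are not touched.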

Once you fix your HTML, you can use .replaceWith():

from bs4 import BeautifulSoup

soup = BeautifulSoup('''
    <p>
        <span class="code"> string 1 </span>
        <span class="code"> string 2 </span>
        <span class="code"> string 3 </span>
    </p>
    <p>
        <span class="any"> Some text </span>
    </p>
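''', 'html5lib')  # the html5lib parser must be installed separately

# find_all() returns the actual Tag objects from soup (not copies),
# so changes made through them modify the tree itself.
for span in soup.find_all('span', class_='code'):
    # Swap the span's text node for a new one
    # (replaceWith is the older camelCase alias of replace_with).
    span.string.replace_with('modified ' + span.string)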
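A slightly fuller sketch covering the rest of the question (passing each span's text through a function and saving the result to a new file) could look like the following. It assumes bs4 is installed and uses the built-in "html.parser" so html5lib isn't required; foo, modify_code_spans and the file name modified.html are hypothetical placeholders.

```python
from bs4 import BeautifulSoup

# The question's fragment, with the closing tags corrected to </span> and </p>.
HTML = """
<p>
    <span class="code"> string 1 </span>
    <span class="code"> string 2 </span>
    <span class="code"> string 3 </span>
</p>
<p>
    <span class="any"> Some text </span>
</p>
"""


def foo(text: str) -> str:
    """Hypothetical stand-in for the question's foo: prefixes 'modify'."""
    return " modify " + text.strip() + " "


def modify_code_spans(html: str, func) -> BeautifulSoup:
    """Parse html and pass the text of every <span class="code"> through func."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns the Tag objects that live inside soup (not copies),
    # so changes made through them show up when soup is serialized.
    for span in soup.find_all("span", class_="code"):
        # Assigning .string replaces all children with a single text node,
        # so this also works if a span contains nested tags (they are
        # flattened to their text).
        span.string = func(span.get_text())
    return soup


soup = modify_code_spans(HTML, foo)

# Save the new version to a separate file; "modified.html" is an example name.
with open("modified.html", "w", encoding="utf-8") as out:
    out.write(str(soup))
```

Using span.get_text() with the .string setter is a deliberate choice here: when a span contains nested tags, span.string is None, so calling span.string.replace_with() on it would raise AttributeError.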

2 Comments

shad0w_wa1k3r Over a year ago

Umm, checked with BeautifulSoup4, it does parse  properly! But it does mess up .

Blender Over a year ago

@AshishNitinPatil: The s get nested inside one another.