How would I go about writing a function (with BeautifulSoup or otherwise) that replaces all instances of one HTML tag with another? For example:

text = "<p>this is some text<p><bad class='foo' data-foo='bar'> with some tags</bad><span>that I would</span><bad>like to replace</bad>"
new_text = replace_tags(text, "bad", "p")
print(new_text)  # "<p>this is some text<p><p class='foo' data-foo='bar'> with some tags</p><span>that I would</span><p>like to replace</p>"

I tried this, but preserving the attributes of each tag is a challenge:

from bs4 import BeautifulSoup

def replace_tags(string, old_tag, new_tag):
    soup = BeautifulSoup(string, "html.parser")
    for node in soup.find_all(old_tag):
        # Rebuilds the element from its first child only, so the
        # attributes (and any further children) are lost.
        new_content = BeautifulSoup(
            "<{0}>{1}</{0}>".format(new_tag, node.contents[0]), "html.parser"
        )
        node.replace_with(new_content)
    return str(soup)

Any idea how I could just replace the tag element itself in the soup? Or, even better, does anyone know of a library/utility function that'll handle this more robustly than something I'd write?

Thank you!

  • Were you looking for something like this crummy.com/software/BeautifulSoup/bs3/…? Commented Mar 9, 2018 at 23:06
  • The "Here's a more complex example that replaces one tag with another:" example is very close, but I'm looking to preserve tag attributes. I imagine that will require going through each element in a for loop, but I'm not sure how to get the attribute list from the tag I'm replacing. Commented Mar 9, 2018 at 23:29

1 Answer

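Actually, it's pretty simple: you can rename a tag in place by assigning to its .name attribute (node.name = new_tag in the function below). The tag's attributes and children stay as they are.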

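from bs4 import BeautifulSoup

def replace_tags(string, old_tag, new_tag):
    soup = BeautifulSoup(string, "html.parser")
    # find_all is the bs4 spelling; findAll is the older alias.
    for node in soup.find_all(old_tag):
        node.name = new_tag  # rename in place; attributes are kept
    return soup  # or return str(soup) if you want a string.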

text = "<p>this is some text<p><bad class='foo' data-foo='bar'> with some tags</bad><span>that I would</span><bad>like to replace</bad>"
new_text = replace_tags(text, "bad", "p")
print(new_text)

Output:

<p>this is some text<p><p class="foo" data-foo="bar"> with some tags</p><span>that I would</span><p>like to replace</p></p></p>
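To see that renaming really leaves the attributes alone, here is a small sketch that snapshots each tag's .attrs dict before the rename and compares it afterwards. The function name rename_tags and the sample markup are made up for illustration, and it assumes bs4 with the built-in "html.parser".

```python
from bs4 import BeautifulSoup

def rename_tags(markup, old_name, new_name):
    """Rename every <old_name> tag to <new_name>, keeping attributes and children."""
    soup = BeautifulSoup(markup, "html.parser")
    for node in soup.find_all(old_name):
        attrs_before = dict(node.attrs)    # .attrs is a dict of the tag's attributes
        node.name = new_name               # rename in place
        assert node.attrs == attrs_before  # the rename did not touch the attributes
    return str(soup)

# Hypothetical sample input, with a nested child tag to show children survive too.
html = "<bad class='foo' data-foo='bar'>some <b>bold</b> text</bad>"
result = rename_tags(html, "bad", "p")
print(result)
# <p class="foo" data-foo="bar">some <b>bold</b> text</p>
```

Note that the serializer writes attribute values with double quotes, so the output is equivalent to the input but not byte-for-byte identical.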

From the documentation:

Every tag has a name, accessible as .name:

tag.name
# u'b' 

If you change a tag’s name, the change will be reflected in any HTML markup generated by Beautiful Soup:

tag.name = "blockquote" 
tag
# <blockquote class="boldest">Extremely bold</blockquote>
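If you would rather build a separate replacement tag than rename in place (for example, to adjust attributes on the way), the attribute list asked about in the comments is available as node.attrs. The following is only a sketch of that alternative, not something from the documentation excerpt above: the name replace_tags_copy is hypothetical, and it assumes bs4's soup.new_tag, Tag.append and Tag.replace_with behave as documented.

```python
from bs4 import BeautifulSoup

def replace_tags_copy(markup, old_name, new_name):
    """Replace each <old_name> tag with a fresh <new_name> tag,
    copying the attributes and moving the children across."""
    soup = BeautifulSoup(markup, "html.parser")
    for node in soup.find_all(old_name):
        new_node = soup.new_tag(new_name)
        new_node.attrs.update(node.attrs)   # copy the attribute dict
        # append() moves a child out of its old parent, so iterate over a copy
        for child in list(node.contents):
            new_node.append(child)
        node.replace_with(new_node)
    return str(soup)

# Hypothetical sample input.
sample = "<div><bad id='x'>hi <i>there</i></bad></div>"
converted = replace_tags_copy(sample, "bad", "p")
print(converted)
# <div><p id="x">hi <i>there</i></p></div>
```

For a plain rename, assigning to .name is shorter and does the same job; the copy approach is mainly useful when the new tag should differ from the old one in more than its name.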

1 Comment

I did NOT know that you can change the name attribute that way. Thank you!
