3

I have dynamically-generated strings which contain HTML links. For example, the string might contain:

You can visit <a class="hyperlink" href="https://www.stackoverflow.com">Stack Overflow</a> or <a class="hyperlink" href="https://www.reddit.com">Reddit.com</a> for help, or you can visit <a href="hyperlink" href="https://www.google.com">https://www.google.com</a>

I'm trying to find all hrefs and instead generate new URLs for them, then insert them back into the string. So the example above, if run successfully, would look like:

You can visit <a class="hyperlink" href="https://mydomain.com/www.stackoverflow.com">Stack Overflow</a> or <a class="hyperlink" href="https://mydomain.com/reddit.com">Reddit.com</a> for help, or you can visit <a href="hyperlink" href="https://mydomain.com/www.google.com">https://www.google.com</a>

I'm very new to Python and haven't been successful at figuring out how to find an href and prepend its value with my "custom" domain. Any insight would be highly appreciated!

0

1 Answer 1

1

This is the perfect place to use a regular expression.

import re

text = 'You can visit <a class="hyperlink" href="https://www.stackoverflow.com">Stack Overflow</a>'
new_text = re.sub(r'href="http(s)?:\/\/(.+?)"', r'href="https://mydomain.com/\2"', text)
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.