0

How can I strip the tags from the string in this list?

['</span>A walk in the park<span class="html-tag"']

I managed to use r'(?<=</span>)[^>]+' to remove the first tag, but I can't figure out how to remove the second. I know regular expressions aren't the right tool for dealing with tags, but I just want to figure this out.

  • Out of curiosity: how are you getting that string in the first place? It almost seems like you might want to extract the text differently earlier in your processing rather than tidying it up afterwards. Commented Oct 15, 2017 at 14:35
  • @JonClements I just created the above to mirror an issue I was having on a more complex task, which would have been hard to explain. Commented Oct 15, 2017 at 15:16
  • Sure, it just seems you're trying to clean up something that could possibly be avoided, is all. Commented Oct 15, 2017 at 15:17
  • I know. It's part of a uni assignment that forbids using any module other than 're' for web scraping. Quite silly, I thought. Commented Oct 15, 2017 at 16:54

2 Answers

1

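You were quite close with your regex. [^>]+ keeps consuming characters until it meets a >, so it runs straight through the opening < of the second tag. After the position found by the lookbehind, you just want to read up to the next <: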

(?<=</span>)[^<]+

Check it out on regex101

$ cat test.py
import re
s = '</span>A walk in the park<span class="html-tag"'
# The lookbehind anchors the match just after </span>; [^<]+ stops before the next tag.
print(re.findall(r'(?<=</span>)[^<]+', s))

$ python test.py
['A walk in the park']
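
Since the question starts from a list rather than a single string, here is a minimal sketch of applying the same pattern to every item. The list contents (with the closing quote restored) and the helper name extract_text are assumptions made for illustration, not part of the original post. It also runs the question's original pattern so the difference is visible.

```python
import re

# Hypothetical input: the list from the question, with the closing quote restored.
items = ['</span>A walk in the park<span class="html-tag"']

original = re.compile(r'(?<=</span>)[^>]+')   # stops only when it reaches a '>'
corrected = re.compile(r'(?<=</span>)[^<]+')  # stops before the next '<'

def extract_text(strings, pattern):
    """Return the first match of pattern in each string, skipping non-matches."""
    results = []
    for s in strings:
        m = pattern.search(s)
        if m:
            results.append(m.group(0))
    return results

over_matched = extract_text(items, original)
texts = extract_text(items, corrected)
print(over_matched)  # the text plus the start of the second tag
print(texts)         # only the text between the tags
```

Strings that contain no </span> are simply skipped here; whether that is the right behaviour depends on the real data.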

1

You can use:

(?:>)(.*)(?:<)

In a regex, each pair of parentheses defines a group. This pattern has three pairs, but the first and the last start with ?:, which makes them non-capturing groups: they must match for the pattern to succeed, but their text is not returned as a group. The text you want is in group 1. Note that .* is greedy, so on a string with several tags it extends to the last <.
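
As a rough sketch of how this pattern could be used from Python, assuming the same sample string as in the other answer: re.search returns a match object, and group(1) holds the text captured by the only capturing group.

```python
import re

# Sample string assumed from the question (closing quote restored).
s = '</span>A walk in the park<span class="html-tag"'

# (?:>) and (?:<) must match but are not captured; (.*) is group 1.
match = re.search(r'(?:>)(.*)(?:<)', s)
text = match.group(1) if match else None
print(text)

# With exactly one capturing group, findall returns that group's text directly.
all_texts = re.findall(r'(?:>)(.*)(?:<)', s)
print(all_texts)
```

If the input may contain more than two tags, swapping .* for the lazy .*? makes each match stop at the nearest < instead of the last one.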
