I have been experimenting with Python's regex module, re.

I decided to write a simple expression that searches for links (href="url") in a file.

Here is my Regex: href *= *(\"|\').*\1

I tried out my expression on a site called GSkinner. The results are here, along with the code.

When I tried it in Python, I used the following code:

import re

lines = """Code found in link"""
results = re.findall(r"href *= *(\"|\').*\1", lines)
print(results)  # Outputs: ['"', '"'] instead of the two provided links

Why do the results contain only the quote characters instead of the links?

1 Answer

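When the pattern contains capturing groups, findall returns only what those groups capture; it returns the whole match only when the pattern has no groups. Your pattern's single group captures the quote character, so that is all you get back. You have to capture the value you want as well: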

r"href *= *(\"|\')(.*?)\1"
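
To make the difference concrete, here is a small sketch comparing the two patterns. The sample HTML in lines is an assumption made up for illustration, since the question only linked to its input.

```python
import re

# Hypothetical sample input; the original question only linked to its data.
lines = '''<a href="http://example.com/a">A</a>
<a href='http://example.com/b'>B</a>'''

# One capturing group: findall returns only that group (the quote character).
one_group = re.findall(r"href *= *(\"|\').*\1", lines)
print(one_group)

# Two capturing groups: findall returns one (quote, url) tuple per match.
two_groups = re.findall(r"href *= *(\"|\')(.*?)\1", lines)
print(two_groups)
```

With two groups, each result is a tuple, so the URL sits at index 1 of every element.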

All together you may want to use something like:

results = [x[1] for x in re.findall(r"href *= *(\"|\')(.*?)\1", lines)]
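
As an alternative to indexing into tuples, re.finditer yields match objects, so the URL can be read by group number. This is only a sketch under assumptions: the sample input is invented, and the character class ["'] is a swapped-in equivalent of the (\"|\') alternation rather than the pattern from the answer.

```python
import re

# Hypothetical sample input, invented for illustration.
lines = '''<a href="http://example.com/a">A</a>
<a href = 'http://example.com/b'>B</a>'''

# Group 1 is the opening quote, group 2 the URL; \1 requires the same closing quote.
pattern = re.compile(r"""href *= *(["'])(.*?)\1""")

# finditer returns match objects, so group(2) picks out the URL directly.
urls = [m.group(2) for m in pattern.finditer(lines)]
print(urls)
```

For real HTML, a parser is usually more robust than a regex, but for quick extraction from simple input this approach may be enough.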