0

I have a long string and I need to find the words that contain the character 'd' followed, somewhere later in the word, by the character 'e'.

import re

l=[" xkn59438","yhdck2","eihd39d9","chdsye847","hedle3455","xjhd53e","45da","de37dp"]
b=' '.join(l)
runs1=re.findall(r"\b\w?d.*e\w?\b",b)
print(runs1)

\b is the word boundary, followed by an optional word character (\w?), and so on. I get an empty list.

7
  • Your strings are only words, why do you need a word boundary anyway? Commented Jun 11, 2018 at 16:17
  • Why not apply a regex to each word in the list individually? Why join them into a massive string? Commented Jun 11, 2018 at 16:18
  • @Aran-Fey I already got it working with re.search in a for loop; I'm just trying to understand how to use re.findall. Commented Jun 11, 2018 at 16:26
  • Why are you joining it? This makes your solution that much more complicated, and searching one big string with a complex regex may end up being worse than searching smaller strings with a simpler regex. Commented Jun 11, 2018 at 16:30
  • @coldspeed Yeah, I know. I just wanted to understand how to use re.findall and couldn't grasp why my expression doesn't work. I have already done the same with a for loop, using re.search and the smaller expression. Commented Jun 11, 2018 at 16:32

3 Answers

1

You can massively simplify your solution by applying a regex search to each string individually.

>>> import re
>>> p = re.compile(r'd.*e')
>>> list(filter(p.search, l))
['chdsye847', 'hedle3455', 'xjhd53e', 'de37dp']

Or,

>>> [x for x in l if p.search(x)]
['chdsye847', 'hedle3455', 'xjhd53e', 'de37dp']

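Why didn't re.findall work? You were searching one large string, and the greedy .* in the middle can match across the spaces between words. On top of that, \w? matches at most one character, so \b\w?d only matches when 'd' is the first or second character of a word, and e\w?\b only when 'e' is the last or second-to-last. The fix would've been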

>>> re.findall(r"\b\S*d\S*e\S*", ' '.join(l))
['chdsye847', 'hedle3455', 'xjhd53e', 'de37dp']

\S matches any character that is not whitespace, so a match can never run past the end of a word.
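To see the difference on the joined string, here is a minimal sketch. The names `words` and `joined` are mine, and the middle pattern is a hypothetical variant (not from the question) that only exists to show the greedy `.*` running across spaces:

```python
import re

words = [" xkn59438", "yhdck2", "eihd39d9", "chdsye847",
         "hedle3455", "xjhd53e", "45da", "de37dp"]
joined = " ".join(words)

# Pattern from the question: \w? allows at most one character before 'd'
# and at most one after 'e', so nothing in this data matches.
original = re.findall(r"\b\w?d.*e\w?\b", joined)

# Hypothetical variant with \w* but still using .* in the middle:
# the greedy .* crosses the spaces and swallows several words at once.
greedy = re.findall(r"\b\w*d.*e\w*", joined)

# \S* cannot cross whitespace, so every match stays inside one word.
fixed = re.findall(r"\b\S*d\S*e\S*", joined)

print(original)  # []
print(greedy)    # a single match spanning from 'yhdck2' to the end
print(fixed)     # ['chdsye847', 'hedle3455', 'xjhd53e', 'de37dp']
```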


0

You can filter the list with re.search:

import re
l=[" xkn59438","yhdck2","eihd39d9","chdsye847","hedle3455","xjhd53e","45da","de37dp"]

pattern = r'd.*?e'

print(list(filter(lambda x:re.search(pattern,x),l)))

output:

['chdsye847', 'hedle3455', 'xjhd53e', 'de37dp']
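For filtering, the lazy `.*?` and the greedy `.*` select the same strings; they only differ in how much text the match object covers. A small sketch to check that (the sample string "dxexe" is made up for illustration):

```python
import re

words = [" xkn59438", "yhdck2", "eihd39d9", "chdsye847",
         "hedle3455", "xjhd53e", "45da", "de37dp"]

# re.search only has to report whether a match exists, so greedy and
# lazy quantifiers keep exactly the same strings.
with_greedy = [w for w in words if re.search(r"d.*e", w)]
with_lazy = [w for w in words if re.search(r"d.*?e", w)]
print(with_greedy == with_lazy)  # True

# The difference shows up only in the matched span.
span_greedy = re.search(r"d.*e", "dxexe").group()   # 'dxexe'
span_lazy = re.search(r"d.*?e", "dxexe").group()    # 'dxe'
print(span_greedy, span_lazy)
```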

2 Comments

This is essentially the same as this, but uglier and less efficient. Using a non-greedy quantifier makes no difference on strings this small.
Thank you for the comment, bro. I just tried my approach because I wanted to help, but if you don't like it I can delete it :)
0

Something like this maybe

\b\w*d\w*e\w*

Note that you can probably remove the word boundary here because
the first \w guarantees a word boundary before.

The same without it: \w*d\w*e\w*
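A quick sketch to check both forms on the joined list from the question (variable names are mine). The last two lines use a made-up token, "mid-level", to show one case where \w and \S behave differently, since \w does not match the hyphen:

```python
import re

words = [" xkn59438", "yhdck2", "eihd39d9", "chdsye847",
         "hedle3455", "xjhd53e", "45da", "de37dp"]
joined = " ".join(words)

# With and without the leading \b: same matches on this data.
with_boundary = re.findall(r"\b\w*d\w*e\w*", joined)
without_boundary = re.findall(r"\w*d\w*e\w*", joined)
print(with_boundary)                      # ['chdsye847', 'hedle3455', 'xjhd53e', 'de37dp']
print(with_boundary == without_boundary)  # True

# Hypothetical token containing a non-word character between 'd' and 'e'.
word_chars_only = re.findall(r"\b\w*d\w*e\w*", "mid-level")    # []
non_space_chars = re.findall(r"\b\S*d\S*e\S*", "mid-level")    # ['mid-level']
print(word_chars_only, non_space_chars)
```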

5 Comments

Thank you. Can you please explain why my expression didn't do the same?
@david007killer Because it was wrong? And why are you insisting on joining the strings? Is there some secret requirement here that you've decided not to mention in your answer?
Your regex has this part, .*, which will match non-word characters and is also greedy, whereas this regex limits the match to word characters only. This would be considered a pure answer, unencumbered by whether it's a string or a variable.
@coldspeed I have answered your question above, and I'm sorry I didn't clarify my intentions properly.
@sln Thank you.
