2

Suppose I have a string like this:

x = 'Wish she could have told me herself. @NicoleScherzy #nicolescherzinger #OneLove #myfav #MyQueen :heavy_black_heart::heavy_black_heart: some string too :smiling_face:' 

From it, I want to extract:

:heavy_black_heart:
:smiling_face:

To do that, I tried the following:

import re
result = re.search(':(.*?):', x)
result.group()

It only gives me ':heavy_black_heart:'. How can I make it return all of them? If possible, I want to store them in a dictionary once I have found them all.

4
  • Maybe set(re.findall(r':[^:]+:', x)) will do? Not sure what there might be between :, maybe r':\w+:' will work better. Commented Sep 14, 2017 at 12:13
  • @WiktorStribiżew for the example, it works, but I couldn't understand why you're not sure Commented Sep 14, 2017 at 12:20
  • See my answer with some explanations. Actually, you have not provided all the requirements, just two examples, that is why I said I was not sure. Commented Sep 14, 2017 at 12:23
  • Do you really want to match ::? As I said, you did not post exact specs. If you need to match any chars inside :...: that are not whitespaces, use :[^\s:]+: - see my updated answer. Commented Sep 14, 2017 at 12:48

5 Answers 5

3

print(re.findall(':.*?:', x)) does the job.

Output:
[':heavy_black_heart:', ':heavy_black_heart:', ':smiling_face:']

If you want to remove the duplicates, build a set from the matches:

res = re.findall(':.*?:', x)
unique = set(res)  # a set (not a dict) keeps one copy of each match
print(list(unique))

Output (set order is not guaranteed):
[':heavy_black_heart:', ':smiling_face:']
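
Since the question asks for a dictionary, here is a minimal sketch of one way to get there. It assumes the intended mapping is emoji name to number of occurrences, which the question does not actually specify, and it uses collections.Counter (a dict subclass) from the standard library. The variable names are illustrative only.

```python
import re
from collections import Counter

x = ('Wish she could have told me herself. @NicoleScherzy #nicolescherzinger '
     '#OneLove #myfav #MyQueen :heavy_black_heart::heavy_black_heart: '
     'some string too :smiling_face:')

# Find every :name: token, duplicates included.
matches = re.findall(r':.*?:', x)

# Counter is a dict subclass mapping each token to how often it occurred.
# Assumption: counts are what should be stored; the question does not say.
counts = Counter(matches)
print(dict(counts))

# Keys come out in first-seen order on Python 3.7+, so this also
# deduplicates while keeping the original order.
print(list(counts))
```

If only the unique names are needed and order does not matter, the plain set shown above is enough.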


7 Comments

re.MULTILINE is not doing anything with the pattern since there are no ^ and $ to modify the behavior of. re.match only searches for a match at the beginning of the string.
Now, you do not have : in the matches.
Check now @WiktorStribiżew
You do not need any capturing group, remove ( and ). It will still match :: (not sure it is expected).
Thanks for pointing that out. The capturing parentheses are removed. No, it won't match ::
2

You seem to want to match smileys, that is, some symbols between two colons. The .*? can match zero symbols, so your regex can also match ::, which I think is not what you want. Besides, re.search only returns one match (the first); to get multiple matches, you usually use re.findall or re.finditer.
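
To make the difference between the three functions concrete, here is a small sketch using the string from the question and the question's own lazy pattern (without the capturing group). The variable names are illustrative.

```python
import re

x = ('Wish she could have told me herself. @NicoleScherzy #nicolescherzinger '
     '#OneLove #myfav #MyQueen :heavy_black_heart::heavy_black_heart: '
     'some string too :smiling_face:')

# re.search stops at the first match and returns a single match object.
first = re.search(r':.*?:', x)
print(first.group())

# re.findall returns every non-overlapping match as a list of strings.
all_matches = re.findall(r':.*?:', x)
print(all_matches)

# re.finditer yields match objects, so positions are available too.
with_positions = [(m.group(), m.start()) for m in re.finditer(r':.*?:', x)]
print(with_positions)
```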

I think you need

set(re.findall(r':[^:]+:', x))

or if you only need to match word chars inside :...::

set(re.findall(r':\w+:', x))

or - if you want to match any non-whitespace chars in between two ::

set(re.findall(r':[^\s:]+:', x))

The re.findall will find all non-overlapping occurrences and set will remove dupes.

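Each pattern matches a :, then one or more chars from the class in the middle, and then a closing :. The class is any char other than : for [^:]+, letters, digits and _ for \w+, and any char other than whitespace and : for [^\s:]+.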

>>> import re
>>> x = 'Wish she could have told me herself. @NicoleScherzy #nicolescherzinger #OneLove #myfav #MyQueen :heavy_black_heart::heavy_black_heart: some string too :smiling_face:'
>>> print(set(re.findall(r':[^:]+:', x)))
{':smiling_face:', ':heavy_black_heart:'}
>>> 
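
The patterns above only differ on input that the question does not show, so the following sketch compares them on two made-up strings. Both sample strings and the name thumbs-up are assumptions chosen to expose the edge cases (an empty ::, whitespace between colons, a hyphen inside the name).

```python
import re

# Hypothetical input with an empty "::" and a space-separated segment.
sample = 'a :: b :x:'

lazy_any = re.findall(r':.*?:', sample)        # .*? may match zero chars
no_colon = re.findall(r':[^:]+:', sample)      # at least one non-colon char
word_only = re.findall(r':\w+:', sample)       # letters, digits, underscore
no_space = re.findall(r':[^\s:]+:', sample)    # no whitespace, no colon

print(lazy_any)   # includes the empty '::'
print(no_colon)   # picks up ': b :' because spaces are allowed
print(word_only)
print(no_space)

# Hypothetical name containing a hyphen, which \w does not cover.
hyphenated = ':thumbs-up:'
print(re.findall(r':\w+:', hyphenated))
print(re.findall(r':[^\s:]+:', hyphenated))
```

Which pattern is right depends on what may legally appear between the colons, so it is worth checking against real data before settling on one.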

Comments

0

try this regex:

:([a-z0-9:A-Z_]+):

2 Comments

When I try it, it produces ':heavy_black_heart::heavy_black_heart:' which isn't what I want
@zwlayer It returns that match because : is inside the character class and + is a greedy quantifier, so all the chars defined in the character class are matched first, as many occurrences as possible, up to the last : that occurs after _, letters and digits.
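
The effect described in this comment can be reproduced with a short sketch. The second pattern, with : removed from the character class, is a suggested variation and not part of the original answer.

```python
import re

x = ':heavy_black_heart::heavy_black_heart: some string too :smiling_face:'

# With ':' inside the class, the greedy + runs across the inner "::"
# and only gives back the final ':' needed to close the match.
with_colon = [m.group() for m in re.finditer(r':([a-z0-9:A-Z_]+):', x)]
print(with_colon)

# Variation (an assumption, not the answerer's code): drop ':' from the
# class so each match has to stop at the next colon.
without_colon = [m.group() for m in re.finditer(r':([a-zA-Z0-9_]+):', x)]
print(without_colon)
```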
0
import re
x = 'Wish she could have told me herself. @NicoleScherzy #nicolescherzinger #OneLove #myfav #MyQueen :heavy_black_heart::heavy_black_heart: some string too :smiling_face:' 
print(set(re.findall(':.*?:', x)))

output:

{':heavy_black_heart:', ':smiling_face:'}

Comments

0

Just for fun, here's a simple solution without regex. It splits around ':' and keeps the elements with odd index:

>>> text = 'Wish she could have told me herself. @NicoleScherzy #nicolescherzinger #OneLove #myfav #MyQueen :heavy_black_heart::heavy_black_heart: some string too :smiling_face:'
>>> text.split(':')[1::2]
['heavy_black_heart', 'heavy_black_heart', 'smiling_face']
>>> set(text.split(':')[1::2])
set(['heavy_black_heart', 'smiling_face'])
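
One caveat with the split approach: it relies on every colon in the text belonging to a :name: pair. The sketch below uses a made-up string containing stray colons to show where the odd-index trick drifts, with a regex result alongside for comparison.

```python
import re

# Hypothetical text with colons that are not part of any :name: token.
text = 'Time: 10:30 :smile:'

# Splitting pairs up colons blindly, so a stray piece lands on an odd index.
by_split = text.split(':')[1::2]
print(by_split)

# A pattern that requires word chars between the colons skips the strays.
by_regex = re.findall(r':\w+:', text)
print(by_regex)
```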

Comments
