
I'm trying to search for strings inside lists that are stored in a column of a pandas DataFrame. Here is an example:

       userAuthor     hashtagsMessage
post_1    nytimes            [#Emmys]
post_2        TMZ                  []
post_3     Forbes        [#BTSatUNGA]
post_4    nytimes            [#Emmys]
post_5     Forbes  [#BTS, #BTSatUNGA]

As you can see, the column that holds these lists is 'hashtagsMessage'. I've tried the conventional string-searching methods, but I haven't been able to get them to work.

To look for '#BTS' in a column of plain strings, I would normally use something like:

df['hashtagsMessage'].str.contains("#BTS", case=False)

or

df['hashtagsMessage']=="#BTS" 

Or similar. Unfortunately, these approaches do not work for lists. I suppose I need an extra step to look inside each list while searching the DataFrame, but I'm not sure how to do that.

Any help is greatly appreciated!

  • Are you searching for the exact tag '#BTS', or should a partial match such as '#BTSatUNGA' also count? Commented Nov 25, 2021 at 13:18
  • @Corralien Hi, I forgot to mention that: exact matches only. I've edited the question, thank you! Commented Nov 25, 2021 at 13:20

3 Answers

2

Use map or apply:

>>> df['hashtagsMessage'].map(lambda x: '#BTS' in x)

post_1    False
post_2    False
post_3    False
post_4    False
post_5     True
Name: hashtagsMessage, dtype: bool
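
To keep only the matching rows, the boolean Series above can be passed straight to the indexer. A minimal sketch, assuming the column really holds Python lists; the DataFrame is rebuilt by hand from the question's sample, and `mask` and `matches` are just illustrative names:

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to be real Python lists).
df = pd.DataFrame(
    {
        "userAuthor": ["nytimes", "TMZ", "Forbes", "nytimes", "Forbes"],
        "hashtagsMessage": [
            ["#Emmys"], [], ["#BTSatUNGA"], ["#Emmys"], ["#BTS", "#BTSatUNGA"],
        ],
    },
    index=["post_1", "post_2", "post_3", "post_4", "post_5"],
)

# `in` on a list compares whole elements, so '#BTSatUNGA' does not count as '#BTS'.
mask = df["hashtagsMessage"].map(lambda tags: "#BTS" in tags)

# Boolean indexing keeps only the rows where the mask is True.
matches = df[mask]
print(matches)
```

Because the mask shares the DataFrame's index, it can also be combined with other conditions using `&` and `|`.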

Update

A more vectorized approach using explode:

>>> df.loc[df['hashtagsMessage'].explode().eq('#BTS').loc[lambda x: x].index]

       userAuthor     hashtagsMessage
post_5     Forbes  [#BTS, #BTSatUNGA]
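
For anyone unfamiliar with explode, here is the same idea broken into steps. This is only a sketch: the sample frame is rebuilt by hand, the variable names are illustrative, and the final `groupby(level=0).any()` step is a variation of my own rather than part of the one-liner above. It yields a mask aligned with the original index, which also avoids returning a row twice if a list happens to contain the tag more than once.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to be real Python lists).
df = pd.DataFrame(
    {
        "userAuthor": ["nytimes", "TMZ", "Forbes", "nytimes", "Forbes"],
        "hashtagsMessage": [
            ["#Emmys"], [], ["#BTSatUNGA"], ["#Emmys"], ["#BTS", "#BTSatUNGA"],
        ],
    },
    index=["post_1", "post_2", "post_3", "post_4", "post_5"],
)

# One row per tag; the original index label is repeated for each tag.
# An empty list becomes a single NaN row.
exploded = df["hashtagsMessage"].explode()

# Element-wise exact comparison; NaN compares as False.
is_bts = exploded.eq("#BTS")

# Collapse back to one boolean per original row, in the original row order.
mask = is_bts.groupby(level=0).any().reindex(df.index)

print(df[mask])
```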

1 Comment

I search for the tag '#BTS' and not a partial match.
1

Search with a raw-string pattern.

If the column holds plain strings rather than actual lists, use:

df['hashtagsMessage'].str.contains(r'#BTS')

If it holds actual lists, convert them to strings first:

df['hashtagsMessage'].astype(str).str.contains(r'#BTS')
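
One caveat worth checking before relying on this: converting each list to a string and calling str.contains performs a substring search, so '#BTSatUNGA' also matches '#BTS'. Below is a small sketch of the difference on a hand-built Series. Including the quote characters in the pattern is an assumption on my part that relies on how Python prints a list of strings, so treat it as a workaround rather than a robust method.

```python
import pandas as pd

# Hand-built sample, assumed to hold real Python lists.
tags = pd.Series([["#Emmys"], [], ["#BTSatUNGA"], ["#BTS", "#BTSatUNGA"]])

# Each list becomes its printed form, e.g. "['#BTS', '#BTSatUNGA']".
as_text = tags.astype(str)

# Plain substring search: also True for the row that only has '#BTSatUNGA'.
partial = as_text.str.contains("#BTS", regex=False)

# Including the surrounding quotes restricts the match to the whole tag.
exact = as_text.str.contains("'#BTS'", regex=False)

print(partial.tolist())
print(exact.tolist())
```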


1

You could use a simple anonymous function employing a generator expression and any(), e.g.:

Edit: I originally presumed you wanted any tag containing '#BTS', and have since edited this to find only exact matches :)

In [10]: df = pd.DataFrame({'hashtagsMessage':[
                            [], ["#BTSatUNGA"],
                            ["#Emmys"], ['#BTS', '#BTSatUNGA']]})

In [18]: df['hashtagsMessage'].apply(lambda lst: any(s == "#BTS"
                                                     for s in lst))
Out[18]: 
0    False
1    False
2    False
3     True
Name: hashtagsMessage, dtype: bool
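
If both exact and partial matching are needed, the two behaviours can sit behind one small helper. This is a sketch only: `has_tag` and its `exact` flag are hypothetical names introduced for illustration, not part of pandas.

```python
import pandas as pd


def has_tag(tags, target, exact=True):
    """Return True if any tag equals `target` (or contains it when exact=False)."""
    if exact:
        return any(tag == target for tag in tags)
    return any(target in tag for tag in tags)


df = pd.DataFrame({"hashtagsMessage": [
    [], ["#BTSatUNGA"], ["#Emmys"], ["#BTS", "#BTSatUNGA"]]})

# Exact match: only the row that contains the tag '#BTS' itself.
exact_mask = df["hashtagsMessage"].apply(lambda tags: has_tag(tags, "#BTS"))

# Partial match: any tag that contains '#BTS' as a substring.
partial_mask = df["hashtagsMessage"].apply(
    lambda tags: has_tag(tags, "#BTS", exact=False))

print(exact_mask.tolist())
print(partial_mask.tolist())
```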

2 Comments

Thanks for this. I should have been clearer in the original question, but I also like your approach very much because it will be useful in other, similar cases where the match doesn't need to be exact.
@AquilesPáez: ok! happy coding to you :)
