Passing string variable value in Pandas dataframe

Question

I have been trying to use variables to pass string values into a dataframe for various column operations, but the code gives me wrong results. This is the code I am using in a Jupyter Notebook:

first_key = input("key 1: ")
second_key = input("key 2: ")
third_key = input("key 3: ")

These receive the values "Russia", "China" and "Trump", which are used in the next cell:

tweets['{first_key}'] = tweets['text'].str.contains(r"^(?=.*\b{first_key}\b).*$", case=False) == True
tweets['{second_key}'] = tweets['text'].str.contains(r"^(?=.*\b'{second_key}'\b).*$", case=False) == True
tweets['{third_key}'] = tweets['text'].str.contains(r"^(?=.*\b'{third_key}'\b).*$", case=False) == True

But the results are wrong. Any idea how to get the correct results? A small snapshot of the results looks like this.

Perhaps you wanted to leverage Python f-strings but forgot the "f" at the beginning. – cs95, commented May 1, 2018 at 1:55
OK, I just figured out the 'f' thing; it works for the column header, but how do I pass the same variable into the regex? That is what I need now. – ambrish dhaka, commented May 1, 2018 at 1:56
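A minimal sketch of why the original cells misbehave, assuming a made-up two-row tweets frame and "Russia" hard-coded in place of input(). Without the f prefix, both the column name and the pattern keep the braces as literal text, so the regex searches for the characters "{first_key}" and never matches anything.

```python
import pandas as pd

first_key = "Russia"  # hard-coded stand-in for input("key 1: ")

# Hypothetical sample data; object dtype so Python's re engine evaluates the pattern.
tweets = pd.DataFrame({"text": ["Russia and China talk trade", "Trump tweets again"]}, dtype=object)

# Without the f prefix, the braces are ordinary characters.
plain_col = '{first_key}'
plain_pat = r"^(?=.*\b{first_key}\b).*$"

# With the f (and r) prefix, the variable's value is substituted.
fixed_col = f'{first_key}'
fixed_pat = rf"^(?=.*\b{first_key}\b).*$"

print(plain_col, plain_pat)  # {first_key} ^(?=.*\b{first_key}\b).*$
print(fixed_col, fixed_pat)  # Russia ^(?=.*\bRussia\b).*$

# The unformatted pattern looks for the literal text "{first_key}", so nothing matches.
print(tweets['text'].str.contains(plain_pat, case=False).tolist())  # [False, False]
print(tweets['text'].str.contains(fixed_pat, case=False).tolist())  # [True, False]
```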

cs95 · Accepted Answer · 2018-05-01 01:58:51Z


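I've tried cleaning up your code. You can leverage f-strings (Python 3.6+) with a tiny change: combine the r and f prefixes, so the backslashes stay raw for the regex and {key} is substituted with the variable's value: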

def contains(series, key):
    return series.str.contains(rf"^(?=.*\b{key}\b).*$", case=False)
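A quick check of the contains helper on a hypothetical Series of tweets. It shows the two properties the pattern relies on: case=False makes the match case-insensitive, and the \b word boundaries stop "Russia" from matching inside "Russian".

```python
import pandas as pd

def contains(series, key):
    # rf-string: raw (backslashes kept for the regex) and formatted ({key} substituted)
    return series.str.contains(rf"^(?=.*\b{key}\b).*$", case=False)

# Hypothetical sample tweets; object dtype so Python's re engine evaluates the pattern.
text = pd.Series(["RUSSIA responds", "Russian hackers", "China's economy", "nothing here"], dtype=object)

print(contains(text, "Russia").tolist())  # [True, False, False, False]
print(contains(text, "China").tolist())   # [False, False, True, False]
```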

If you're working with an older version of Python, use str.format:

def contains(series, key):
    return series.str.contains(r"^(?=.*\b{}\b).*$".format(key), case=False)
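The two spellings build the same pattern string, which this small sketch checks for an illustrative key ("Trump"). One caveat applies to both styles: any literal brace in the regex (such as a {2} quantifier) has to be doubled, or Python treats it as a replacement field.

```python
key = "Trump"

pattern_f = rf"^(?=.*\b{key}\b).*$"                # Python 3.6+
pattern_format = r"^(?=.*\b{}\b).*$".format(key)   # older Python versions

print(pattern_f == pattern_format)  # True

# Literal regex braces must be doubled in both styles.
doubled = rf"\b{key}\d{{2}}\b"
doubled_format = r"\b{}\d{{2}}\b".format(key)

print(doubled)                    # \bTrump\d{2}\b
print(doubled == doubled_format)  # True
```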

Next, call this function inside a loop:

for key in (first_key, second_key, third_key):
    tweets[key] = contains(tweets['text'], key)
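Putting the pieces together, here is a self-contained sketch of the loop. The three keys are hard-coded instead of read with input(), and the tweets frame is invented sample data, so treat the output as illustrative only.

```python
import pandas as pd

def contains(series, key):
    # Boolean mask: True where key appears as a whole word, ignoring case.
    return series.str.contains(rf"^(?=.*\b{key}\b).*$", case=False)

# Hypothetical data; object dtype so Python's re engine evaluates the pattern.
tweets = pd.DataFrame({"text": [
    "Trump talks to Russia",
    "China and russia sign a deal",
    "Nothing relevant",
]}, dtype=object)

# Stand-ins for the three input() calls in the question.
first_key, second_key, third_key = "Russia", "China", "Trump"

for key in (first_key, second_key, third_key):
    # The key itself becomes the column name, so no f-string is needed there.
    tweets[key] = contains(tweets['text'], key)

print(tweets[["Russia", "China", "Trump"]].values.tolist())
# [[True, False, True], [True, True, False], [False, False, False]]
```

Because the loop variable already holds the key's value, it can be used directly as the column name; the f-string only matters where the value is embedded inside a larger string, such as the regex pattern.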


5 Comments

ambrish dhaka Over a year ago

Thanks! Just for the sake of learning, I wanted to know how to pass the variable into the regex.

cs95 Over a year ago

@ambrishdhaka Just like this: tweets[key] = tweets['text'].str.contains(rf"^(?=.*\b{key}\b).*$", case=False)

ambrish dhaka Over a year ago

There seems to be an issue. case=False is used to make the search case-insensitive, and == True is then used to check whether the value is present. But the results are all False when using tweets[f'{first_key}'] = tweets['text'].str.contains(r"^(?=.*\b{first_key}\b).*$", case=False) == True

cs95 Over a year ago

@ambrishdhaka The == True is redundant since contains already returns a boolean mask. Also, the pattern should be rf"^(?=.*\b{first_key}\b).*$"; you're missing the leading f, please look at it again. Also, if the answer helps, do consider passing an upvote, thanks.
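A small sketch of the point about == True, using an invented two-row Series: str.contains already returns a boolean Series, so comparing it to True gives back an identical mask. If the text column can contain missing values, the na parameter of str.contains is the explicit way to turn those rows into False.

```python
import pandas as pd

# Hypothetical sample text; object dtype so Python's re engine evaluates the pattern.
text = pd.Series(["Russia responds", "no match"], dtype=object)
mask = text.str.contains(r"^(?=.*\bRussia\b).*$", case=False)

print(mask.dtype)                   # bool
print((mask == True).equals(mask))  # True: comparing to True changes nothing

# Missing entries: na=False makes them count as "not found".
with_missing = pd.Series(["Russia responds", None])
print(with_missing.str.contains("Russia", na=False).tolist())  # [True, False]
```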

ambrish dhaka Over a year ago

You are right, it was the same 'f' thing. I now get the correct results. Thanks also for pointing out that == True is not needed; I learnt that.
