I am trying to use variables to pass string values into a DataFrame for various column operations, but the code gives me wrong results. Here is the code I am running in a Jupyter Notebook:

first_key = input("key 1: ")
second_key = input("key 2: ")
third_key = input("key 3: ")

These receive the values "Russia", "China", and "Trump", which are then used in the next cell:

tweets['{first_key}'] = tweets['text'].str.contains(r"^(?=.*\b{first_key}\b).*$", case=False) == True
tweets['{second_key}'] = tweets['text'].str.contains(r"^(?=.*\b'{second_key}'\b).*$", case=False) == True
tweets['{third_key}'] = tweets['text'].str.contains(r"^(?=.*\b'{third_key}'\b).*$", case=False) == True

But the results are wrong. Any idea how to get the correct results? A small snapshot of the output looks like this:

Output of the code run.

  • Perhaps you wanted to leverage python f-strings but forgot the "f" at the beginning. Commented May 1, 2018 at 1:55
  • OK, I just figured out the 'f' thing. It works for the column header, but how do I pass the same variable into the regex? That is what I need now. Commented May 1, 2018 at 1:56

1 Answer

I've tried cleaning up your code. You can use f-strings (Python 3.6+) with a tiny change: prefix the raw string with f as well (rf"...") so that {key} is replaced by the variable's value:

def contains(series, key):
    return series.str.contains(rf"^(?=.*\b{key}\b).*$", case=False)

If you're working with an older version of Python, use str.format:

def contains(series, key):
    return series.str.contains(r"^(?=.*\b{}\b).*$".format(key), case=False)

Next, call this function inside a loop:

for key in (first_key, second_key, third_key):
    tweets[key] = contains(tweets['text'], key)
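
To see why the original pattern never matched, here is a minimal sketch using only the standard `re` module. The sample sentences and the `tricky_key` value are made up for illustration. It also shows `re.escape`, which is an optional extra (not part of the code above) that may be worth adding when the keys come from user input and could contain regex metacharacters.

```python
import re

key = "Russia"
text = "Talks between russia and China resumed."  # made-up sample text

# Without the f prefix nothing is substituted: the regex engine
# looks for the literal characters "{key}".
plain = r"\b{key}\b"
# With the rf prefix, the value of `key` is interpolated into the pattern.
interpolated = rf"\b{key}\b"

plain_hit = re.search(plain, text, flags=re.IGNORECASE) is not None
interpolated_hit = re.search(interpolated, text, flags=re.IGNORECASE) is not None
print(plain, plain_hit)
print(interpolated, interpolated_hit)

# Optional hardening: escape the key so characters such as "." are
# matched literally instead of being treated as regex syntax.
tricky_key = "A.I"  # hypothetical key containing a metacharacter
unescaped = rf"\b{tricky_key}\b"
escaped = rf"\b{re.escape(tricky_key)}\b"

unescaped_false_hit = re.search(unescaped, "the AxI boom") is not None
escaped_false_hit = re.search(escaped, "the AxI boom") is not None
escaped_true_hit = re.search(escaped, "the A.I boom") is not None
print(unescaped_false_hit, escaped_false_hit, escaped_true_hit)
```

One caveat with rf-strings: if the pattern itself needs literal braces (for example a quantifier like `{2,3}`), they have to be doubled (`{{2,3}}`) so Python does not treat them as a replacement field.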

5 Comments

Thanks! Just for the sake of learning, I wanted to know how to pass the variable into the regex. Thanks!
@ambrishdhaka Just like this: tweets[key] = tweets['text'].str.contains(rf"^(?=.*\b{key}\b).*$", case=False)
There seems to be an issue. case=False is used to make the search case insensitive, and the == True comparison is then used to check whether the value is present. But the results are all False when using tweets[f'{first_key}'] = tweets['text'].str.contains(r"^(?=.*\b{first_key}\b).*$", case=False) == True
@ambrishdhaka The == True is redundant since contains returns a boolean mask anyway. Also, it should be rf"^(?=.*\b{first_key}\b).*$"; you're missing the leading f, so please look at it again. And if the answer helps, do consider upvoting, thanks.
You are right. It is the same 'f' thing. I get the correct results now. Thanks also for pointing out that == True is not needed. I learnt that.
