1

I have a data frame like this:

text,                pos
No thank you.        [(No, DT), (thank, NN), (you, PRP)]
They didn't respond  [(They, PRP), (didn't, VBP), (respond, JJ)]

I want to apply a function to pos and save the result in a new column, so the output would look like this:

text,                pos                                          score
No thank you.        [(No, DT), (thank, NN), (you, PRP)]          [[0.0, 0.0, 1.0], [], [0.5, 0.0, 0.45]]
They didn't respond  [(They, PRP), (didn't, VBP), (respond, JJ)]  [[0.0, 0.0, 1.0], [], [0.75, 0.0, 0.25]]

The function returns a list for each tuple in the list (its implementation is not the point here; I just call get_sentiment). I can do this with a nested loop, but I don't like that approach. I want to do it in a more Pythonic, pandas-style way.

This is what I have tried so far:

df['score'] = df['pos'].apply(lambda k: [get_sentiment(x,y) for j in k for (x,y) in j])

However, it raises this error:

ValueError: too many values to unpack (expected 2)

There are a couple of similar questions on SO, but the answers were in R.

For more clarity:

get_sentiment is a function in NLTK that assigns a list of scores to each word (the list is [positive score, negative score, objectivity score]). Overall, I need to apply that function to the pos column of my DataFrame.

  • What is the reference (or references) for the values in those lists? For instance, how do you determine that (No, DT) converts to [0.0, 0.0, 1.0]? Commented Sep 14, 2021 at 23:15
  • @ashkangh I called the get_sentiment function of NLTK, which assigns a score to each word. But I don't need help with that part, as the function already works perfectly. I don't know how to apply that function to the pos column in my DataFrame. Commented Sep 14, 2021 at 23:18
  • @ashkangh The list is [positive score, negative score, objectivity score]. This is the list assigned to each tuple. But I feel we don't need this detail, as my problem is more general: how to combine two nested loops with apply. Commented Sep 14, 2021 at 23:22
  • Assuming the get_sentiment function does exactly what you say (takes a tuple as its only argument and returns a list of scores), maybe something like this would work? df['score'] = df['pos'].apply(lambda k: [get_sentiment(j) for j in k]) Commented Sep 14, 2021 at 23:55

2 Answers

2

In your case, index into each tuple directly:

df['score'] = df['pos'].apply(lambda k: [get_sentiment(j[0], j[1]) for j in k])

2

Let's take Pandas out of the equation and create a minimal reproducible example of the problem - which is to do with the lambda itself:

def mock_sentiment(word, pos):
    return len(word) * 0.1, 0, len(pos) * 0.1

data = [('No', 'DT'), ('thank', 'NN'), ('you', 'PRP')]

[mock_sentiment(x, y) for j in data for (x,y) in j] # reproduces the error

The problem is that each j in data (e.g. ('No', 'DT')) is a single tuple that we want to unpack into x, y values. By iterating in j, we get individual strings ('No' and 'DT') which we then attempt to unpack into x and y. This happens to work for 'No' and 'DT', but not for strings of other lengths - and even then, it's not the desired result.
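To see that behaviour in isolation, here is a small sketch that needs neither Pandas nor a sentiment function. It only unpacks the strings inside a tuple, which is what the inner `for (x,y) in j` ends up doing. The variable names are just for illustration.

```python
pair = ('No', 'DT')

# Iterating over the tuple yields its two strings. Each one is exactly
# two characters long, so it unpacks into two single characters.
unpacked = [(x, y) for (x, y) in pair]
print(unpacked)  # [('N', 'o'), ('D', 'T')]

# 'thank' has five characters, so unpacking it into (x, y) fails.
try:
    [(x, y) for (x, y) in ('thank', 'NN')]
    message = None
except ValueError as exc:
    message = str(exc)

print(message)  # begins with "too many values to unpack"
```

So the first tuple only appears to work, and the error shows up as soon as a word is not exactly two characters long.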

Since j is already the tuple that we want to unpack, what we want to do is unpack it there, by using (x, y) rather than j for the iteration, and not have any nested comprehension:

[mock_sentiment(x, y) for (x, y) in data] # works as expected

Consequently, that is what we want the lambda to give back to Pandas in the real code (substituting back in your names and the real sentiment function):

df['score'] = df['pos'].apply(lambda k: [get_sentiment(x, y) for (x, y) in k])
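As a rough end-to-end check, the same line can be run on a frame shaped like the one in the question. This is only a sketch: `fake_sentiment` is a hypothetical stand-in for get_sentiment that returns made-up numbers, and the frame is rebuilt by hand on the assumption that pos holds lists of (word, tag) tuples.

```python
import pandas as pd


def fake_sentiment(word, tag):
    # Hypothetical stand-in for get_sentiment: returns three made-up
    # numbers so the example runs without NLTK.
    return [len(word) * 0.1, 0.0, len(tag) * 0.1]


# Assumed shape of the data: each cell of 'pos' is a list of tuples.
df = pd.DataFrame({
    'text': ['No thank you.', "They didn't respond"],
    'pos': [
        [('No', 'DT'), ('thank', 'NN'), ('you', 'PRP')],
        [('They', 'PRP'), ("didn't", 'VBP'), ('respond', 'JJ')],
    ],
})

# One list of scores per (word, tag) tuple, one list of lists per row.
df['score'] = df['pos'].apply(
    lambda k: [fake_sentiment(x, y) for (x, y) in k]
)

print(df['score'].iloc[0])
```

Each cell of the new score column holds one inner list per tuple in the matching pos cell, which is the layout asked for in the question.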
