How to solve ValueError when testing truth value of Dataframe contents? Python

Question

I have a Dataframe that looks like this.

   done    sentence                        3_tags
0  0       ['What', 'were', 'the', '...]   ['WP', 'VBD', 'DT']
1  0       ['What', 'was', 'the', '...]    ['WP', 'VBD', 'DT']
2  0       ['Why', 'did', 'John', '...]    ['WP', 'VBD', 'NN']
...

For each row, I want to check whether the list in column '3_tags' appears in a list temp1, as follows:

import pandas as pd

a = pd.read_csv('sentences.csv')
temp1 = [['WP', 'VBD', 'DT'], ['WRB', 'JJ', 'VBZ'], ['WP', 'VBD', 'DT']]
q = a['3_tags']
q in temp1

For the first sentence in row 0, the value of '3_tags' is ['WP', 'VBD', 'DT'], which is in temp1, so I expect the result of the above to be:

True

However, I get this error:

ValueError: Arrays were different lengths: 1 vs 3

I suspect that there is some problem with the datatype of q:

print(type(q))
<class 'pandas.core.series.Series'>

Is the problem that q is a Series while temp1 contains lists? What should I do to get the logical result True?

piRSquared · Accepted Answer · 2018-03-30 18:07:13Z


You want those lists to be tuples instead: tuples are hashable and lists are not.
Then use pd.Series.isin to test each row:

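# Convert each inner list to a tuple so it can be hashed
temp1 = [tuple(t) for t in temp1]

# Do the same for every row of the column
q = a['3_tags'].apply(tuple)

q.isin(temp1)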

0     True
1     True
2    False
Name: 3_tags, dtype: bool
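
For anyone who wants to run this first variant end to end, here is a minimal self-contained sketch. It assumes the '3_tags' column holds real Python lists (as in the first setup further down); the names allowed, mask and matching are introduced here for illustration and are not part of the original code.

```python
import pandas as pd

# Small stand-in for the real data; '3_tags' holds actual lists here.
a = pd.DataFrame({
    'done': [0, 0, 0],
    '3_tags': [['WP', 'VBD', 'DT'], ['WP', 'VBD', 'DT'], ['WP', 'VBD', 'NN']],
})
temp1 = [['WP', 'VBD', 'DT'], ['WRB', 'JJ', 'VBZ'], ['WP', 'VBD', 'DT']]

# 'allowed' is a hypothetical name: temp1 with each inner list made hashable.
allowed = [tuple(tags) for tags in temp1]

# Convert each row's list to a tuple, then test membership row by row.
mask = a['3_tags'].apply(tuple).isin(allowed)
print(mask.tolist())

# The boolean mask can also be used to keep only the matching rows.
matching = a[mask]
print(len(matching))
```

The mask has one entry per row, so it answers the question for the whole column at once rather than for a single row.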

However, it appears that the '3_tags' column actually consists of strings that look like lists. In that case, parse them with ast.literal_eval first:

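from ast import literal_eval

# Convert each inner list to a tuple so it can be hashed
temp1 = [tuple(t) for t in temp1]

# Parse each string into a list, then convert it to a tuple
q = a['3_tags'].apply(lambda x: tuple(literal_eval(x)))

q.isin(temp1)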

0     True
1     True
2    False
Name: 3_tags, dtype: bool
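
If it is unclear whether a column holds lists or strings, one option is a small helper that accepts both. The sketch below is only an illustration under that assumption; to_tuple and allowed are hypothetical names, not something from the original post or from pandas.

```python
from ast import literal_eval

import pandas as pd

# '3_tags' holds strings that merely look like lists (as after pd.read_csv).
a = pd.DataFrame({
    '3_tags': ["['WP', 'VBD', 'DT']", "['WP', 'VBD', 'DT']", "['WP', 'VBD', 'NN']"],
})
temp1 = [['WP', 'VBD', 'DT'], ['WRB', 'JJ', 'VBZ'], ['WP', 'VBD', 'DT']]


def to_tuple(value):
    """Hypothetical helper: build a tuple from a list or a list-like string."""
    if isinstance(value, str):
        value = literal_eval(value)  # safely parse "['WP', ...]" into a list
    return tuple(value)


allowed = [tuple(tags) for tags in temp1]
mask = a['3_tags'].apply(to_tuple).isin(allowed)
print(mask.tolist())
```

literal_eval only evaluates Python literals, so it is a safer choice than eval for text read from a file.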

Setup 1 ('3_tags' holds actual lists)

import pandas as pd

a = pd.DataFrame({
    'done': [0, 0, 0],
    'sentence': list(map(str.split, ('What were the', 'What was the', 'Why did John'))),
    '3_tags': list(map(str.split, ('WP VBD DT', 'WP VBD DT', 'WP VBD NN')))
}, columns='done sentence 3_tags'.split())

temp1 = [['WP', 'VBD', 'DT'], ['WRB', 'JJ', 'VBZ'], ['WP', 'VBD', 'DT']]

Setup 2 ('3_tags' holds strings that look like lists)

a = pd.DataFrame({
    'done': [0, 0, 0],
    'sentence': list(map(str.split, ('What were the', 'What was the', 'Why did John'))),
    '3_tags': list(map(str, map(str.split, ('WP VBD DT', 'WP VBD DT', 'WP VBD NN'))))
}, columns='done sentence 3_tags'.split())

temp1 = [['WP', 'VBD', 'DT'], ['WRB', 'JJ', 'VBZ'], ['WP', 'VBD', 'DT']]

edited Mar 30, 2018 at 18:07

answered Mar 30, 2018 at 15:10


11 Comments

twhale Over a year ago

In the Setup, I do not understand how to prepare my (very large) DataFrame the way you show. How do I convert it to tuples?

piRSquared Over a year ago

The setup is to produce the variables a and temp1 as you had them. You shouldn't have to do anything. That is for others who may want to test it out. You just need to use the code in the top portion.

twhale Over a year ago

Thanks, got it. When I use the top portion it gives another error: ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

twhale Over a year ago

When I do q = a['3_tags'].apply(tuple) and then print(q), I get: ([, ', D, T, ', ,, , ', N, N, ', ,, , ', I, ...

piRSquared Over a year ago

That means your data frame is all messed up. In your post it looks like the elements in '3_tags' are lists, when they are actually strings that look like lists. I'll update my post to account for that. In fact, if you are able, you should provide a way to reproduce your data exactly.
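
The symptom described in these comments can be reproduced with a short sketch. It assumes the data went through a CSV file at some point, as in the question; the variable names (original, reloaded, types_before, types_after) are made up for this illustration.

```python
import io

import pandas as pd

# Write a column of real lists to CSV text, then read it back.
original = pd.DataFrame({'3_tags': [['WP', 'VBD', 'DT'], ['WP', 'VBD', 'NN']]})
buffer = io.StringIO()
original.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# Inspect the Python type stored in each cell before and after the round trip.
types_before = {type(value) for value in original['3_tags']}
types_after = {type(value) for value in reloaded['3_tags']}
print(types_before)
print(types_after)

# Calling tuple() on such a string splits it into single characters,
# which is the kind of output quoted in the comment above.
first_chars = tuple(reloaded['3_tags'].iloc[0])[:4]
print(first_chars)
```

Checking the cell types like this before calling apply(tuple) shows which of the two variants in the answer applies.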
