
I am looking for a solution to the following problem. There's a DataFrame:

import numpy as np
import pandas as pd

data = np.array([['', 'col1', 'col2'],
                 ['row1', 1, 2],
                 ['row2', 3, 4]])
df = pd.DataFrame(data=data[1:, 1:], index=data[1:, 0], columns=data[0, 1:])

I wish to retain the rows in which, for example, the value in column col1 belongs to the list [1, 2] and the value in column col2 belongs to the list [2, 4]. This is what I thought would work:

df1 = df[df['col1'].isin([1,2]) & df['col2'].isin([2,4])]

However, df1 prints as an empty DataFrame. On the other hand, this approach

df1 = df[(df.col1 in [1,2]) & (df.col2 in [2,4])]

results in

ValueError: The truth value of a Series is ambiguous. Use a.empty, `a.bool()`, `a.item()`, `a.any()` or `a.all()`.

I expected to get a DataFrame with row1 in it. Needless to say, I am relatively new to Python. Thanks a lot for your help.

    Your problem is that the columns are of object dtype (because you started from a NumPy array mixing strings and integers). For instance, run df = df.astype(int) and your query will work. Commented Jun 16, 2018 at 18:03

2 Answers

4

You need to convert numeric series to numeric types:

df = pd.DataFrame(data=data[1:,1:].astype(int),
                  index=data[1:,0],
                  columns=data[0,1:])

df1 = df[df['col1'].isin([1,2]) & df['col2'].isin([2,4])]

print(df1)

      col1  col2
row1     1     2

Your code does not work because NumPy coerced the mixed array to a single string dtype, so the DataFrame columns are of object dtype and hold strings such as '1' rather than integers; isin([1, 2]) therefore finds no matches. Pandas does not apply conversion implicitly as this would be prohibitively expensive in most situations.
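A minimal sketch of the mismatch, reusing the data from the question. The names mask_int and mask_str are introduced here for illustration only:

```python
import numpy as np
import pandas as pd

data = np.array([['', 'col1', 'col2'],
                 ['row1', 1, 2],
                 ['row2', 3, 4]])
df = pd.DataFrame(data=data[1:, 1:], index=data[1:, 0], columns=data[0, 1:])

# The cells are strings, so looking up integers never matches.
mask_int = df['col1'].isin([1, 2])
# Looking up the string forms does match.
mask_str = df['col1'].isin(['1', '2'])

print(mask_int.tolist())  # [False, False]
print(mask_str.tolist())  # [True, False]
```

Filtering on strings would technically work, but converting the columns to a numeric dtype is usually the cleaner fix because later arithmetic and comparisons then behave as expected.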

If you already have a constructed Pandas dataframe, you can apply numeric conversion as a separate step:

df = df.astype(int)

Or, to convert only specified series:

cols = ['col1', 'col2']
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
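As a rough illustration of what errors='coerce' does, here is a sketch on a made-up Series (the values, including 'n/a', are hypothetical and not from the question). Entries that cannot be parsed become NaN instead of raising an error, which also turns the result into a float column:

```python
import pandas as pd

# Hypothetical column with one non-numeric entry, for illustration only.
s = pd.Series(['1', '3', 'n/a'])

# Unparseable entries become NaN rather than raising an error.
converted = pd.to_numeric(s, errors='coerce')

print(converted.isna().tolist())        # [False, False, True]
print(converted.isin([1, 2]).tolist())  # [True, False, False]
```

If silently turning bad values into NaN is not what you want, leaving errors at its default makes the conversion raise on the first value it cannot parse.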

2

Your column dtype is object because you created the data with np.array, and a NumPy array allows only a single dtype, so the integers were stored as strings:

df.applymap(type)
Out[139]: 
               col1           col2
row1  <class 'str'>  <class 'str'>
row2  <class 'str'>  <class 'str'>
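To see the coercion on the NumPy side, here is a small sketch. The names rows, mixed, kept and df_obj are made up for this example, and passing dtype=object is shown only as one possible way to keep the integers, not as the recommended fix:

```python
import numpy as np
import pandas as pd

rows = [['row1', 1, 2],
        ['row2', 3, 4]]

mixed = np.array(rows)               # everything is coerced to strings
kept = np.array(rows, dtype=object)  # Python objects are kept as they are

print(mixed.dtype.kind)    # 'U', a Unicode string dtype
print(type(kept[0, 1]))    # <class 'int'>

# With the integers preserved, the original filter finds row1.
df_obj = pd.DataFrame(kept[:, 1:], index=kept[:, 0], columns=['col1', 'col2'])
result = df_obj[df_obj['col1'].isin([1, 2]) & df_obj['col2'].isin([2, 4])]
print(result.index.tolist())  # ['row1']
```

The columns of df_obj are still of object dtype, so building the DataFrame from plain lists or converting to a numeric dtype remains the simpler route.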

Create the DataFrame this way instead:

df = pd.DataFrame(data=[[1,2],[3,4]], index=['row1','row2'],columns=['col1','col2'])
df[(df['col1'].isin([1,2])) & (df['col2'].isin([2,4]))]
Out[143]: 
      col1  col2
row1     1     2

1 Comment

Not mine but one small fix - the columns are of object dtype, whereas the elements inside the object columns are of str type.
