My DataFrame has many columns. One of these columns holds arrays.

df
Out[191]: 
       10012005  10029008  10197000  ...  filename_int  filename      result
0           0.0       0.0       0.0  ...             1       1.0  [280, NON]
1           0.0       0.0       0.0  ...            10      10.0  [286, NON]
2           0.0       0.0       0.0  ...           100     100.0  [NON, 285]
3           0.0       0.0       0.0  ...         10000   10000.0  [NON, 286]
4           0.0       0.0       0.0  ...         10001   10001.0       [NON]
        ...       ...       ...  ...           ...       ...         ...
52708       0.0       0.0       0.0  ...          9995    9995.0       [NON]
52709       0.0       0.0       0.0  ...          9996    9996.0       [NON]
52710       0.0       0.0       0.0  ...          9997    9997.0  [285, NON]
52711       0.0       0.0       0.0  ...          9998    9998.0       [NON]
52712       0.0       0.0       0.0  ...          9999    9999.0       [NON]

[52713 rows x 4289 columns]

The column result holds arrays of values like these:

[NON]
[123,NON]
[357,938,837]
[455,NON,288]
[388,929,NON,020]

I want to filter my DataFrame so that it only shows records whose result contains at least one value other than NON.

Therefore, values such as

[NON,NON]
[NON]
[]

should be excluded.

Only values like these should remain after filtering:

[123,NON]
[357,938,837]
[455,NON,288]
[388,929,NON,020]

I tried this code:

df[len(df["result"])!="NON"]

but I get this error:

  File "pandas\_libs\hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item

KeyError: True

How can I filter my DataFrame?

3 Answers

Try map with a lambda here:

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [[280, 'NON'], ['NON'], [], [285]] })
df

   A           B
0  1  [280, NON]
1  2       [NON]
2  3          []
3  4       [285]

df[df['B'].map(lambda x: any(y != 'NON' for y in x))]

   A           B
0  1  [280, NON]
3  4       [285]

The generator expression inside map returns True if at least one item in the list is not "NON". An empty list yields False, so those rows are dropped as well.
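To see how that predicate behaves on the kinds of lists shown in the question, here is a small sketch in plain Python. It assumes the entries are strings such as 'NON' and '123' (the question does not show the exact dtype), and has_real_value is just an illustrative name, not anything from pandas.

```python
# Same test as the lambda above: True when a list holds at least one
# entry other than 'NON'. The name has_real_value is hypothetical.
def has_real_value(items):
    # any() over an empty iterable is False, so [] is excluded automatically.
    return any(item != 'NON' for item in items)

# Sample lists modelled on the question (entries assumed to be strings).
samples = [['NON', 'NON'], ['NON'], [], ['123', 'NON'], ['357', '938', '837']]
results = [has_real_value(s) for s in samples]
print(results)  # [False, False, False, True, True]
```

The first three lists are the ones the question wants excluded, and the last two are the ones it wants kept.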

You can use apply to identify the rows that meet your criteria. The filter works because apply returns a boolean Series, one True/False value per row, which .loc then uses as a mask.

import pandas as pd
import numpy as np

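# Build a sample column in which each row holds an array of 3 random values
vals = [280, 285, 286, 'NON', 'NON', 'NON']
listcol = [np.random.choice(vals, 3) for _ in range(100)]
df = pd.DataFrame({'vals': listcol})

def has_value(l):
    # True if the row contains at least one value other than 'NON'
    return any(i != 'NON' for i in l)

# Keep only the rows for which has_value returned True
df.loc[df.vals.apply(has_value), :]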
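For contrast, here is a rough sketch of why the attempt in the question raises KeyError: True. len(df["result"]) is a single integer (the row count), so comparing it with "NON" yields one plain True rather than a value per row, and df[True] is then treated as a lookup for a column named True. The tiny frame below is made up for illustration.

```python
import pandas as pd

# Made-up stand-in for the question's DataFrame (entries assumed to be strings).
df = pd.DataFrame({'result': [['280', 'NON'], ['NON'], []]})

# len() gives the number of rows, so this is one Python bool, not a mask.
scalar = len(df['result']) != 'NON'

try:
    df[scalar]          # pandas looks for a column labelled True
    raised = False
except KeyError:
    raised = True

# A per-row boolean Series is what the indexer needs instead.
mask = df['result'].apply(lambda l: any(i != 'NON' for i in l))
filtered = df.loc[mask]
print(scalar, raised, filtered.index.tolist())
```

The key difference is that the mask has the same length as the DataFrame, while the original expression collapses to a single value.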

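I would expand the lists into their own DataFrame and build the mask from that (using the example df from the first answer):

s=pd.DataFrame(df.B.tolist())
df=df[(s.ne('NON')&s.notnull()).any(axis=1).to_numpy()].copy()
df

   A           B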
0  1  [280, NON]
3  4       [285]
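A quick sketch of why both conditions are combined, reusing the small example frame from the first answer. Expanding the lists pads shorter rows with missing values, and under the usual NaN comparison rules a missing value counts as not equal to 'NON', so the notnull check is what stops the padding from being counted as a real value. Passing index=df.index is my own addition here, not part of the answer as posted; it should let the mask align with df directly, without to_numpy().

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [[280, 'NON'], ['NON'], [], [285]]})

# One column per list position; shorter lists are padded with missing values.
s = pd.DataFrame(df['B'].tolist(), index=df.index)

# A cell counts only if it is present AND different from 'NON'.
mask = (s.ne('NON') & s.notna()).any(axis=1)

filtered = df[mask]
print(filtered)
```

Note that any(axis=1) is spelled with the keyword here, since recent pandas versions no longer accept the axis as a positional argument.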
