I have a dataframe with 472 columns. Of those, 99 columns are dxpoa1, dxpoa2, ..., dxpoa99. I want to remove the rows in which the dxpoa columns contain only 7, N, or BLANK. A dxpoa column can hold many values, such as Y, W, E, 1, 7, or N, or it can be BLANK. Only the rows in which every dxpoa value is 7, N, or BLANK should be removed from the dataframe. The dataset is huge, with several hundred thousand rows, so an efficient method would be appreciated.
a b c dxpoa1 dxpoa2 dxpoa3 dxpoa4
0 0 A X W N X
1 Z W 2 7 7
2 7 W N W W 1 Z
3 1 7 E N N N N
4 Y 0 W N X 1
5 N X 1 E 1 Z 7
6 1 X 7 0 A W A
7 X X Z X N A 1
8 7 1 A N X Z N
9 N A Z N N N
10 A N Z 7 0 A E
11 E N A Z N N 1
12 E A 1 Z E E W
13 N W Z E X A 0
14 Y 1 A W A E X
I want rows 1, 3, and 9 removed from the dataframe.
I have tried several approaches, for example:
df_col = [c for c in df.columns if c.startswith("dxpoa")]  # names of the dxpoa columns
df1 = df[df_col].isin(["Y", "W", "1", "E"]).values  # a boolean array, not a filtered dataframe
This does not filter out any rows.
Do you want to remove rows in which any of those columns contains '7' or 'N', or do you want to remove only those rows for which all of those columns contain '7' or 'N'?
Treat a blank as the empty string '', a value just like '7' or 'N'. Then you can express the problem as one of removing rows for which all the values (in the dxpoa columns) are in ['7', 'N', ''].
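The idea of removing rows whose dxpoa values are all in ['7', 'N', ''] could be sketched roughly as follows. This is only a minimal illustration, and it rests on assumptions the question does not confirm: the values are stored as strings (so 7 is '7', not an integer), a blank is either an empty string or a missing value, and the dxpoa columns can be picked out by their name prefix. The small dataframe and the names `dxpoa_cols`, `drop_mask` and `filtered` are made up for the example.

```python
import pandas as pd

# Hypothetical miniature of the data: blanks appear as "" or as missing values.
df = pd.DataFrame(
    {
        "a": ["0", "Z", "1", "N"],
        "dxpoa1": ["W", "7", "N", "N"],
        "dxpoa2": ["N", "7", "N", None],
        "dxpoa3": ["X", "", "N", "E"],
    }
)

# Select the dxpoa columns by name prefix (assumed naming scheme).
dxpoa_cols = [c for c in df.columns if c.startswith("dxpoa")]

# True for rows whose dxpoa values are all in {'7', 'N', ''};
# missing values are mapped to '' first so they count as blank.
drop_mask = df[dxpoa_cols].fillna("").isin(["7", "N", ""]).all(axis=1)

# Keep only the rows that do not match.
filtered = df[~drop_mask]
print(filtered.index.tolist())
```

Both `isin` and `all(axis=1)` are vectorised, so this should scale reasonably to several hundred thousand rows. If the real columns hold integers or padded strings, the list passed to `isin` would need to be adjusted to match.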