Python - Remove rows and columns in Matrix where all values are 0

Question

I have a matrix that I created from a DataFrame, and I want to remove all columns in which every value is 0.

I have seen examples using dropna, as well as df2.loc[:, (df2 != 0).any(axis=0)], but that doesn't do anything to my dataframe.
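For comparison, here is a minimal sketch of what the .any(axis=0) idiom does on a small made-up frame (the columns x, y, z are hypothetical). Note that .loc returns a new DataFrame and does not modify df2 in place. So one possible reason the line seems to do nothing is that its result is never assigned back. That is only a guess, not a diagnosis of the code below.

```python
import pandas as pd

# Hypothetical toy data: column 'y' is all zeros
df2 = pd.DataFrame({'x': [1, 0, 2], 'y': [0, 0, 0], 'z': [0, 3, 0]})

# True for every column that holds at least one non-zero value
keep = (df2 != 0).any(axis=0)

df2.loc[:, keep]            # evaluated but discarded: df2 still has 3 columns
df2 = df2.loc[:, keep]      # assigning the result back actually drops 'y'
print(df2.columns.tolist())  # ['x', 'z']
```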

This is how I created my Matrix:

import numpy as np
import pandas as pd
from collections import Counter

# Subjects to keep (new_df is the existing source DataFrame)
a = ['Psychology','Education','Social policy','Sociology','Pol. sci. & internat. studies','Development studies','Social anthropology','Area Studies','Science and Technology Studies','Law & legal studies','Economics','Management & business studies','Human Geography','Environmental planning','Demography','Social work','Tools, technologies & methods','Linguistics','History']
final_df = new_df[new_df['Subject'].isin(a)]

# For each subject, count how often each grant reference number occurs
ctrs = {location: Counter(gp.GrantRefNumber) for location, gp in final_df.groupby('Subject')}

# Overlap between two subjects = sum of the per-grant minimum counts
ctrs = list(ctrs.items())
overlaps = [(loc1, loc2, sum(min(ctr1[k], ctr2[k]) for k in ctr1))
    for i, (loc1, ctr1) in enumerate(ctrs, start=1)
    for (loc2, ctr2) in ctrs[i:] if loc1 != loc2]
# Add the mirrored pairs so the matrix is symmetric
overlaps += [(l2, l1, c) for l1, l2, c in overlaps]

# Pivot the pairs into a subject-by-subject matrix; missing pairs (such as the diagonal) become 0
df22 = pd.DataFrame(overlaps, columns=['Loc1', 'Loc2', 'Count'])
df22 = df22.set_index(['Loc1', 'Loc2'])
df22 = df22.unstack().fillna(0).astype(int)

# Keep only the 20 largest distinct counts (np.unique already returns sorted values)
b = np.unique(df22.values.ravel())[-20:]
# Every cell whose value is not among those top counts is set to 0
df2 = df22.where(df22.isin(b), 0.0)
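Two steps in this code can be illustrated on small made-up data (the grant numbers G1 to G4 and the 4 by 4 matrix are hypothetical). First, for counters built from lists, the overlap sum(min(ctr1[k], ctr2[k]) for k in ctr1) equals the size of the multiset intersection, which Counter's & operator computes. Second, the top-k filter sets everything except the largest counts to 0, which is how entire rows and columns of zeros can appear. The sketch uses k = 1 instead of 20.

```python
from collections import Counter

import numpy as np
import pandas as pd

# Step 1 (hypothetical grant numbers): overlap of two subjects' counters
ctr1 = Counter(['G1', 'G1', 'G2', 'G3'])
ctr2 = Counter(['G1', 'G3', 'G3', 'G4'])
loop_overlap = sum(min(ctr1[k], ctr2[k]) for k in ctr1)  # as in the question
and_overlap = sum((ctr1 & ctr2).values())                  # multiset intersection
print(loop_overlap, and_overlap)  # 2 2

# Step 2 (hypothetical symmetric matrix): keep only the top-k distinct counts
m = pd.DataFrame([[0, 5, 1, 1],
                  [5, 0, 1, 1],
                  [1, 1, 0, 1],
                  [1, 1, 1, 0]],
                 index=list('ABCD'), columns=list('ABCD'))
k = 1  # the question uses 20
top = np.unique(m.values.ravel())[-k:]  # np.unique output is already sorted
filtered = m.where(m.isin(top), 0)
# Only the A-B and B-A cells survive, so rows and columns C and D are all 0
print(filtered)
```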

Interestingly (or not), when I type df2.columns, I get:

MultiIndex(levels=[[u'Count'], [u'Area Studies', u'Demography', u'Development studies', u'Economics', u'Education', u'Environmental planning', u'History', u'Human Geography', u'Law & legal studies', u'Linguistics', u'Management & business studies', u'Pol. sci. & internat. studies', u'Psychology', u'Science and Technology Studies', u'Social anthropology', u'Social policy', u'Social work', u'Sociology', u'Tools, technologies & methods']],
           labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]],
           names=[None, u'Loc2'])

This might be why I am struggling.
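The extra Count level comes from calling unstack() on a one-column DataFrame: the column name becomes the outer level of the new column index. The following sketch uses made-up overlaps (subjects A, B, C are hypothetical) and shows two ways to get flat subject columns. You can select the Count Series before unstacking, or you can drop the outer level afterwards with droplevel.

```python
import pandas as pd

# Hypothetical overlaps in the same (Loc1, Loc2, Count) shape as the question
overlaps = [('A', 'B', 2), ('A', 'C', 0), ('B', 'C', 5)]
overlaps += [(l2, l1, c) for l1, l2, c in overlaps]
pairs = pd.DataFrame(overlaps, columns=['Loc1', 'Loc2', 'Count']).set_index(['Loc1', 'Loc2'])

# Unstacking the DataFrame keeps 'Count' as an outer column level
multi = pairs.unstack().fillna(0).astype(int)
print(type(multi.columns).__name__)  # MultiIndex

# Option 1: unstack the Series instead, filling missing pairs (the diagonal) with 0
flat = pairs['Count'].unstack(fill_value=0)

# Option 2: drop the outer 'Count' level from the existing result
flat2 = multi.droplevel(0, axis=1)
print(flat.columns.tolist())  # ['A', 'B', 'C']
```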

jezrael · Accepted Answer · 2017-09-12 10:52:36Z


Use all to find the rows and columns in which every value equals 0. Then invert those masks with ~ so that only the remaining rows and columns are kept:

import pandas as pd

df = pd.DataFrame({'B':[0,0,0,0,0,0],
                   'C':[0,8,9,4,2,3],
                   'D':[0,3,5,7,1,0],
                   'E':[0,3,6,9,2,4]})

print(df)
   B  C  D  E
0  0  0  0  0
1  0  8  3  3
2  0  9  5  6
3  0  4  7  9
4  0  2  1  2
5  0  3  0  4

# Keep only the rows (axis=1) and columns (default axis=0) that are not entirely 0
df = df.loc[~df.eq(0).all(axis=1), ~df.eq(0).all()]
print(df)
   C  D  E
1  8  3  3
2  9  5  6
3  4  7  9
4  2  1  2
5  3  0  4
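Because "not all values are 0" means the same as "at least one value is non-zero", you can also write the masks above with ne(0).any(). That matches the (df2 != 0).any(axis=0) form from the question. This sketch uses the same example data and checks that both forms select the same rows and columns:

```python
import pandas as pd

df = pd.DataFrame({'B': [0, 0, 0, 0, 0, 0],
                   'C': [0, 8, 9, 4, 2, 3],
                   'D': [0, 3, 5, 7, 1, 0],
                   'E': [0, 3, 6, 9, 2, 4]})

# Answer's form: invert "every value equals 0"
a = df.loc[~df.eq(0).all(axis=1), ~df.eq(0).all()]

# Equivalent form: keep rows/columns that contain any non-zero value
b = df.loc[df.ne(0).any(axis=1), df.ne(0).any()]

print(a.equals(b))  # True
```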

edited Sep 12, 2017 at 10:52

answered Sep 12, 2017 at 10:39

jezrael

4 Comments

Nicholas Over a year ago

Hey Jezrael, I just tried both of your examples, and it removed everything from the dataframe, leaving just 'Loc1' and the list of disciplines underneath (i.e. it deleted all the numbers and the column headers). I do have some columns with numbers in them, so it shouldn't have deleted all of them.

Jon Clements Over a year ago

I read the requirement as removing columns where all the values are 0; this removes those where any value is 0...
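The difference Jon Clements points out is between any and all on the zero mask. The following sketch uses made-up columns p, q, r. Inverting eq(0).any() drops every column that contains even a single 0, while inverting eq(0).all() drops only the columns that are entirely 0. In the question's matrix, every diagonal cell is 0 because a subject is never paired with itself. So the any version would drop every column, which would be consistent with the empty result described above.

```python
import pandas as pd

# Hypothetical data: 'p' is all zeros, 'q' has one zero, 'r' has none
df = pd.DataFrame({'p': [0, 0, 0], 'q': [0, 4, 5], 'r': [1, 2, 3]})

# Drops any column containing a 0, so only 'r' survives
any_zero_dropped = df.loc[:, ~df.eq(0).any()]

# Drops only all-zero columns, so 'q' and 'r' survive
all_zero_dropped = df.loc[:, ~df.eq(0).all()]

print(any_zero_dropped.columns.tolist())  # ['r']
print(all_zero_dropped.columns.tolist())  # ['q', 'r']
```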

Nicholas Over a year ago

Yes, I meant ALL the values in a column are 0 :)

Nicholas Over a year ago

I am very thankful for the help. Is there a version that does the rows too? I have removed the columns, but the rows that contain only 0s remain. :)
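The answer's line already filters rows (all(axis=1)) and columns (all()) in a single step. Removing all-zero columns cannot make any row all-zero, because every removed cell was 0. So it is enough to compute both masks from the original frame. The following sketch uses a hypothetical frame with the same ('Count', subject) column MultiIndex as the question. The boolean masks still work because .loc aligns them on the labels.

```python
import pandas as pd

# Hypothetical matrix shaped like df2: ('Count', subject) column MultiIndex
cols = pd.MultiIndex.from_product([['Count'], ['A', 'B', 'C']], names=[None, 'Loc2'])
df2 = pd.DataFrame([[0, 3, 0],
                    [3, 0, 0],
                    [0, 0, 0]],
                   index=pd.Index(['A', 'B', 'C'], name='Loc1'), columns=cols)

# Rows only: keep rows with at least one non-zero value
rows_only = df2.loc[~df2.eq(0).all(axis=1)]

# Rows and columns together, as in the answer
both = df2.loc[~df2.eq(0).all(axis=1), ~df2.eq(0).all()]
print(both)
```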
