I have a matrix that I created from a dataframe, and I want to remove every column in which all the values are 0.

I have seen examples using dropna and df2.loc[:, (df2 != 0).any(axis=0)], but they don't do anything to my dataframe.

This is how I created my matrix:

from collections import Counter

import numpy as np
import pandas as pd

a = ['Psychology','Education','Social policy','Sociology','Pol. sci. & internat. studies','Development studies','Social anthropology','Area Studies','Science and Technology Studies','Law & legal studies','Economics','Management & business studies','Human Geography','Environmental planning','Demography','Social work','Tools, technologies & methods','Linguistics','History']
final_df = new_df[new_df['Subject'].isin(a)]

ctrs = {location: Counter(gp.GrantRefNumber) for location, gp in final_df.groupby('Subject')}

ctrs = list(ctrs.items())
overlaps = [(loc1, loc2, sum(min(ctr1[k], ctr2[k]) for k in ctr1))
    for i, (loc1, ctr1) in enumerate(ctrs, start=1)
    for (loc2, ctr2) in ctrs[i:] if loc1 != loc2]
overlaps += [(l2, l1, c) for l1, l2, c in overlaps]

df22 = pd.DataFrame(overlaps, columns=['Loc1', 'Loc2', 'Count'])
df22 = df22.set_index(['Loc1', 'Loc2'])
df22 = df22.unstack().fillna(0).astype(int)

# Keep only the 20 largest distinct values (the [-20:] slice); everything else is set to 0.
b = np.sort(np.unique(df22.values.ravel()))[-20:]
df2 = df22.where(df22.isin(b),0.0)

Interestingly (or not), when I type df2.columns, I get:

MultiIndex(levels=[[u'Count'], [u'Area Studies', u'Demography', u'Development studies', u'Economics', u'Education', u'Environmental planning', u'History', u'Human Geography', u'Law & legal studies', u'Linguistics', u'Management & business studies', u'Pol. sci. & internat. studies', u'Psychology', u'Science and Technology Studies', u'Social anthropology', u'Social policy', u'Social work', u'Sociology', u'Tools, technologies & methods']],
           labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]],
           names=[None, u'Loc2'])

Which might be why I am struggling.

1 Answer

Use eq(0) with all to build boolean masks that are True for the rows and columns containing only 0, then invert them with ~ so that loc keeps everything else:

df = pd.DataFrame({'B':[0,0,0,0,0,0],
                   'C':[0,8,9,4,2,3],
                   'D':[0,3,5,7,1,0],
                   'E':[0,3,6,9,2,4]})

print (df)
   B  C  D  E
0  0  0  0  0
1  0  8  3  3
2  0  9  5  6
3  0  4  7  9
4  0  2  1  2
5  0  3  0  4

df = df.loc[~df.eq(0).all(axis=1), ~df.eq(0).all()]
print (df)
   C  D  E
1  8  3  3
2  9  5  6
3  4  7  9
4  2  1  2
5  3  0  4
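
To connect this to the frame in the question, here is a minimal sketch of the same masks applied to a frame with MultiIndex columns produced by set_index plus unstack. The overlaps values are made up for illustration, and passing the masks as plain NumPy arrays is my own choice to sidestep any index alignment, not something the question requires. Note that (df != 0).any() from the question selects the same columns as ~df.eq(0).all().

```python
import pandas as pd

# Hypothetical stand-in for the overlaps list in the question (values are made up).
overlaps = [('A', 'B', 3), ('B', 'A', 3),
            ('A', 'C', 0), ('C', 'A', 0),
            ('B', 'C', 0), ('C', 'B', 0)]

df22 = pd.DataFrame(overlaps, columns=['Loc1', 'Loc2', 'Count'])
# unstack on a one-column frame gives MultiIndex columns: ('Count', 'A'), ...
df22 = df22.set_index(['Loc1', 'Loc2']).unstack().fillna(0).astype(int)

# True where a whole column / a whole row contains only 0.
all_zero_cols = df22.eq(0).all()
all_zero_rows = df22.eq(0).all(axis=1)

# Keep the rows and columns that have at least one non-zero value.
# The masks are passed as boolean arrays, so they are applied by position.
trimmed = df22.loc[~all_zero_rows.to_numpy(), ~all_zero_cols.to_numpy()]

# Optional: drop the outer 'Count' level so the columns are a flat index.
trimmed.columns = trimmed.columns.droplevel(0)

print(trimmed)
```

If only the columns should be removed, use df22.loc[:, ~all_zero_cols.to_numpy()] and leave the row mask out.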

4 Comments

Hey Jezrael, I just tried both your examples and they removed everything from the dataframe, leaving only 'Loc1' and the list of disciplines underneath (i.e. all the numbers and the column headers were deleted). I do have some columns with numbers in them, so it shouldn't have deleted all of them.
I was reading the requirement as remove columns where all the values were 0 - this removes those where any value is 0...
Yes, I meant ALL the values were 0 in a column :)
I am very thankful for the help. Is there a version for doing the rows too? I have removed the columns, but the rows that contain only 0s have remained :)
