1

I have a csv file like so:

Id               col1    col2    col3     
a                 7.04   0.3     1.2    
b                 0.3    1.7     .1   
c                 0.34   0.05    1.3   
d                 0.4    1.60    3.1

I want to convert it to a data frame thresholding on 0.5 . If the values is greater than or equal to 0.5 then the column is counted , otherwise it is not counted.

Id               classes    
a                 col1,col3    
b                 col2        
c                 col3   
d                 col2,col3     

The closest solution I found is this one. However it deals with single rows, not multiple rows. For multiple rows, the best I have is iterate through all the rows. I need a succinct expression without for loop.

2 Answers 2

2

Use set_index first and then numpy.where for extract columns by condition. Last remove empty strings by list comprehension:

df = df.set_index('Id')
s = np.where(df > .5, ['{}, '.format(x) for x in df.columns], '')
df['new'] = pd.Series([''.join(x).strip(', ') for x in s], index=df.index)
print (df)
    col1  col2  col3         new
Id                              
a   7.04  0.30   1.2  col1, col3
b   0.30  1.70   0.1        col2
c   0.34  0.05   1.3        col3
d   0.40  1.60   3.1  col2, col3

Similar for new DataFrame:

df1 = pd.DataFrame({'classes': [''.join(x).strip(', ') for x in s], 
                    'Id': df.index})
print (df1)
  Id     classes
0  a  col1, col3
1  b        col2
2  c        col3
3  d  col2, col3

And if necessary remove empty with ,:

df1 = pd.DataFrame({'classes': [''.join(x).strip(', ').replace(', ',',') for x in s], 
                    'Id': df.index})
print (df1)

  Id    classes
0  a  col1,col3
1  b       col2
2  c       col3
3  d  col2,col3

Detail:

print (s)
[['col1, ' '' 'col3, ']
 ['' 'col2, ' '']
 ['' '' 'col3, ']
 ['' 'col2, ' 'col3, ']]

Alternative with apply (slowier):

df1 = (df.set_index('Id')
         .apply(lambda x: ','.join(x.index[x > .5]), 1)
         .reset_index(name='classes'))
print (df1)
  Id    classes
0  a  col1,col3
1  b       col2
2  c       col3
3  d  col2,col3
Sign up to request clarification or add additional context in comments.

Comments

2

Comprehension after clever multiplication... This assumes that Id is the index.

df.assign(classes=[
    ','.join(s for s in row if s)
    for row in df.ge(.5).mul(df.columns).values
])

    col1  col2  col3    classes
Id                             
a   7.04  0.30   1.2  col1,col3
b   0.30  1.70   0.1       col2
c   0.34  0.05   1.3       col3
d   0.40  1.60   3.1  col2,col3

Setup Fun Trick
Custom subclass of str that redefines string addition to include a ','

class s(str):
    def __add__(self, other):
        if self and other:
            return s(super().__add__(',' + other))
        else:
            return s(super().__add__(other))

Fun Trick

df.ge(.5).mul(df.columns).applymap(s).sum(1)

Id
a    col1,col3
b         col2
c         col3
d    col2,col3
dtype: object

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.