Pandas applying multiple custom functions

Question

I am new to Python and pandas. I need to do some simple parsing of a pandas DataFrame to build a new DataFrame by applying several functions to it. Here's a toy example:

df = pd.DataFrame({'A' : pd.Series(["T100", "T100", "M100", "M100"]), 'B' : pd.Series(["520", "620", "720", "820"]), 'C' : pd.Series(["10/50", "20/50", "30/50", "50/50"])})

>>> df
      A       B      C
0  T100     520  10/50
1  T100     620  20/50
2  M100     720  30/50
3  M100     820  50/50

This is what I have tried. Naturally it didn't work (it raised AttributeError: 'DataFrame' object has no attribute 'agg'), but it shows the idea of what I want to do:

import re

def get_pat_ID(row):
    sample = row['A']
    patID = re.match(r"[TM](\d+)", sample).group(1)
    return patID

def get_funcB(row):
    sample, b, c = row['A'], row['B'], row['C']
    if sample == "T100":
        output = b + "_" + c
    else:
        output = "NA"
    return output

def cust(dataset, funcname):
    f = dataset.apply(funcname, axis=1)  # I want the function to be performed on each row of my dataframe
    return f

funcdict = {"pat_ID": get_pat_ID, "funcB": get_funcB}  # contains all the functions that I want to pass to my dataframe
funcs = {'PatID': cust(df, funcdict["pat_ID"]), 'AnotherFunc': cust(df, funcdict["funcB"])}  # creates one column for output of each function
newdf = pd.DataFrame()
newdf = df.agg(funcs)  # this is the line that raises the AttributeError

I know that my method is not the most efficient anyway, since apply iterates over the same rows again for every function I calculate. Can anyone help me, please?

roman · Accepted Answer · 2017-03-06 08:17:42Z

>>> ndf = df.apply(lambda x: pd.Series(data=[get_pat_ID(x), get_funcB(x)], index=['pat_ID','get_funcB']), axis=1)
>>> ndf
  pat_ID  get_funcB
0    100  520_10/50
1    100  620_20/50
2    100         NA
3    100         NA
>>> pd.concat([df,ndf], axis=1)
      A    B      C pat_ID  get_funcB
0  T100  520  10/50    100  520_10/50
1  T100  620  20/50    100  620_20/50
2  M100  720  30/50    100         NA
3  M100  820  50/50    100         NA
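
The same pattern can be generalized so that the column names come straight from the question's funcdict, which keeps it to a single row-wise pass however many functions there are. This is only a minimal sketch, assuming Python 3 and a reasonably recent pandas; apply_all is a hypothetical helper name, not a pandas function:

```python
import re

import pandas as pd

df = pd.DataFrame({
    "A": ["T100", "T100", "M100", "M100"],
    "B": ["520", "620", "720", "820"],
    "C": ["10/50", "20/50", "30/50", "50/50"],
})


def get_pat_ID(row):
    # Digits that follow the leading T or M in column A
    return re.match(r"[TM](\d+)", row["A"]).group(1)


def get_funcB(row):
    # Join B and C for T100 samples, otherwise return the string "NA"
    return row["B"] + "_" + row["C"] if row["A"] == "T100" else "NA"


funcdict = {"pat_ID": get_pat_ID, "funcB": get_funcB}


def apply_all(frame, funcs):
    # Hypothetical helper: one pass over the rows, one output column per function
    return frame.apply(
        lambda row: pd.Series({name: f(row) for name, f in funcs.items()}),
        axis=1,
    )


ndf = apply_all(df, funcdict)
result = pd.concat([df, ndf], axis=1)
```

Each function is still called once per row, so this saves repeated passes over the frame rather than the per-row Python overhead itself.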

Or even with a simple loop:

>>> ndf = df.copy()
>>> for k, v in funcdict.items():  # use .iteritems() on Python 2
...     ndf[k] = ndf.apply(v, axis=1)
... 
>>> ndf
      A    B      C pat_ID      funcB
0  T100  520  10/50    100  520_10/50
1  T100  620  20/50    100  620_20/50
2  M100  720  30/50    100         NA
3  M100  820  50/50    100         NA
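
Both approaches above still call a Python function once per row. For these two particular functions, the same columns can probably be computed with pandas' vectorized string methods instead. Note that this swaps in a different technique from the apply calls in this answer, and it is sketched under the assumption that every value in column A starts with T or M followed by digits (a non-matching value would give NaN here rather than raise an error):

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["T100", "T100", "M100", "M100"],
    "B": ["520", "620", "720", "820"],
    "C": ["10/50", "20/50", "30/50", "50/50"],
})

ndf = df.copy()

# Capture the digits after the leading T or M; expand=False returns a Series.
# The ^ anchor mirrors re.match, which only matches at the start of the string.
ndf["pat_ID"] = ndf["A"].str.extract(r"^[TM](\d+)", expand=False)

# Keep "B_C" where A is "T100", otherwise substitute the string "NA"
ndf["funcB"] = (ndf["B"] + "_" + ndf["C"]).where(ndf["A"] == "T100", "NA")
```

Whether this is worth it depends on how easily the real functions translate into column-wise operations; arbitrary row logic may still need apply.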

1 Comment

phusion Over a year ago

Sorry for the late response! Thanks for your answer!
