
I am a newbie to Python and pandas. I need to do some simple parsing of a pandas DataFrame to get a new DataFrame by applying several functions to each row. Here's a toy example:

import pandas as pd
df = pd.DataFrame({'A': pd.Series(["T100", "T100", "M100", "M100"]), 'B': pd.Series(["520", "620", "720", "820"]), 'C': pd.Series(["10/50", "20/50", "30/50", "50/50"])})

>>> df
      A    B      C
0  T100  520  10/50
1  T100  620  20/50
2  M100  720  30/50
3  M100  820  50/50

This is what I have tried. Naturally it didn't work (it raised AttributeError: 'DataFrame' object has no attribute 'agg'), but it shows the idea of what I want to do:

import re
import pandas as pd

def get_pat_ID(row):
    sample = row['A']
    patID = re.match(r"[TM](\d+)", sample).group(1)  # digits after the leading T or M
    return patID

def get_funcB(row):
    sample, b, c = row['A'], row['B'], row['C']
    if sample == "T100":
        output = b + "_" + c
    else:
        output = "NA"
    return output

def cust(dataset, funcname):
    f = dataset.apply(funcname, axis=1)  # I want the function to be performed on each row of my dataframe
    return f

funcdict = {"pat_ID": get_pat_ID, "funcB": get_funcB}  # contains all the functions that I want to pass to my dataframe
funcs = {'PatID': cust(df, funcdict["pat_ID"]), 'AnotherFunc': cust(df, funcdict["funcB"])}  # creates one column for output of each function
newdf = pd.DataFrame()
newdf = df.agg(funcs)  # this line raises the AttributeError above

I know my method is not efficient anyway, since apply iterates over all the rows again for every function I compute. Can anyone help me, please?
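
A side note on the attempt above, offered as a hedged sketch rather than the asker's or the answerer's code: DataFrame.agg only exists in pandas 0.20 and later, which is the likely cause of the AttributeError, and in any case it expects functions to apply, not columns that are already computed. Every value in funcs is already a finished column aligned on df's index, so the dict can go straight into the DataFrame constructor. This version still runs one apply per function, so it does not address the efficiency concern. The column names reuse the funcdict keys instead of 'PatID'/'AnotherFunc'.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "A": ["T100", "T100", "M100", "M100"],
    "B": ["520", "620", "720", "820"],
    "C": ["10/50", "20/50", "30/50", "50/50"],
})

def get_pat_ID(row):
    # digits after the leading T or M, e.g. "T100" -> "100"
    return re.match(r"[TM](\d+)", row["A"]).group(1)

def get_funcB(row):
    # "B_C" for T100 rows, the literal string "NA" otherwise
    return row["B"] + "_" + row["C"] if row["A"] == "T100" else "NA"

funcdict = {"pat_ID": get_pat_ID, "funcB": get_funcB}

# Each value is a Series produced by one row-wise apply; the DataFrame
# constructor turns the dict of Series into columns, no agg needed.
funcs = {name: df.apply(f, axis=1) for name, f in funcdict.items()}
newdf = pd.DataFrame(funcs)
print(newdf)
```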

Answer

>>> ndf = df.apply(lambda x: pd.Series(data=[get_pat_ID(x), get_funcB(x)], index=['pat_ID','get_funcB']), axis=1)
>>> ndf
  pat_ID  get_funcB
0    100  520_10/50
1    100  620_20/50
2    100         NA
3    100         NA
>>> pd.concat([df,ndf], axis=1)
      A    B      C pat_ID  get_funcB
0  T100  520  10/50    100  520_10/50
1  T100  620  20/50    100  620_20/50
2  M100  720  30/50    100         NA
3  M100  820  50/50    100         NA
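
The lambda above can be generalized to the whole funcdict from the question. A minimal sketch, assuming the question's data and functions (apply_all is a hypothetical helper name, not part of pandas): it returns one Series per row with an entry for every function, so apply walks the rows once no matter how many functions funcdict holds.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "A": ["T100", "T100", "M100", "M100"],
    "B": ["520", "620", "720", "820"],
    "C": ["10/50", "20/50", "30/50", "50/50"],
})

def get_pat_ID(row):
    # digits after the leading T or M, e.g. "T100" -> "100"
    return re.match(r"[TM](\d+)", row["A"]).group(1)

def get_funcB(row):
    # "B_C" for T100 rows, the literal string "NA" otherwise
    if row["A"] == "T100":
        return row["B"] + "_" + row["C"]
    return "NA"

funcdict = {"pat_ID": get_pat_ID, "funcB": get_funcB}

def apply_all(row):
    # hypothetical helper: every function sees the same row in a single pass;
    # the dict keys become the column names of the result
    return pd.Series({name: f(row) for name, f in funcdict.items()})

ndf = df.apply(apply_all, axis=1)          # a Series per row expands into a DataFrame
result = pd.concat([df, ndf], axis=1)      # put the new columns next to the old ones
print(result)
```

Returning a Series (rather than a list) from the row function lets pandas take the column names from its index, which keeps the function names and the output columns in sync.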

Or with a simple loop:

>>> ndf = df.copy()
>>> for k, v in funcdict.items():
...     ndf[k] = ndf.apply(v, axis=1)
...
>>> ndf
      A    B      C pat_ID      funcB
0  T100  520  10/50    100  520_10/50
1  T100  620  20/50    100  620_20/50
2  M100  720  30/50    100         NA
3  M100  820  50/50    100         NA
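
If the functions can be expressed with pandas' vectorized operations, the row-wise apply can be dropped altogether. This swaps in a different technique from the answer: a sketch assuming the same data, with Series.str.extract standing in for get_pat_ID and numpy.where standing in for get_funcB. It avoids calling a Python function once per row, but it only works when the logic fits these column-wise operations.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["T100", "T100", "M100", "M100"],
    "B": ["520", "620", "720", "820"],
    "C": ["10/50", "20/50", "30/50", "50/50"],
})

newdf = pd.DataFrame({
    # one capture group with expand=False returns a Series of the matched digits
    "pat_ID": df["A"].str.extract(r"[TM](\d+)", expand=False),
    # element-wise choice: "B_C" where A == "T100", the literal string "NA" elsewhere
    "funcB": np.where(df["A"] == "T100", df["B"] + "_" + df["C"], "NA"),
})
print(pd.concat([df, newdf], axis=1))
```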

Comment:

Sorry for the late response! Thanks for your answer!
