I am new to Python and pandas. I need to do some simple parsing of a pandas DataFrame to build a new DataFrame, applying several functions to each row (one output column per function). Here's a toy example:
```python
import pandas as pd

df = pd.DataFrame({'A': pd.Series(["T100", "T100", "M100", "M100"]),
                   'B': pd.Series(["520", "620", "720", "820"]),
                   'C': pd.Series(["10/50", "20/50", "30/50", "50/50"])})
```

```
>>> df
      A    B      C
0  T100  520  10/50
1  T100  620  20/50
2  M100  720  30/50
3  M100  820  50/50
```
This is what I have tried. Naturally it didn't work (it raised `AttributeError: 'DataFrame' object has no attribute 'agg'`), but it shows what I want to do:
```python
import re

import pandas as pd


def get_pat_ID(row):
    # Extract the digits that follow the leading T or M in column A
    sample = row['A']
    patID = re.match(r"[TM](\d+)", sample).group(1)
    return patID


def get_funcB(row):
    # For T100 rows, join B and C with an underscore; otherwise "NA"
    sample, b, c = row['A'], row['B'], row['C']
    if sample == "T100":
        output = b + "_" + c
    else:
        output = "NA"
    return output


def cust(dataset, funcname):
    f = dataset.apply(funcname, axis=1)  # I want the function to be performed on each row of my dataframe
    return f


funcdict = {"pat_ID": get_pat_ID, "funcB": get_funcB}  # contains all the functions that I want to pass to my dataframe
funcs = {'PatID': cust(df, funcdict["pat_ID"]),
         'AnotherFunc': cust(df, funcdict["funcB"])}  # creates one column for output of each function
newdf = pd.DataFrame()
newdf = df.agg(funcs)  # this is the line that fails with the AttributeError above
```
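An aggregation step may not be needed at all here: each value in `funcs` is already a finished Series indexed like `df`, so the `pd.DataFrame` constructor can turn a dict of them into columns directly. (`DataFrame.agg` was only added in pandas 0.20, which would explain the AttributeError on an older install.) A minimal sketch, assuming the two functions above are the full logic; it reruns the toy data so it is self-contained, condenses the functions without changing their behaviour, and renames the `funcdict` keys to the desired output column names.

```python
import re

import pandas as pd

df = pd.DataFrame({'A': ["T100", "T100", "M100", "M100"],
                   'B': ["520", "620", "720", "820"],
                   'C': ["10/50", "20/50", "30/50", "50/50"]})


def get_pat_ID(row):
    # Digits after the leading T or M in column A
    return re.match(r"[TM](\d+)", row['A']).group(1)


def get_funcB(row):
    # "B_C" for T100 rows, the string "NA" otherwise
    return row['B'] + "_" + row['C'] if row['A'] == "T100" else "NA"


# Keys are the output column names, values are the row-wise functions
funcdict = {"PatID": get_pat_ID, "AnotherFunc": get_funcB}

# One row-wise apply per function; each result is a Series sharing df's index,
# so the DataFrame constructor lines them up as columns.
newdf = pd.DataFrame({name: df.apply(func, axis=1) for name, func in funcdict.items()})
print(newdf)
```

Adding another output column then only means adding another entry to `funcdict`.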
I know my approach is not efficient anyway, because `apply` iterates over all the rows again for every function I compute. Can anyone help, please?
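Two ways to avoid walking the rows once per function are sketched below, assuming the toy rules above are the whole logic. Option 1 uses a single `apply` whose (hypothetical) function `all_outputs` returns a `pd.Series`; pandas then builds a DataFrame whose columns are that Series' labels, so every output is computed in one pass. Option 2 swaps in a different technique: vectorized column operations (`Series.str.extract` and `Series.where`) that avoid a Python-level row loop entirely.

```python
import re

import pandas as pd

df = pd.DataFrame({'A': ["T100", "T100", "M100", "M100"],
                   'B': ["520", "620", "720", "820"],
                   'C': ["10/50", "20/50", "30/50", "50/50"]})


# Option 1 (hypothetical helper): one row-wise pass computing every output at once.
def all_outputs(row):
    pat_id = re.match(r"[TM](\d+)", row['A']).group(1)
    another = row['B'] + "_" + row['C'] if row['A'] == "T100" else "NA"
    # Returning a Series makes apply(axis=1) build a DataFrame whose
    # columns are this Series' index labels.
    return pd.Series({'PatID': pat_id, 'AnotherFunc': another})


newdf_apply = df.apply(all_outputs, axis=1)

# Option 2: vectorized string operations, no Python-level loop over rows.
newdf_vec = pd.DataFrame({
    # expand=False returns a Series for a single capture group
    'PatID': df['A'].str.extract(r"[TM](\d+)", expand=False),
    # Keep "B_C" where A == "T100", otherwise fill with the string "NA"
    'AnotherFunc': (df['B'] + "_" + df['C']).where(df['A'] == "T100", "NA"),
})

print(newdf_vec)
```

Option 1 still calls a Python function per row, but only once per row rather than once per function; Option 2 is typically faster on large frames because the work runs column-wise. If a new rule cannot be expressed with `.str` methods, it can be added as another line inside `all_outputs`.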