Python apply function to each row of DataFrame

Question

I have a DataFrame with two columns: Type and Name. The values in each cell are lists of equal length, i.e. we have pairs (Type, Name). I want to:

Group Name by its Type
Create a column for each Type containing the matching Names

My current code is a for loop:

for idx, row in df.iterrows():
    for t in list(set(row["Type"])):
        df.at[idx, t] = [row["Name"][i] for i in range(len(row["Name"])) if row["Type"][i] == t]

but it runs very slowly. How can I speed up this code?

EDIT: Here is a code example that illustrates what I want to obtain, but I need a faster way:

import pandas as pd
df = pd.DataFrame({"Type": [["1", "1", "2", "3"], ["2","3"]], "Name": [["A", "B", "C", "D"], ["E", "F"]]})

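# Collect every distinct type across all rows (row is not defined before the loop)
unique = sorted({t for types in df["Type"] for t in types})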
for t in unique:
    df[t] = None
    df[t] = df[t].astype('object')

for idx, row in df.iterrows():
    for t in unique:
        df.at[idx, t] = [row["Name"][i] for i in range(len(row["Name"])) if row["Type"][i] == t]

Why not use the values in the cell directly instead of lists of values? Then you don't need iterrows anymore.
– niclas, Commented Jun 24, 2022 at 23:35
Could you show an example of the data? Pandas doesn't work very well with lists as elements, so this is working a bit uphill. Maybe you could express this code differently in pandas?
– ramslök, Commented Jun 24, 2022 at 23:41
I don't understand what you want to accomplish; however, pandas doesn't like iterrows or similar row-wise loops, so try to vectorize your code.
– Norhther, Commented Jun 25, 2022 at 0:12

rh-calvin · Accepted Answer · 2022-06-25 00:36:42Z

1

You could write a function my_function(param) and then do something like this:

df['type'] = df['name'].apply(lambda x: my_function(x))

There are likely better alternatives to using lambda functions, but lambdas are what I remember. If you post a simplified mock of your original data and what the desired output should look like, it may help you get the best answer to your question. I'm not certain I understand what you're trying to do. A literal group by should be done using the DataFrame groupby method.
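As a rough sketch of what such a function could look like for the question's sample data: group_names below is a hypothetical stand-in for my_function, and it is passed to apply directly, without a lambda wrapper. It assumes that the Type and Name lists in each row have equal length.

```python
import pandas as pd

df = pd.DataFrame({
    "Type": [["1", "1", "2", "3"], ["2", "3"]],
    "Name": [["A", "B", "C", "D"], ["E", "F"]],
})


def group_names(row):
    """Hypothetical helper: map each type in a row to the list of its names."""
    grouped = {}
    for t, name in zip(row["Type"], row["Name"]):
        grouped.setdefault(t, []).append(name)
    return grouped


# axis=1 calls group_names once per row; the result is a Series of dicts
per_row = df.apply(group_names, axis=1)

# Expand the dicts into one column per type; types absent from a row become NaN
wide = pd.DataFrame(per_row.tolist(), index=df.index)
result = pd.concat([df, wide], axis=1)
print(result)
```

This still runs Python code once per row, so on very large frames it is unlikely to be dramatically faster than an explicit loop.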

edited Jun 25, 2022 at 0:36

answered Jun 25, 2022 at 0:27


1 Comment

Jason Over a year ago

I added details to my question

Chris Seeling · Accepted Answer · 2022-06-25 01:10:58Z

0

If I understand correctly, your DataFrame looks something like this:

df = pd.DataFrame({'Name':['a,b,c','d,e,f,g'], 'Type':['3,3,2','1,2,2,1']}) 

      Name     Type
0    a,b,c    3,3,2
1  d,e,f,g  1,2,2,1

where each element is a comma-separated string. Start by running:

df['Name:Type'] = (df['Name']+":"+df['Type']).map(process)

using:

def process(x):
    # Split "a,b,c:3,3,2" into its name part and its type part
    names, types = x.split(':')
    pairs = zip(names.split(','), types.split(','))
    # Re-join the pairs as "a:3,b:3,c:2"
    return ','.join(':'.join(pair) for pair in pairs)

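This produces a Name:Type column whose values are comma-separated name:type pairs (a:3,b:3,c:2 for the first row), which reduces the problem to a single column. Finally, produce the required DataFrame with: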

l = ','.join(df['Name:Type'].to_list()).split(',')
pd.DataFrame([i.split(':') for i in l], columns=['Name','Type'])

This gives a DataFrame with one row per (Name, Type) pair, from (a, 3) to (g, 1).
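As a self-contained sketch of the steps above, with the intermediate values noted in comments. It assumes that no name or type itself contains a comma or a colon; the variable names pairs and flat are introduced here only for readability.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["a,b,c", "d,e,f,g"], "Type": ["3,3,2", "1,2,2,1"]})


def process(x):
    # "a,b,c:3,3,2" -> "a:3,b:3,c:2"
    names, types = x.split(":")
    return ",".join(":".join(pair) for pair in zip(names.split(","), types.split(",")))


df["Name:Type"] = (df["Name"] + ":" + df["Type"]).map(process)
# df["Name:Type"] now holds "a:3,b:3,c:2" and "d:1,e:2,f:2,g:1"

# Flatten all rows into one list of "name:type" strings, then split each pair
pairs = ",".join(df["Name:Type"].to_list()).split(",")
flat = pd.DataFrame([p.split(":") for p in pairs], columns=["Name", "Type"])
print(flat)
```

Note that flat no longer records which original row each pair came from, so it answers a flattened version of the question rather than the per-row layout.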

edited Jun 25, 2022 at 1:10

answered Jun 25, 2022 at 1:02


1 Comment

Jason Over a year ago

I added details to my question

SergFSM · Accepted Answer · 2022-06-26 07:58:02Z

0

Is this the result you want? (If not, please add an example of the desired output to your question.)

res = df.explode(['Name','Type']).groupby('Type')['Name'].agg(list)

print(res)
'''
Type
1    [A, B]
2    [C, E]
3    [D, F]
Name: Name, dtype: object
'''

UPD

df1 = df.apply(lambda x: pd.Series(x['Name'], index=x['Type']).groupby(level=0).agg(list), axis=1)
res = pd.concat([df,df1],axis=1)

print(res)
'''
           Type          Name       1    2    3
0  [1, 1, 2, 3]  [A, B, C, D]  [A, B]  [C]  [D]
1        [2, 3]        [E, F]     NaN  [E]  [F]
'''
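A possible variant of the same idea that avoids calling a Python function per row: explode once, group by the original row label together with Type, then unstack the types into columns. This is only a sketch under the assumption that each row's Type and Name lists have equal length and that pandas 1.3 or later is available (for multi-column explode). The names long and wide and the "row" label are introduced here for illustration, and the sketch has not been benchmarked against the loop in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Type": [["1", "1", "2", "3"], ["2", "3"]],
    "Name": [["A", "B", "C", "D"], ["E", "F"]],
})

# One line per (Type, Name) pair; keep the original row label in a "row" column
long = df.explode(["Name", "Type"]).rename_axis("row").reset_index()

# Collect names per (row, type), then pivot the types into columns
wide = long.groupby(["row", "Type"])["Name"].agg(list).unstack("Type")

# Align back onto the original frame by index; missing types become NaN
res = df.join(wide)
print(res)
```

Whether this is actually faster on around a million rows depends on the data, so it is worth timing on a sample first.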

edited Jun 26, 2022 at 7:58

answered Jun 25, 2022 at 21:17


2 Comments

Jason Over a year ago

Not exactly, I added the desired result to my question

Jason Over a year ago

It works, but much slower than my solution :( The DataFrame has about 1 million such rows.
