Apply function only on specific rows AND columns using Python Pandas

Question

I have the following dataframe:

import pandas as pd

df = {'a': [1, 2, 3],
      'b': [77, 88, 99],
      'c1': [1, 1, 1],
      'c2': [2, 2, 2],
      'c3': [3, 3, 3]}
df = pd.DataFrame(df)

and a function:

def test_function(row):
    return row['b']

How can I apply this function to the 'c' columns (i.e. c1, c2 and c3), BUT only in the rows where the value of 'a' matches the second character of the 'c' column's name?

For example, in the first row the value of 'a' is 1, so for that row I would like to apply the function to column 'c1' only.

In the second row the value of 'a' is 2, so for that row I would like to apply the function to column 'c2' only, and so on for the remaining rows.

The desired end result should be:

df_final = {'a': [1, 2, 3],
            'b': [77, 88, 99],
            'c1': [77, 1, 1],
            'c2': [2, 88, 2],
            'c3': [3, 3, 99]}
df_final = pd.DataFrame(df_final)

jezrael · Accepted Answer · 2021-09-24 06:13:49Z


Use Series.mask: compare the numbers in the c column names (selected with DataFrame.filter) against a, and where they match, replace the value with the value of b:

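import pandas as pd

c_cols = df.filter(like='c').columns

def test_function(row):
    # Work on a copy so the input frame is never modified through the row
    row = row.copy()
    # If the numbers in the c column names are single digits (0 to 9):
    #m = c_cols.str[1].astype(int) == row['a']
    # For multi-digit numbers (e.g. 1 to 100), extract all the digits:
    m = c_cols.str.extract(r'(\d+)', expand=False).astype(int) == row['a']
    # Replace the value in the matching c column with this row's b value
    row[c_cols] = row[c_cols].mask(m, row['b'])
    return row

df = df.apply(test_function, axis=1)
print(df)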
   a   b  c1  c2  c3
0  1  77  77   2   3
1  2  88   1  88   3
2  3  99   1   2  99

A faster alternative without a loop, using NumPy broadcasting:

arr = c_cols.str.extract(r'(\d+)', expand=False).astype(int).to_numpy()
# Shape (rows, c columns): True where a row's 'a' equals a column's number
m = df['a'].to_numpy()[:, None] == arr
df[c_cols] = df[c_cols].mask(m, df['b'], axis=0)
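The sample frame happens to be square (3 rows, 3 c columns), which hides the orientation of the broadcast mask. The sketch below uses hypothetical data with 4 rows and a row whose a value matches no column, to show that the mask must have shape (rows, c columns) for DataFrame.mask to accept it.

```python
import pandas as pd

# Hypothetical data: 4 rows, 3 c columns; a=5 matches no c column
df = pd.DataFrame({'a': [1, 2, 3, 5],
                   'b': [77, 88, 99, 11],
                   'c1': [1, 1, 1, 1],
                   'c2': [2, 2, 2, 2],
                   'c3': [3, 3, 3, 3]})

c_cols = df.filter(like='c').columns
nums = c_cols.str.extract(r'(\d+)', expand=False).astype(int).to_numpy()

# Column vector of a values (4, 1) against column numbers (3,)
# broadcasts to (4, 3): m[i, j] is True when row i's 'a' == number of column j
m = df['a'].to_numpy()[:, None] == nums
print(m.shape)  # (4, 3)

# axis=0 aligns the replacement Series df['b'] with the rows
df[c_cols] = df[c_cols].mask(m, df['b'], axis=0)
print(df)
```

Rows without a matching column (here the last one) are left untouched.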

edited Sep 24, 2021 at 6:13

answered Sep 24, 2021 at 5:55


4 Comments

Alvin Over a year ago

Thanks Jezrael! I think it's nearly there, but is it possible to make use of the "test_function" function? I tried using .apply, but it still gives me an error. I've updated the question too because of the assertion error. Thanks for the heads-up!

jezrael Over a year ago

@Alvin - Are the numbers in the c column names only integers from 0 to 9? I edited the answer; can you check?

Alvin Over a year ago

The integers in my actual data range from 1 to 100. The function works! Thanks a lot for your help :) I tried a for loop, "for i in df['a']", but it didn't work. I didn't know I had to edit the function itself.

jezrael Over a year ago

@Alvin - I changed the solution to match numbers from 1 to 100.
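The difference between the two comparisons in the answer shows up as soon as a column name has more than one digit. A small sketch with hypothetical column names: taking only the second character reads c10 and c100 as 1, while extracting all digits gives the full number.

```python
import pandas as pd

# Hypothetical column names with multi-digit suffixes
cols = pd.Index(['c1', 'c9', 'c10', 'c100'])

# Second character only: correct for c1..c9, wrong for c10 and c100
second_char = cols.str[1].astype(int).tolist()
print(second_char)  # [1, 9, 1, 1]

# All digits after the prefix: works for 1 to 100 and beyond
all_digits = cols.str.extract(r'(\d+)', expand=False).astype(int).tolist()
print(all_digits)  # [1, 9, 10, 100]
```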
