
I have a dataframe below:

import pandas as pd

df = {'a': [1, 2, 3],
      'b': [77, 88, 99],
      'c1': [1, 1, 1],
      'c2': [2, 2, 2],
      'c3': [3, 3, 3]}
df = pd.DataFrame(df)

and a function:

def test_function(row):
    return row['b']

How can I apply this function on the 'c' columns (i.e. c1, c2 and c3), BUT only for specific rows whose 'a' value matches the 2nd character of the 'c' columns?

For example, for the first row, the value of 'a' is 1, so for the first row, I would like to apply this function on column 'c1'.

For the second row, the value of 'a' is 2, so for the second row, I would like to apply this function on column 'c2'. And so forth for the rest of the rows.

The desired end result should be:

df_final = {'a': [1, 2, 3],
            'b': [77, 88, 99],
            'c1': [77, 1, 1],
            'c2': [2, 88, 2],
            'c3': [3, 3, 99]}
df_final = pd.DataFrame(df_final)

1 Answer


Use Series.mask: compare the numeric suffix of the c columns (selected with DataFrame.filter) against the value of a, and where they match, replace the value with the value of b:

c_cols = df.filter(like='c').columns

def test_function(row):
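    # if the column suffixes are single digits (0 to 9), the second character is enough:
    # m = c_cols.str[1].astype(int) == row['a']
    # for multi-digit suffixes (e.g. 1 to 100), extract the whole number instead:
    m = c_cols.str.extract(r'(\d+)', expand=False).astype(int) == row['a']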
    row[c_cols] = row[c_cols].mask(m, row['b'])
    return row

df = df.apply(test_function, axis=1)
print(df)
   a   b  c1  c2  c3
0  1  77  77   2   3
1  2  88   1  88   3
2  3  99   1   2  99
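To illustrate why the regex variant matters once the suffixes go past one digit, here is a small sketch. The column labels c1, c10 and c100 are hypothetical and chosen only for illustration; they are not from the question's data.

```python
import pandas as pd

# Hypothetical column labels with one-, two- and three-digit suffixes
cols = pd.Index(['c1', 'c10', 'c100'])

# Taking only the second character keeps just the first digit of each suffix
second_char = cols.str[1].astype(int)

# Extracting the full run of digits keeps the labels distinct
full_number = cols.str.extract(r'(\d+)', expand=False).astype(int)

print(second_char.tolist())
print(full_number.tolist())
```

With labels like these, the second-character approach cannot tell the columns apart, which is the case the regex version is meant to handle.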

A faster, loop-free alternative uses NumPy broadcasting:

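# numeric suffix of each c column, shape (n_cols,)
arr = c_cols.str.extract(r'(\d+)', expand=False).astype(int).to_numpy()
# compare a as a column vector so the mask has shape (n_rows, n_cols), like df[c_cols]
m = df['a'].to_numpy()[:, None] == arr
df[c_cols] = df[c_cols].mask(m, df['b'], axis=0)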
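As a self-contained sketch of the broadcasting approach on the question's data, the snippet below builds the mask with one row per DataFrame row and one column per c column. The names col_numbers and result are illustrative, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3],
                   'b': [77, 88, 99],
                   'c1': [1, 1, 1],
                   'c2': [2, 2, 2],
                   'c3': [3, 3, 3]})

c_cols = df.filter(like='c').columns

# Numeric suffix of each c column, shape (n_cols,)
col_numbers = c_cols.str.extract(r'(\d+)', expand=False).astype(int).to_numpy()

# (n_rows, 1) compared with (n_cols,) broadcasts to (n_rows, n_cols)
m = df['a'].to_numpy()[:, None] == col_numbers

# Where the mask is True, take the value of b from the same row
result = df.copy()
result[c_cols] = df[c_cols].mask(m, df['b'], axis=0)

print(result)
```

Orienting the mask as rows by columns keeps it aligned with df[c_cols] even when the number of rows differs from the number of c columns.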

4 Comments

Thanks Jezrael! I think it's nearly there but is it possible to make use of the "test_function" function? I tried using the .apply function but it's still giving me some error. I've updated the question too because of the assertion error. Thanks for the heads-up!
@Alvin - Are the numbers in the c column names only integers from 0 to 9? The answer was edited, can you check?
The integers in my actual data range from 1 to 100. The function works! Thanks a lot for your help :) I tried to use a for loop "for i in df['a']" but it didn't work. Didn't know I had to edit the function itself.
@Alvin - The solution was changed to match 1 to 100.
