Imagine a function like the following:
def func(df, cols, col_ref):
    for c in cols:
        df[c] = df.apply(lambda row: row[c] * ref[(ref.SOURCE == row[col_ref])].VALUE.item(), axis=1)
    return df
The function takes the following parameters:
- a dataframe with multiple columns (df)
- one or more column names (cols)
- the name of a reference column (col_ref), whose value in the current row indicates which row of the other dataframe (ref) is used
I can call the function e.g. like this:
df_new = func(df, ['col1','col2','col3'], 'ref_value')
or like this:
df_new2 = func(df, 'col4', 'ref_value')
Is there an alternative to the for loop? My dataframe is huge and it takes up to an hour to perform this with a for loop.
It is important that the function can still handle a single column as well as multiple columns as the second parameter.
EDIT
A simple example:
df
+-----+------+------+------+------+-----------+
| No | col1 | col2 | col3 | col4 | ref_value |
+-----+------+------+------+------+-----------+
| 523 | 34 | 593 | 100 | 10 | A1 |
| 523 | 100 | 100 | 100 | 43 | A1 |
| 523 | 1867 | 15 | 632 | 64 | B2 |
| 732 | 100 | 943 | 375 | 325 | B1 |
| 732 | 1000 | 656 | 235 | 63 | B1 |
+-----+------+------+------+------+-----------+
ref
+--------+-------+
| SOURCE | VALUE |
+--------+-------+
| A1 | 10 |
| B1 | 1000 |
| B2 | 100 |
+--------+-------+
output:
df_new
+-----+---------+--------+--------+------+-----------+
| No | col1 | col2 | col3 | col4 | ref_value |
+-----+---------+--------+--------+------+-----------+
| 523 | 340 | 5930 | 1000 | 10 | A1 |
| 523 | 1000 | 1000 | 1000 | 43 | A1 |
| 523 | 186700 | 1500 | 63200 | 64 | B2 |
| 732 | 100000 | 943000 | 375000 | 325 | B1 |
| 732 | 1000000 | 656000 | 235000 | 63 | B1 |
+-----+---------+--------+--------+------+-----------+
A join should be possible here.
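To make the join idea concrete, here is a minimal sketch of a loop-free version. It is not a tested drop-in replacement. It assumes that every SOURCE in ref is unique (the original .item() call needs that too) and that ref is passed in as an argument rather than read from the enclosing scope. The name func_vectorized is made up for this example. Instead of a row-wise apply, it looks up one factor per row with Series.map and multiplies all requested columns at once.

```python
import pandas as pd

# Example data copied from the tables above.
df = pd.DataFrame({
    "No": [523, 523, 523, 732, 732],
    "col1": [34, 100, 1867, 100, 1000],
    "col2": [593, 100, 15, 943, 656],
    "col3": [100, 100, 632, 375, 235],
    "col4": [10, 43, 64, 325, 63],
    "ref_value": ["A1", "A1", "B2", "B1", "B1"],
})
ref = pd.DataFrame({"SOURCE": ["A1", "B1", "B2"], "VALUE": [10, 1000, 100]})


def func_vectorized(df, cols, col_ref, ref):
    """Hypothetical loop-free variant of func; assumes unique SOURCE values."""
    # Accept a single column name as well as a list of names.
    if isinstance(cols, str):
        cols = [cols]
    # One factor per row: map each col_ref value to its VALUE via SOURCE.
    factors = df[col_ref].map(ref.set_index("SOURCE")["VALUE"])
    # Work on a copy so the caller's dataframe is left unchanged.
    out = df.copy()
    # Multiply all requested columns by the per-row factor in one step.
    out[cols] = out[cols].mul(factors, axis=0)
    return out


df_new = func_vectorized(df, ["col1", "col2", "col3"], "ref_value", ref)
df_new2 = func_vectorized(df, "col4", "ref_value", ref)
print(df_new)
```

One difference from the original: a ref_value with no match in ref gives NaN in the affected columns here, whereas .item() would raise an error. If that matters, checking factors.isna().any() before multiplying would catch it.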