Take the following example DataFrame:
import pandas as pd
import numpy as np
some_data = pd.DataFrame({
    'col_a': [1, 2, 1, 2, 3, 4, 3, 4],
    'col_b': ['a', 'b', 'c', 'c', 'a', 'b', 'z', 'z']
})
We want to create a new column based on one (or more) of the existing columns' values.
In case you have only two options, I would suggest using numpy.where like this:
some_data['np_where_example'] = np.where(some_data.col_a < 3, 'less_than_3', 'at_least_3')
print(some_data)
>>>
   col_a col_b np_where_example
0      1     a      less_than_3
1      2     b      less_than_3
2      1     c      less_than_3
3      2     c      less_than_3
4      3     a       at_least_3
5      4     b       at_least_3
6      3     z       at_least_3
7      4     z       at_least_3
# multiple conditions: wrap each comparison in parentheses and combine with &
some_data['np_where_multiple_conditions'] = np.where(
    (some_data.col_a >= 3) & (some_data.col_b == 'z'),
    'is_true',
    'is_false')
print(some_data[['col_a', 'col_b', 'np_where_multiple_conditions']])
>>>
   col_a col_b np_where_multiple_conditions
0      1     a                     is_false
1      2     b                     is_false
2      1     c                     is_false
3      2     c                     is_false
4      3     a                     is_false
5      4     b                     is_false
6      3     z                      is_true
7      4     z                      is_true
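To make the combined condition easier to read, it can be built as a named boolean mask first. This is only a small sketch of the same np.where call; the variable names mask and labels are mine, not part of the original example.

```python
import numpy as np
import pandas as pd

some_data = pd.DataFrame({
    'col_a': [1, 2, 1, 2, 3, 4, 3, 4],
    'col_b': ['a', 'b', 'c', 'c', 'a', 'b', 'z', 'z'],
})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are needed because & binds tighter than >= and ==.
mask = (some_data['col_a'] >= 3) & (some_data['col_b'] == 'z')

# np.where picks the first label where mask is True, the second elsewhere.
labels = np.where(mask, 'is_true', 'is_false')

print(mask.tolist())
print(labels.tolist())
```

Using the Python keyword and instead of & raises a ValueError here, because a whole Series has no single truth value.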
If you have many options, Series.map with a dictionary is a better fit:
some_data['map_example'] = some_data.col_b.map({
    'b': 'BBB',
    'z': 'ZZZ'
})
print(some_data[['col_a', 'col_b', 'map_example']])
>>>
   col_a col_b map_example
0      1     a         NaN
1      2     b         BBB
2      1     c         NaN
3      2     c         NaN
4      3     a         NaN
5      4     b         BBB
6      3     z         ZZZ
7      4     z         ZZZ
As you can see, with map any value that has no entry in the dictionary evaluates to NaN.
You can also loop through multiple columns in parallel with zip, for example: for i, j in zip(df['A'], df['B']): if i == 1: j == 2, etc. (iterrows should be avoided.)
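The zip-based loop mentioned above can be sketched as follows. The data, the column name flag, and the labels 'match' / 'no_match' are hypothetical, and I am assuming the two checks are meant to be combined into one condition; the original snippet does not say what should happen inside the loop.

```python
import pandas as pd

# Hypothetical example data with the column names used in the snippet above.
df = pd.DataFrame({'A': [1, 2, 1, 3], 'B': [2, 2, 5, 2]})

result = []
# zip walks both columns in lockstep, yielding one (a, b) pair per row.
for a, b in zip(df['A'], df['B']):
    if a == 1 and b == 2:
        result.append('match')
    else:
        result.append('no_match')

# Assign the collected labels as a new column in one step.
df['flag'] = result
print(df)
```

For a simple two-way rule like this, np.where((df['A'] == 1) & (df['B'] == 2), 'match', 'no_match') gives the same column without an explicit loop; the loop is mainly useful when the per-row logic is hard to vectorize.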