My DataFrame is 94 columns by 728k rows. Each value is a string representing a colour. I'm aiming to convert each colour to a corresponding numeric value.
Here's a reproducible example. In this example I want to convert the strings as follows:
blue = 1
green = 2
red = 3
grey = 4
orange = 5
```python
import pandas as pd

data = {'group1': ['red', 'grey', 'blue', 'orange'],
        'group2': ['red', 'green', 'blue', 'blue'],
        'group3': ['orange', 'blue', 'orange', 'green']}
data = pd.DataFrame(data)
data
```
```
   group1 group2  group3
0     red    red  orange
1    grey  green    blue
2    blue   blue  orange
3  orange   blue   green
```
Output would be:
```
   group1  group2  group3
0       3       3       5
1       4       2       1
2       1       1       5
3       5       1       2
```
How could I do this efficiently given the size of my actual data?
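One possible approach for a frame of this size is to replace per-value dict lookups with pandas categorical codes. This is a swapped-in technique, not something from the question. `pd.Categorical(col, categories=...)` assigns each value its position in a fixed category list, and a NumPy lookup array then converts those positions to the target numbers. This is a minimal sketch that assumes every cell holds one of the five listed colours. The names `colour_map`, `lookup`, and `encode_colours` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical names; the mapping itself is the one from the question.
colour_map = {'blue': 1, 'green': 2, 'red': 3, 'grey': 4, 'orange': 5}
categories = list(colour_map)                  # dicts preserve insertion order
lookup = np.array(list(colour_map.values()))   # lookup[i] is the number for categories[i]


def encode_colours(df: pd.DataFrame) -> pd.DataFrame:
    """Replace every colour string with its number, one column at a time."""
    out = {}
    for name, col in df.items():
        # Codes are each value's position in `categories`;
        # values not in `categories` (and NaN) get code -1.
        codes = pd.Categorical(col, categories=categories).codes
        if (codes == -1).any():
            raise ValueError(f"column {name!r} contains an unmapped colour or NaN")
        # Fancy indexing turns all positions into target numbers in one step.
        out[name] = lookup[codes]
    return pd.DataFrame(out, index=df.index)


data = pd.DataFrame({'group1': ['red', 'grey', 'blue', 'orange'],
                     'group2': ['red', 'green', 'blue', 'blue'],
                     'group3': ['orange', 'blue', 'orange', 'green']})
result = encode_colours(data)
print(result)
```

Because the numbers come from `lookup` rather than from the codes directly, the mapping can use any values, not only 1 to 5. Whether this actually beats the `Series.map` version or `data.replace(colour_map)` on the full 94 × 728k frame would have to be timed.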