My DataFrame is 94 columns by 728k rows. Each value is a string representing a colour. I'm aiming to convert each colour to a corresponding numeric value.

Here's a reproducible example. In this example I want to convert the strings as follows:

blue = 1  
green = 2  
red = 3  
grey = 4  
orange = 5

import pandas as pd

data = {'group1': ['red', 'grey', 'blue', 'orange'],
        'group2': ['red', 'green', 'blue', 'blue'],
        'group3': ['orange', 'blue', 'orange', 'green']}

data = pd.DataFrame(data)
data

    group1  group2  group3
0   red     red     orange  
1   grey    green   blue
2   blue    blue    orange
3   orange  blue    green

The desired output would be:

    group1  group2  group3
0        3       3       5  
1        4       2       1
2        1       1       5
3        5       1       2

How could I do this efficiently given the size of my actual data?

This may not be exactly what you are looking for, but take a look at sklearn.preprocessing.LabelEncoder as well. scikit-learn.org/stable/modules/generated/… Commented Mar 12, 2016 at 15:40

1 Answer

You could first use a dictionary to map the strings to integers:

d = {'blue': 1, 'green': 2, 'red': 3, 'grey': 4, 'orange': 5}

Then use replace and pass in that dictionary:

>>> data.replace(d)
   group1  group2  group3
0       3       3       5
1       4       2       1
2       1       1       5
3       5       1       2
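
For a frame as large as the one in the question, mapping each column through the same dictionary is an alternative worth timing against replace on a sample of the data. The following is only a sketch: the name mapped and the int8 downcast are illustrative choices, not part of the original approach.

```python
import pandas as pd

data = pd.DataFrame({
    'group1': ['red', 'grey', 'blue', 'orange'],
    'group2': ['red', 'green', 'blue', 'blue'],
    'group3': ['orange', 'blue', 'orange', 'green'],
})

d = {'blue': 1, 'green': 2, 'red': 3, 'grey': 4, 'orange': 5}

# Map every column through the dictionary.
# Unlike replace, map turns strings that are missing from d into NaN.
mapped = data.apply(lambda col: col.map(d))

# Assumption: only a handful of small codes are used, so int8 is enough.
# The cast is skipped if any value failed to map (NaN cannot be stored as int8).
if not mapped.isna().any().any():
    mapped = mapped.astype('int8')

print(mapped)
```

Note the difference in how unknown strings are handled: replace leaves them untouched, whereas map produces NaN, which makes unmapped colours easy to spot.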

A dictionary has the advantage of allowing you to pick which strings are mapped to which integers. If you don't mind the values being generated for you automatically, you could take advantage of pandas' categorical data type.

Ideally we'd write data.astype('category') and proceed from there, but as of pandas 0.17.1, two-dimensional categorical conversions are not implemented.

A work-around is to stack, cast, and unstack:

>>> c_data = data.stack().astype('category')
>>> c_data.cat.codes.unstack()
   group1  group2  group3
0       4       4       3
1       2       1       0
2       0       0       3
3       3       0       1
1 Comment

You can also pass the categories explicitly when casting to categorical, to get whatever numeric codes you want.
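
The idea of passing categories explicitly could look roughly like the sketch below. It assumes a pandas version that provides pd.CategoricalDtype (0.21 or later); the names order, colour_type and codes are made up for illustration. Category codes start at 0, so 1 is added to reproduce the numbering from the question.

```python
import pandas as pd

data = pd.DataFrame({
    'group1': ['red', 'grey', 'blue', 'orange'],
    'group2': ['red', 'green', 'blue', 'blue'],
    'group3': ['orange', 'blue', 'orange', 'green'],
})

# The position in this list decides the code: blue -> 0, green -> 1, and so on.
order = ['blue', 'green', 'red', 'grey', 'orange']
colour_type = pd.CategoricalDtype(categories=order)

# Cast each column to the shared categorical dtype and take its integer codes.
# Adding 1 shifts the codes so that blue = 1 ... orange = 5.
codes = data.apply(lambda col: col.astype(colour_type).cat.codes + 1)

print(codes)
```

Any string that is not listed in order gets the code -1 before the shift, so it shows up as 0 in the result.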