
How can I encode string-valued columns of a data table as integers? For example, I have two feature variables: color (possible string values R, G and B) and skills (possible string values C++, Java, SQL and Python). The given data table has two columns:

Color  -> R, G, B, B, G, R, B, G, G, R, G
Skills -> Java, C++, SQL, Java, Python, Python, SQL, C++, Java, SQL, Java

I want to know which sklearn function/method will transform the two columns above, with R=0, G=1 and B=2, and with C++=0, Java=1, SQL=2 and Python=3:

Color: 0, 1, 2, 2, 1, 0, 2, 1, 1, 0, 1
Skills:  1, 0, 2, 1, 3, 3, 2, 0, 1, 2, 1

Kindly let me know how to do this.

  • What type of object are you using to hold the data? Please show us the given data as code. Commented May 11, 2016 at 8:53
  • I can use np.array or dataframe to hold the data. However, I am free to use any type of object as long as I can store the feature variables (columns) data for various samples (rows). Commented May 11, 2016 at 9:48
  • It could more specifically be a list.... Commented May 11, 2016 at 10:01

1 Answer


Use scikit-learn's LabelEncoder class:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    'colors': ["R", "G", "B", "B", "G", "R", "B", "G", "G", "R", "G"],
    'skills': ["Java", "C++", "SQL", "Java", "Python", "Python", "SQL", "C++", "Java", "SQL", "Java"]
})

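def encode_df(dataframe):
    # LabelEncoder assigns codes in sorted order of the labels,
    # so here B=0, G=1, R=2 and C++=0, Java=1, Python=2, SQL=3.
    le = LabelEncoder()
    for column in dataframe.columns:
        # fit_transform refits the encoder on each column in turn
        dataframe[column] = le.fit_transform(dataframe[column])
    return dataframe

# Encode the dataframe (the columns of df are overwritten in place)
encode_df(df)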
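
Note that LabelEncoder orders the labels alphabetically, so it will not reproduce the exact codes asked for in the question (R=0, G=1, B=2 and C++=0, Java=1, SQL=2, Python=3). If that specific order matters, one possible approach is to spell the mapping out and apply it with pandas `Series.map`. This is only a sketch; the names `category_order` and `encoded` are made up for illustration and are not part of any library.

```python
import pandas as pd

df = pd.DataFrame({
    "colors": ["R", "G", "B", "B", "G", "R", "B", "G", "G", "R", "G"],
    "skills": ["Java", "C++", "SQL", "Java", "Python", "Python",
               "SQL", "C++", "Java", "SQL", "Java"],
})

# Hypothetical helper dict: the position of each label in its list
# is the integer code it should receive (order taken from the question).
category_order = {
    "colors": ["R", "G", "B"],
    "skills": ["C++", "Java", "SQL", "Python"],
}

encoded = df.copy()  # leave the original dataframe untouched
for column, categories in category_order.items():
    mapping = {label: code for code, label in enumerate(categories)}
    encoded[column] = df[column].map(mapping)

print(encoded["colors"].tolist())  # [0, 1, 2, 2, 1, 0, 2, 1, 1, 0, 1]
print(encoded["skills"].tolist())  # [1, 0, 2, 1, 3, 3, 2, 0, 1, 2, 1]
```

If you would rather stay inside scikit-learn, `OrdinalEncoder` accepts a `categories` argument that serves the same purpose.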

1 Comment

It worked out nicely. One observation: it won't work for NaN, but it will work if the element is empty.
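
For columns that may contain NaN, one alternative worth trying is `pandas.factorize`, which by default gives missing values the code -1 instead of raising an error. This is a swapped-in technique, not the LabelEncoder approach from the answer, and the sample data below is invented for illustration. Unlike LabelEncoder, `factorize` numbers the labels in order of first appearance rather than alphabetically.

```python
import numpy as np
import pandas as pd

# Illustrative column with one missing value
colors = pd.Series(["R", "G", np.nan, "B", "G"])

# codes: integer array, -1 marks missing values
# uniques: the labels in order of first appearance
codes, uniques = pd.factorize(colors)

print(codes.tolist())  # [0, 1, -1, 2, 1]
print(list(uniques))   # ['R', 'G', 'B']
```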
