How to convert pandas data frame string values to numeric values

Question

I have a data set that contains some string columns. I'm developing a neural network with this data set, but I can't train the network while the data contains string values. What is the best way to convert these string values into a format a neural network can read?

This is the data set that I have:

type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud
PAYMENT,9839.64,C1231006815,170136.0,160296.36,M1979787155,0.0,0.0,1,0
PAYMENT,1864.28,C1666544295,21249.0,19384.72,M2044282225,0.0,0.0,0,0
TRANSFER,181.0,C1305486145,181.0,0.0,C553264065,0.0,0.0,0,1

I want to convert the type, nameOrig and nameDest fields into a format a neural network can read.

I have used the method below, but I don't know whether it's right or wrong.

import pandas as pd
from sklearn.preprocessing import LabelEncoder

enc = LabelEncoder()

test_set = pd.read_csv('cs.csv')
# fit() alone returns the encoder itself; fit_transform() returns the integer codes
test_set['type'] = enc.fit_transform(test_set['type'])

I have gone through the questions below, but most of them did not work for me:

How to convert string based data frame to numeric

converting non-numeric to numeric value using Panda libraries

"Most of them did not work" - why not? What happened? What did you expect?
– DYZ, Commented Jan 5, 2019 at 19:41
Following the third linked question, I'm using the LabelEncoder. The others gave me some errors.
– Theesh, Commented Jan 5, 2019 at 19:56

Darius · Accepted Answer · 2019-01-06 13:59:40Z

2

In this case you can use the pandas category dtype to map strings to integer codes (see the pandas documentation on categorical data), so scikit-learn's LabelEncoder or OneHotEncoder is not necessary.

import pandas as pd

df = pd.read_csv('54055554.csv', header=0, dtype={
    'type': 'category',  # <--
    'amount': float,
    'nameOrig': str,
    'oldbalanceOrg': float,
    'newbalanceOrig': float,
    'nameDest': str,
    'oldbalanceDest': float,
    'newbalanceDest': float,
    'isFraud': bool,
    'isFlaggedFraud': bool
})

print(dict(enumerate(df['type'].cat.categories)))
# {0: 'PAYMENT', 1: 'TRANSFER'}

print(list(df['type'].cat.codes))
# [0, 0, 1]

The data from the CSV:

type, ...
PAYMENT, ...
PAYMENT, ...
TRANSFER, ...
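
As a follow-up sketch, the integer codes can be written back into the frame so that the column becomes numeric. The inline CSV reuses the three sample rows from the question instead of reading the file, and the column name `type_code` is an assumption made for illustration.

```python
import io

import pandas as pd

# The three sample rows from the question, inlined so the sketch is self-contained.
csv_text = """type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud
PAYMENT,9839.64,C1231006815,170136.0,160296.36,M1979787155,0.0,0.0,1,0
PAYMENT,1864.28,C1666544295,21249.0,19384.72,M2044282225,0.0,0.0,0,0
TRANSFER,181.0,C1305486145,181.0,0.0,C553264065,0.0,0.0,0,1
"""

df = pd.read_csv(io.StringIO(csv_text), dtype={"type": "category"})

# Store the integer code of each category in a new column
# ("type_code" is a hypothetical name, not part of the original data).
df["type_code"] = df["type"].cat.codes

# Keep the code -> label mapping so the codes can be decoded later.
code_to_label = dict(enumerate(df["type"].cat.categories))

print(df[["type", "type_code"]])
print(code_to_label)
```

Keeping `code_to_label` around matters because the codes depend on which categories appear in the data; the same mapping should be reused for any later data.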


Anidhya Bhatnagar · Accepted Answer · 2019-01-05 20:08:39Z

Transformation

First, transform the three string columns (type, nameOrig and nameDest) using the LabelEncoder class.

Encoding Categorical Data

Here, type is a categorical value. For this you can use the OneHotEncoder class available in sklearn.preprocessing.

Avoiding Dummy Variable Trap

Then avoid the dummy variable trap by removing one of the columns that are used to represent type.

Code

Here I have put the sample code for your reference.

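import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

dataset = pd.read_csv('cs.csv')
X = dataset.iloc[:].values

labelencoder = LabelEncoder()

# Label-encode the string columns: type (0), nameOrig (2), nameDest (5)
X[:, 0] = labelencoder.fit_transform(X[:, 0])
X[:, 2] = labelencoder.fit_transform(X[:, 2])
X[:, 5] = labelencoder.fit_transform(X[:, 5])

# One-hot encode column 0 (type). The categorical_features argument of
# OneHotEncoder was removed in scikit-learn 0.22, so ColumnTransformer
# selects the column instead; the one-hot columns are placed first.
ct = ColumnTransformer([('type', OneHotEncoder(), [0])],
                       remainder='passthrough', sparse_threshold=0)
X = ct.fit_transform(X)

# Avoiding the Dummy Variable Trap: drop the first one-hot column
X = X[:, 1:]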
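
The one-hot step and the dummy variable trap can also be sketched with pandas alone. This is an alternative to the scikit-learn code above, not the answer author's method: `pd.get_dummies` with `drop_first=True` builds the indicator columns and drops the first one in a single call. The inline rows are a trimmed copy of the question's sample data.

```python
import io

import pandas as pd

# Trimmed copy of the question's sample rows (only three columns kept).
csv_text = """type,amount,isFraud
PAYMENT,9839.64,1
PAYMENT,1864.28,0
TRANSFER,181.0,0
"""

df = pd.read_csv(io.StringIO(csv_text))

# One indicator column per category of "type"; drop_first=True removes the
# first category (PAYMENT) so the remaining columns are not collinear.
encoded = pd.get_dummies(df, columns=["type"], drop_first=True, dtype=int)

print(encoded.columns.tolist())
print(encoded["type_TRANSFER"].tolist())
```

With only two categories, a single `type_TRANSFER` column remains: 1 means TRANSFER and 0 means PAYMENT.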

Walid Da. · Accepted Answer · 2019-01-05 21:07:22Z

2

You need to encode the string values as numeric ones. What I usually do in this case is create a table for each non-numeric feature, containing all the possible values of that feature. The index of a value in the corresponding feature's table is then used when training a model.

Example:

type_values = ['PAYMENT', 'TRANSFER']
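
To complete the idea, here is a minimal sketch of the lookup, in which each string is replaced by its position in the table. The names `type_to_index` and `encode_type` are hypothetical helpers introduced here for illustration.

```python
# Table of every value the "type" feature can take (from the example above).
type_values = ['PAYMENT', 'TRANSFER']

# Precompute value -> index once, so each lookup is a dictionary access.
type_to_index = {value: index for index, value in enumerate(type_values)}


def encode_type(value):
    """Return the table index of a known value; raise KeyError for unseen ones."""
    return type_to_index[value]


column = ['PAYMENT', 'PAYMENT', 'TRANSFER']
encoded = [encode_type(v) for v in column]
print(encoded)  # [0, 0, 1]
```

Raising on unseen values is a deliberate choice in this sketch: a value missing from the table usually means the table needs to be extended before training.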

edited Jan 5, 2019 at 21:07

answered Jan 5, 2019 at 19:41


1 Comment

DYZ Over a year ago

Surely using the standard LabelEncoder is preferred to any ad hoc solution.
