
I have a very large dataset where I want to replace strings with numbers. I would like to operate on the whole dataset without writing a mapping function for each key (column). This is similar to the fillna method, but it should replace specific strings with their associated values. Is there any way to do this?

Here is an example of my dataset:

data
   resp          A          B          C
0     1       poor       poor       good
1     2       good       poor       good
2     3  very good  very good  very good
3     4        bad       poor        bad
4     5   very bad   very bad   very bad
5     6       poor       good   very bad
6     7       good       good       good
7     8  very good  very good  very good
8     9        bad        bad   very bad
9    10   very bad   very bad   very bad

The desired result:

data
   resp  A  B  C
0     1  3  3  4
1     2  4  3  4
2     3  5  5  5
3     4  2  3  2
4     5  1  1  1
5     6  3  4  1
6     7  4  4  4
7     8  5  5  5
8     9  2  2  1
9    10  1  1  1

very bad=1, bad=2, poor=3, good=4, very good=5

//Jonas

  • In more recent versions of pandas, there are more performant alternatives involving map and pd.Categorical. See this answer. Commented Jan 23, 2019 at 8:14
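A minimal sketch of the two alternatives the comment mentions. The linked answer is not reproduced here, so this is only an assumption about what those approaches look like. The first applies Series.map with a dict to each rating column. The second builds an ordered pd.Categorical and shifts its 0-based integer codes by 1. Column names A, B, C come from the question; `mapping`, `levels` and `cols` are illustrative names. Any performance gain over replace on large frames is the comment's claim and is not measured here.

```python
import pandas as pd

# Sample data from the question
data = pd.DataFrame({
    "resp": range(1, 11),
    "A": ["poor", "good", "very good", "bad", "very bad",
          "poor", "good", "very good", "bad", "very bad"],
    "B": ["poor", "poor", "very good", "poor", "very bad",
          "good", "good", "very good", "bad", "very bad"],
    "C": ["good", "good", "very good", "bad", "very bad",
          "very bad", "good", "very good", "very bad", "very bad"],
})

mapping = {"very bad": 1, "bad": 2, "poor": 3, "good": 4, "very good": 5}
cols = ["A", "B", "C"]  # columns holding the ratings

# Option 1: Series.map per column (strings missing from `mapping` become NaN)
mapped = data.copy()
mapped[cols] = data[cols].apply(lambda s: s.map(mapping))

# Option 2: ordered categorical codes (0-based, so add 1 for the 1..5 scale);
# strings not in `levels` get code -1, which becomes 0 after the shift
levels = ["very bad", "bad", "poor", "good", "very good"]
coded = data.copy()
for col in cols:
    coded[col] = pd.Categorical(data[col], categories=levels, ordered=True).codes + 1

print(mapped)
```

The categorical route also keeps the rating order explicit in `levels`, which can be handy if the numeric scale is later reused for sorting or comparisons.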

3 Answers


Use DataFrame.replace, passing a list of the strings and a list of their replacement values:

In [126]: df.replace(['very bad', 'bad', 'poor', 'good', 'very good'],
                     [1, 2, 3, 4, 5])
Out[126]:
   resp  A  B  C
0     1  3  3  4
1     2  4  3  4
2     3  5  5  5
3     4  2  3  2
4     5  1  1  1
5     6  3  4  1
6     7  4  4  4
7     8  5  5  5
8     9  2  2  1
9    10  1  1  1
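A self-contained version of the call above, built from the question's sample data. The i-th string is replaced by the i-th number, matching is on whole cell values (so 'bad' does not alter 'very bad'), and the integer resp column is left untouched. The trailing .infer_objects() is an addition not in the answer: recent pandas versions deprecate silently downcasting the result of replace, so the replaced columns may otherwise stay object dtype.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "resp": range(1, 11),
    "A": ["poor", "good", "very good", "bad", "very bad",
          "poor", "good", "very good", "bad", "very bad"],
    "B": ["poor", "poor", "very good", "poor", "very bad",
          "good", "good", "very good", "bad", "very bad"],
    "C": ["good", "good", "very good", "bad", "very bad",
          "very bad", "good", "very good", "very bad", "very bad"],
})

# Parallel lists: each string in the first list maps to the number at the same position
result = df.replace(["very bad", "bad", "poor", "good", "very good"],
                    [1, 2, 3, 4, 5]).infer_objects()  # convert object columns to integers

print(result)
```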

2 Comments

This isn't working in 0.20.1. See pandas.pydata.org/pandas-docs/version/0.20/generated/… for new syntax.
The above, with a minor variation, should work: df.replace(['very bad', 'bad', 'poor', 'good', 'very good'], [1, 2, 3, 4, 5], inplace=True)

Assuming data is your pandas DataFrame, you can also pass a single dict that maps each string to its number:

data.replace({'very bad': 1, 'bad': 2, 'poor': 3, 'good': 4, 'very good': 5}, inplace=True)
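A minimal sketch of the dict form, using the first three rows of the question's data. It first replaces across the whole frame in place, as the answer does, and then shows a variation (not from the answer) that restricts the replacement to the rating columns, which may matter if other columns could contain the same strings. The names `mapping`, `cols`, `whole` and `subset` are illustrative.

```python
import pandas as pd

# First three rows of the question's sample data
data = pd.DataFrame({
    "resp": [1, 2, 3],
    "A": ["poor", "good", "very good"],
    "B": ["poor", "poor", "very good"],
    "C": ["good", "good", "very good"],
})

mapping = {"very bad": 1, "bad": 2, "poor": 3, "good": 4, "very good": 5}

# As in the answer: replace everywhere, modifying the frame in place
whole = data.copy()
whole.replace(mapping, inplace=True)

# Variation: only touch the rating columns and leave every other column as-is
cols = ["A", "B", "C"]
subset = data.copy()
subset[cols] = subset[cols].replace(mapping)

print(subset)
```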



data = data.replace(['very bad', 'bad', 'poor', 'good', 'very good'], [1, 2, 3, 4, 5])

You must say where the result should be stored. By default, replace returns a new DataFrame: if you only call data.replace(...), the result is displayed (for example in a notebook), but the variable data itself is not changed.
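A small illustrative frame (values taken from the question) makes the difference visible: without assignment the original is unchanged, rebinding the name keeps the result, and inplace=True modifies the frame and returns None. The names `old`, `new`, `returned`, `other` and `ret` are illustrative.

```python
import pandas as pd

data = pd.DataFrame({"A": ["poor", "very bad"], "B": ["good", "bad"]})
old = ["very bad", "bad", "poor", "good", "very good"]
new = [1, 2, 3, 4, 5]

returned = data.replace(old, new)   # a new DataFrame; `data` is untouched
print(data["A"].tolist())           # ['poor', 'very bad']

data = data.replace(old, new)       # rebind the name to keep the result
print(data["A"].tolist())           # [3, 1]

other = pd.DataFrame({"A": ["good"]})
ret = other.replace(old, new, inplace=True)  # modifies `other` directly
print(ret)                          # None
```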
