python pandas replacing strings in dataframe with numbers

Question

Is there any way to use the mapping function or something better to replace values in an entire dataframe?

I only know how to perform the mapping on a Series.

I would like to replace the strings in the 'tesst' and 'set' columns with numbers, for example set = 1, test = 2.

Here is an example of my dataset (the original dataset is very large):

ds_r
  respondent  brand engine  country  aware  aware_2  aware_3  age tesst   set
0          a  volvo      p      swe      1        0        1   23   set   set
1          b  volvo   None      swe      0        0        1   45   set   set
2          c    bmw      p       us      0        0        1   56  test  test
3          d    bmw      p       us      0        1        1   43  test  test
4          e    bmw      d  germany      1        0        1   34   set   set
5          f   audi      d  germany      1        0        1   59   set   set
6          g  volvo      d      swe      1        0        0   65  test   set
7          h   audi      d      swe      1        0        0   78  test   set
8          i  volvo      d       us      1        1        1   32   set   set

Final result should be

 ds_r
  respondent  brand engine  country  aware  aware_2  aware_3  age  tesst  set
0          a  volvo      p      swe      1        0        1   23      1    1
1          b  volvo   None      swe      0        0        1   45      1    1
2          c    bmw      p       us      0        0        1   56      2    2
3          d    bmw      p       us      0        1        1   43      2    2
4          e    bmw      d  germany      1        0        1   34      1    1
5          f   audi      d  germany      1        0        1   59      1    1
6          g  volvo      d      swe      1        0        0   65      2    1
7          h   audi      d      swe      1        0        0   78      2    1
8          i  volvo      d       us      1        1        1   32      1    1

Zulan · Accepted Answer · 2016-06-03 10:41:25Z

91

What about DataFrame.replace?

In [9]: mapping = {'set': 1, 'test': 2}

In [10]: df.replace({'set': mapping, 'tesst': mapping})
Out[10]: 
   Unnamed: 0 respondent  brand engine  country  aware  aware_2  aware_3  age  \
0           0          a  volvo      p      swe      1        0        1   23   
1           1          b  volvo   None      swe      0        0        1   45   
2           2          c    bmw      p       us      0        0        1   56   
3           3          d    bmw      p       us      0        1        1   43   
4           4          e    bmw      d  germany      1        0        1   34   
5           5          f   audi      d  germany      1        0        1   59   
6           6          g  volvo      d      swe      1        0        0   65   
7           7          h   audi      d      swe      1        0        0   78   
8           8          i  volvo      d       us      1        1        1   32   

  tesst set
0     1   1
1     1   1
2     2   2
3     2   2
4     1   1
5     1   1
6     2   1
7     2   1
8     1   1

As @Jeff pointed out in the comments, in pandas versions before 0.11.1 you need to append .convert_objects() manually so that tesst and set are properly converted to int64 columns, in case that matters in subsequent operations.
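A self-contained sketch of the same call, assuming a made-up three-row stand-in for the question's data. convert_objects() no longer exists in current pandas releases, so the cast to integers is done here with astype, which is a substitution and not part of the original answer.

```python
import pandas as pd

# Hypothetical miniature of the question's data (values invented for illustration).
df = pd.DataFrame({
    "brand": ["volvo", "bmw", "audi"],
    "tesst": ["set", "test", "test"],
    "set": ["set", "test", "set"],
})

mapping = {"set": 1, "test": 2}

# Outer keys are column names, inner dict maps old value -> new value,
# so columns that are not listed are left untouched.
result = df.replace({"tesst": mapping, "set": mapping})

# Make the integer dtype explicit instead of relying on automatic conversion.
result = result.astype({"tesst": int, "set": int})
print(result)
```

replace returns a new DataFrame here, so df itself still holds the original strings. On pandas 2.2 the replace call may emit the downcasting FutureWarning that is mentioned in the comments.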

edited Jun 3, 2016 at 10:41

Zulan

answered Jun 14, 2013 at 18:41

Dan Allan


5 Comments

Jeff Over a year ago

note that you might want to do a df.convert_objects() after the replacement to coerce to proper dtypes

Jeff Over a year ago

@Dan Allan this will be default in 0.11.1, FYI (to convert_objects)

Ishnark Over a year ago

This is super old but you can also do this now: df.replace(to_replace=['set', 'test'], value=[1, 2])

H S Rathore Over a year ago

I think we shouldn't have to hardcode the values. They should be picked up dynamically at run time and assigned numbers.

binaryEcon Over a year ago

For pandas v2.2.0 this raises FutureWarning: Downcasting behavior in 'replace' is deprecated and will be removed in a future version. To retain.... Suggested infer_objects(copy=False) does not work.
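One possible way around the warning in the last comment is to skip replace for this job and recode each column with Series.map, which builds a new column from the dict lookup. This is only a sketch: the small frame and the cols list are assumptions for illustration, and as far as I know map does not go through the downcasting path that the warning is about.

```python
import pandas as pd

# Hypothetical miniature of the question's data.
df = pd.DataFrame({
    "respondent": ["a", "b", "c"],
    "tesst": ["set", "test", "test"],
    "set": ["set", "test", "set"],
})

mapping = {"set": 1, "test": 2}
cols = ["tesst", "set"]  # only these columns are recoded

for col in cols:
    # Each value is looked up in the dict; values missing from the dict become NaN.
    df[col] = df[col].map(mapping)

print(df)
```

Unlike replace, map turns any value that is not a key of the dict into NaN, so it is worth checking for missing values afterwards.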

Brandon · Accepted Answer · 2016-10-12 03:08:55Z

32

I know this is old, but I'm adding this for those searching as I was. Create a DataFrame in pandas (df in this code):

ip_addresses = df.source_ip.unique()
ip_dict = dict(zip(ip_addresses, range(len(ip_addresses))))

That will give you a dictionary mapping each IP address to an integer, without having to write it out by hand.
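To show the dictionary being built and then applied, here is a minimal sketch. The sample addresses and the source_ip_code column name are made up for illustration.

```python
import pandas as pd

# Hypothetical data; only the column name source_ip comes from the answer.
df = pd.DataFrame({"source_ip": ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3"]})

# unique() keeps the order of first appearance, so codes follow that order.
ip_addresses = df["source_ip"].unique()
ip_dict = dict(zip(ip_addresses, range(len(ip_addresses))))

# Apply the generated mapping to get one integer per row.
df["source_ip_code"] = df["source_ip"].map(ip_dict)
print(df)
```

The numbers depend on the order in which values first appear, so the same string can get a different code in a different dataset unless the dictionary is saved and reused.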

answered Oct 12, 2016 at 3:08

Brandon


bdiamante · Accepted Answer · 2013-06-14 18:38:35Z

19

You can use the applymap DataFrame function to do this:

In [26]: df = pd.DataFrame({"A": [1,2,3,4,5], "B": ['a','b','c','d','e'],
                            "C": ['b','a','c','c','d'], "D": ['a','c',7,9,2]})
In [27]: df
Out[27]:
   A  B  C  D
0  1  a  b  a
1  2  b  a  c
2  3  c  c  7
3  4  d  c  9
4  5  e  d  2

In [28]: mymap = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5}

In [29]: df.applymap(lambda s: mymap.get(s) if s in mymap else s)
Out[29]:
   A  B  C  D
0  1  1  2  1
1  2  2  1  3
2  3  3  3  7
3  4  4  3  9
4  5  5  4  2
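A note for newer pandas: applymap was deprecated in pandas 2.1 in favor of DataFrame.map, which does the same element-wise job. The sketch below picks whichever exists; the lookup helper name and the smaller frame are my own choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": ["a", "b", "c"], "D": ["a", 7, "c"]})
mymap = {"a": 1, "b": 2, "c": 3}


def lookup(value):
    # Return the mapped number, or the value itself when it has no entry.
    return mymap.get(value, value)


# DataFrame.map exists from pandas 2.1 on; older versions only have applymap.
if hasattr(pd.DataFrame, "map"):
    df2 = df.map(lookup)
else:
    df2 = df.applymap(lookup)

print(df2)
```

Either call returns a new DataFrame and leaves df unchanged, so the result has to be assigned to a name.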

answered Jun 14, 2013 at 18:38

bdiamante

5 Comments

SRS Over a year ago

I am working on a problem like this and followed the exact steps in your answer, but I am not getting the expected output. Code: wc = pd.read_csv('PATH', usecols = ['Workclass'])

SRS Over a year ago

df = pd.DataFrame(wc)
wcdict = {"?":0,"Federal-gov":1,"Local-gov":2,"Never-worked":3,"Private":4,"Self-emp-inc":5, "Self-emp-n-inc":6,"State-gov":7,"Without-pay":8}
df.applymap(lambda s: wcdict.get(s) if s in wcdict else s)
print(df)

bdiamante Over a year ago

df.applymap(lambda s: mymap.get(s) if s in mymap else s) does not modify df in place, so your print df statement will not reflect the results of the applymap. You need to do an assignment like df2 = df.applymap(lambda s: mymap.get(s) if s in mymap else s). print df2 will now reflect the changes.

SRS Over a year ago

That worked!! Thanks :) I have one more question: I need to work with pyspark rather than plain Python. Does the implementation of this logic differ in pyspark? When I created the data frame, I gave the file path [as shown in the comments above], but I would like to give an RDD as the input to the data frame. I couldn't do that. Do you have any idea about this?

bdiamante Over a year ago

Glad it worked. I'm really not sure... perhaps this might be a start?

Kapilfreeman · Accepted Answer · 2020-03-31 05:50:56Z

12

The simplest way to replace any value in the dataframe:

df = df.replace(to_replace="set", value=1)    # use the integer 1, not the string "1"
df = df.replace(to_replace="test", value=2)

Hope this will help.

answered Mar 31, 2020 at 5:50

Kapilfreeman


Samer Ayoub · Accepted Answer · 2019-09-14 13:25:00Z

8

To convert strings like 'volvo' and 'bmw' into numbers, first load the data into a DataFrame, then pass it to pandas.get_dummies(), which encodes each string column as one indicator column per distinct value:

  import pandas as pd
  df = pd.read_csv("myFile.csv")  # DataFrame.from_csv was removed in pandas 1.0
  df_transform = pd.get_dummies( df )
  print( df_transform )

A better alternative: pass a dictionary to the map() method of a pandas Series (df.myCol), for example on the brand column:

df.brand = df.brand.map( {'volvo':0 , 'bmw':1, 'audi':2} )
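One thing to watch with map() is that any value missing from the dictionary becomes NaN. A small sketch of that behavior, where 'saab' is a made-up extra brand that has no entry in the dictionary:

```python
import pandas as pd

# 'saab' is a hypothetical value that the dictionary does not cover.
df = pd.DataFrame({"brand": ["volvo", "bmw", "audi", "saab"]})
codes = {"volvo": 0, "bmw": 1, "audi": 2}

mapped = df["brand"].map(codes)

# Values without a dictionary entry come back as NaN, which also
# turns the whole column into floats.
missing = df.loc[mapped.isna(), "brand"].tolist()
print(mapped)
print("unmapped values:", missing)
```

Checking for unmapped values before overwriting the original column avoids silently losing data.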

edited Sep 14, 2019 at 13:25

answered Jul 12, 2017 at 20:31

Samer Ayoub


ZachB · Accepted Answer · 2022-09-27 18:09:31Z

4

pandas.factorize() does exactly this.

>>> codes, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'])
>>> codes
array([0, 0, 1, 2, 0]...)
>>> uniques
array(['b', 'a', 'c'], dtype=object)

With a DataFrame:

df["tesst"], tesst_key = pd.factorize(df["tesst"])
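For the two columns in the question, a sketch along these lines might work. factorize numbers values from 0 in order of first appearance, so adding 1 gives set = 1, test = 2 only because 'set' happens to come first in this made-up sample. The keys dictionary is my own addition for keeping the code-to-string lookup.

```python
import pandas as pd

# Hypothetical miniature of the question's two string columns.
df = pd.DataFrame({
    "tesst": ["set", "set", "test", "test", "set"],
    "set": ["set", "set", "test", "test", "set"],
})

keys = {}
for col in ["tesst", "set"]:
    codes, uniques = pd.factorize(df[col])
    df[col] = codes + 1          # shift from 0-based to 1-based codes
    keys[col] = list(uniques)    # keys[col][code - 1] recovers the string

print(df)
print(keys)
```

Each column is factorized independently, so the same string can get different codes in different columns if the order of first appearance differs.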

edited Sep 27, 2022 at 18:09

answered Sep 27, 2022 at 15:39

ZachB


tsando · Accepted Answer · 2018-05-18 14:13:05Z

2

You can also do this with pandas rename_categories. You would first need to define the column as dtype="category" e.g.

In [66]: s = pd.Series(["a","b","c","a"], dtype="category")

In [67]: s
Out[67]: 
0    a
1    b
2    c
3    a
dtype: category
Categories (3, object): [a, b, c]

and then rename them:

In [70]: s.cat.rename_categories([1,2,3])
Out[70]: 
0    1
1    2
2    3
3    1
dtype: category
Categories (3, int64): [1, 2, 3]

You can also pass a dict-like object to map the renaming, e.g.:

In [72]: s.cat.rename_categories({'a': 1, 'b': 2, 'c': 3})
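Applied to a DataFrame column, the category approach could look like the sketch below. The one-column frame is made up, and the final cast back to integers is an assumption about what the later analysis needs.

```python
import pandas as pd

# Hypothetical miniature of the question's 'tesst' column.
df = pd.DataFrame({"tesst": ["set", "test", "test", "set"]})

# Categories are inferred from the data and sorted: ['set', 'test'].
as_cat = df["tesst"].astype("category")

# Dict form: keys are existing categories, values are their new names.
renamed = as_cat.cat.rename_categories({"set": 1, "test": 2})

# Cast the categorical back to plain integers for arithmetic.
df["tesst"] = renamed.astype(int)
print(df)
```

Keeping the column as a categorical instead of casting back is also an option when the numbers are only labels.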

answered May 18, 2018 at 14:13

tsando

2 Comments

HerrIvan Over a year ago

in general, what is this category type for?

tsando Over a year ago

@HerrIvan there's plenty of documentation here pandas.pydata.org/pandas-docs/stable/categorical.html

Akash Kandpal · Accepted Answer · 2018-06-02 09:07:18Z

2

When there are only a few distinct values:

mymap = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5}
df = df.applymap(lambda s: mymap.get(s) if s in mymap else s)  # applymap returns a new DataFrame

When the mapping is too large to write by hand:

uniques = data['data'].unique()  # distinct strings, in order of first appearance
temp_df2 = pd.DataFrame({'data': uniques, 'data_new': range(len(uniques))})  # temporary lookup table
data = data.merge(temp_df2, on='data', how='left')  # adds the numeric code as a 'data_new' column
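A runnable version of the merge idea, with invented sample values and a lookup table named lookup in place of temp_df2, might look like this:

```python
import pandas as pd

# Hypothetical data; the column name 'data' follows the answer.
data = pd.DataFrame({"data": ["set", "test", "set", "other"]})

# One row per distinct string, numbered from 0 in order of first appearance.
uniques = data["data"].unique()
lookup = pd.DataFrame({"data": uniques, "data_new": range(len(uniques))})

# A left merge keeps every original row and adds its code as 'data_new'.
merged = data.merge(lookup, on="data", how="left")
print(merged)
```

The original strings stay in the data column, so the mapping remains visible next to the new codes.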

answered Jun 2, 2018 at 9:07

Akash Kandpal


Manoj Kumar Dhakad · Accepted Answer · 2020-07-05 15:07:17Z

1

You can build the dictionary from the column's own values and apply it like this:

# value_counts() sorts by frequency, so the most common value gets code 0
item_list = df['Item_Type'].value_counts().index
item_type_mapping = {item: i for i, item in enumerate(item_list)}

# replace each string with its code
df['Item_Type'] = df['Item_Type'].map(item_type_mapping)

answered Jul 5, 2020 at 15:07

Manoj Kumar Dhakad


Chapo · Accepted Answer · 2019-05-30 01:47:26Z

0

df.replace(to_replace=['set', 'test'], value=[1, 2]), taken from @Ishnark's comment on the accepted answer.

answered May 30, 2019 at 1:47

Chapo
