11

I have a data frame with two columns :

state  total_sales
AL      16714
AR      6498
AZ      107296
CA      33717

Now I want to map the strings in state column to int from 1 to N(where N is the no of rows,here 4 ) based on increasing order of values in total_sales . Result should be stored in another column (say label). That is, wanted a result like this :

state  total_sales label
AL      16714         3
AR      6498          4
AZ      107296        1
CA      33717         2

Please suggest a vectorised implementation .

2
  • 1
    Please show some of your efforts Commented Feb 19, 2017 at 17:48
  • @ZdaR : Implemented it with a for loop , but read that loops are not advised for dataframes . Commented Feb 19, 2017 at 17:49

2 Answers 2

11

You can use rank with cast to int:

df['label'] = df['total_sales'].rank(method='dense', ascending=False).astype(int)
print (df)
  state  total_sales  label
0    AL        16714      3
1    AR         6498      4
2    AZ       107296      1
3    CA        33717      2
Sign up to request clarification or add additional context in comments.

Comments

3

One option for converting a column of values to integers is pandas.Categorical.

This actually groups identical values, which in a case like this, where all values are unique each "group" has only one value. The resulting object has a codes attribute, which is a Numpy array of integers indicating which group each input value is in.

Applied to this problem, if you have

In [12]: data = pd.DataFrame({
             'state': ['AL', 'AR', 'AZ', 'CA'],
             'total_sales': [16714, 6498, 107296, 33717]
         })

you can add the label column as described using

In [13]: data['label'] = len(data) - pd.Categorical(data.total_sales, ordered=True).codes
In [14]: print(data)

  state  total_sales  label
0    AL        16714      3
1    AR         6498      4
2    AZ       107296      1
3    CA        33717      2

For this example it is not as fast as jezrael's answer but it has a wide range of applications and it was faster for a larger series I was encoding to integers. It should be noted that if there are two identical values in the total_sales column it will assign them the same label.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.