I have the following df:

print(df)
>>>
Marital Status       Income     Education 
  Married             66613       PhD  
  Married             12441       Bachelors 
  Single              52842       Masters Degree
  Relationship        78238       PhD
  Divorced            21242       High School
  Single              47183       Masters Degree

I'd like to convert every string to a corresponding number (int). E.g.

"Married" should be 1

"Single" 2

"Relationship" 3

and so on.

I haven't tried any code yet, since I haven't found a reasonable solution after googling for about an hour, but I'm sure the solution is incredibly simple.

Edit: grammar

6 Answers

6

This may help you to get what you need.

df['Marital Status'] = df['Marital Status'].astype('category').cat.codes

Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.astype.html
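
One thing to be aware of with this approach: when the categories are inferred from the data they are sorted, so the codes start at 0 and follow alphabetical order rather than the 1, 2, 3 order in the question. A minimal sketch, assuming a small frame rebuilt from the question's data (the variable names are just for illustration), that also keeps a lookup table to get back from code to label:

```python
import pandas as pd

# Sample data rebuilt from the question (only the column we need).
df = pd.DataFrame({
    "Marital Status": ["Married", "Married", "Single",
                       "Relationship", "Divorced", "Single"],
})

cat = df["Marital Status"].astype("category")

# Inferred categories are sorted, so code 0 is the alphabetically first label.
code_to_label = dict(enumerate(cat.cat.categories))

# Zero-based integer codes, one per row.
codes = cat.cat.codes.tolist()

print(code_to_label)
print(codes)
```

Add 1 to the codes if the numbering should start at 1.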

2 Comments

Interesting use of category!
This worked well also for every other string column in the dataframe I wanted to convert to int. Thanks a lot for the elegant solution.
3

It's exactly what pd.factorize does:

df['Marital Code'] = pd.factorize(df['Marital Status'])[0] + 1
print(df)

# Output

  Marital Status  Income       Education  Marital Code
0        Married   66613             PhD             1
1        Married   12441       Bachelors             1
2         Single   52842  Masters Degree             2
3   Relationship   78238             PhD             3
4       Divorced   21242     High School             4
5         Single   47183  Masters Degree             2
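
pd.factorize returns a pair: the integer codes and the unique values in order of first appearance. The second element can be used to translate the numbers back into labels. A small sketch, assuming the same data as in the question (label_of and decoded are illustrative names):

```python
import pandas as pd

status = pd.Series(["Married", "Married", "Single",
                    "Relationship", "Divorced", "Single"])

# codes: position of each value in `uniques`; uniques: labels in first-seen order.
codes, uniques = pd.factorize(status)
codes = codes + 1  # shift so numbering starts at 1, as in the question

# Reverse lookup from the 1-based number to the original label.
label_of = {i + 1: label for i, label in enumerate(uniques)}
decoded = [label_of[c] for c in codes.tolist()]

print(codes.tolist())
print(label_of)
```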

2

Another solution, using .map:

df["Marital Status"] = df["Marital Status"].map(
    {"Married": 1, "Single": 2, "Relationship": 3, "Divorced": 4}
)

print(df)

Prints:

   Marital Status  Income       Education
0               1   66613             PhD
1               1   12441       Bachelors
2               2   52842  Masters Degree
3               3   78238             PhD
4               4   21242     High School
5               2   47183  Masters Degree
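
Note that .map turns any value missing from the dictionary into NaN, which also changes the column to float. A hedged sketch of how one might spot such values and still keep an integer column; "Widowed" is a made-up value that is not in the question's data, and the nullable "Int64" dtype is just one possible choice:

```python
import pandas as pd

# "Widowed" is a hypothetical status that the mapping does not cover.
status = pd.Series(["Married", "Single", "Widowed"])
mapping = {"Married": 1, "Single": 2, "Relationship": 3, "Divorced": 4}

mapped = status.map(mapping)  # unmapped values become NaN (float64 column)

# List the labels that had no entry in the mapping.
unmapped = status[mapped.isna()].unique().tolist()

# Nullable integer dtype keeps whole numbers and shows <NA> for the gaps.
as_int = mapped.astype("Int64")

print(unmapped)
print(as_int)
```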

1 Comment

map is the best way to control the number
1

Another map-based solution, which is a little more readable and easier to extend if you want to add more values later:

df_map = pd.DataFrame({
    'Text' : ['Married', 'Single', 'Relationship'],
    'Int_Conversion' : [1, 2, 3]
})

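# Look up each status in df_map; statuses missing from df_map (here 'Divorced') become NaN
df['Marital Status'] = df['Marital Status'].map(df_map.set_index('Text')['Int_Conversion'])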

1

One approach using categories that will work independent of the data:

categories = pd.CategoricalDtype(categories=["Married", "Single", "Relationship", "Divorced"], ordered=True)
df["result"] = df["Marital Status"].astype(categories).cat.codes + 1
print(df)

Output

  Marital Status  Income       Education  result
0        Married   66613             PhD       1
1        Married   12441       Bachelors       1
2         Single   52842  Masters Degree       2
3   Relationship   78238             PhD       3
4       Divorced   21242     High School       4
5         Single   47183  Masters Degree       2

The documentation suggests this approach for controlling the behavior. Quoting it:

In the examples above where we passed dtype='category', we used the default behavior:

Categories are inferred from the data.

Categories are unordered.

To control those behaviors, instead of passing 'category', use an instance of CategoricalDtype.
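
To illustrate what "independent of the data" means in practice, here is a small sketch comparing the two behaviors on a subset that contains only two of the statuses (the variable names are illustrative). With an explicit CategoricalDtype the numbers stay the same as in the full table, while inferred categories renumber from scratch:

```python
import pandas as pd

dtype = pd.CategoricalDtype(
    categories=["Married", "Single", "Relationship", "Divorced"],
    ordered=True,
)

# A subset of the data in which "Married" and "Relationship" never occur.
subset = pd.Series(["Divorced", "Single"])

# Explicit categories: codes depend only on the declared order.
fixed = (subset.astype(dtype).cat.codes + 1).tolist()

# Inferred categories: codes depend on which values happen to be present.
inferred = (subset.astype("category").cat.codes + 1).tolist()

print(fixed)
print(inferred)
```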

1

By "corresponding number", do you have a specific numbering scheme in mind, or will any number do as long as the same string always gets the same number?

If the latter, then this code should work.

def replace_words(text):
    next_number = 1
    word_map = {}  # word -> number assigned on first appearance

    def get_number(word):
        nonlocal next_number
        if word in word_map:
            return word_map[word]
        # First time we see this word: give it the next free number
        word_map[word] = next_number
        next_number += 1
        return word_map[word]

    words = text.split(" ")
    replaced_words = [get_number(x) for x in words]
    return " ".join(str(x) for x in replaced_words)

print(replace_words("some words some thoughts"))  # 1 2 1 3
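
The function above works on a plain string of words. The same first-seen numbering can be sketched for a list of column values, which is closer to the DataFrame in the question. This is only an illustration; first_seen_codes is a made-up helper name, and the result could be assigned back with something like df["Marital Status"] = first_seen_codes(df["Marital Status"]):

```python
def first_seen_codes(values):
    """Number each distinct value 1, 2, 3, ... in order of first appearance."""
    seen = {}
    # setdefault returns the stored number, or stores and returns the next one.
    return [seen.setdefault(v, len(seen) + 1) for v in values]


statuses = ["Married", "Married", "Single",
            "Relationship", "Divorced", "Single"]
result = first_seen_codes(statuses)
print(result)
```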
