
I am currently working on a DataFrame in pandas named df. One column contains many distinct labels (more than 100, to be exact).

I know how to replace values when there are only a few distinct values.

For instance, in the typical Titanic example:

titanic['Sex'] = titanic['Sex'].replace({'male': 0, 'female': 1})

Of course, doing this by hand for 100+ values would be extremely time-consuming. I have seen similar questions, but all the answers involve typing out the mapping manually. Is there a faster way to do this?

1 Answer


I think you're looking for factorize:

import pandas as pd
df = pd.DataFrame({'col': list('ABCDEBJZACA')})
# factorize() returns (codes, uniques); [0] keeps the integer codes
df['factor'] = df['col'].factorize()[0]

output:

   col  factor
0    A       0
1    B       1
2    C       2
3    D       3
4    E       4
5    B       1
6    J       5
7    Z       6
8    A       0
9    C       2
10   A       0
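`Series.factorize` returns a tuple `(codes, uniques)`, and the `[0]` above keeps only the codes. A minimal sketch, reusing the same example frame, of how `uniques` can be turned into the same kind of label-to-number dictionary the question passes to `replace`, plus two related options: `sort=True` and categorical codes. The names `mapping`, `sorted_codes`, `sorted_uniques` and `cat_code` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'col': list('ABCDEBJZACA')})

# factorize returns integer codes plus the unique labels, in order of first appearance
codes, uniques = df['col'].factorize()
df['factor'] = codes

# Build a label -> code dictionary, e.g. to keep as a lookup table or reuse with map()/replace()
mapping = {label: code for code, label in enumerate(uniques)}
print(mapping)  # {'A': 0, 'B': 1, 'C': 2, 'D': 3, 'E': 4, 'J': 5, 'Z': 6}

# sort=True assigns codes in sorted label order instead of order of first appearance
# (identical here, because the labels happen to first appear in sorted order)
sorted_codes, sorted_uniques = df['col'].factorize(sort=True)

# Alternative: categorical codes, which for string labels follow sorted category order
df['cat_code'] = df['col'].astype('category').cat.codes
```

The original labels can be recovered from the codes with `uniques[codes]`, so nothing is lost by the encoding.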

2 Comments

It worked, although I got a warning. Thank you! Here is the message: <ipython-input-48-5d0ec988b81e>:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
@corvusMidnight what was your error?
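A `SettingWithCopyWarning` is a warning rather than an error, and it usually means that `df` was itself produced by filtering or slicing another DataFrame. A minimal sketch, assuming that is what happened here; the `keep` column and the `subset` name are hypothetical. Taking an explicit `.copy()` of the filtered frame before adding a column makes it an independent DataFrame.

```python
import pandas as pd

# Hypothetical source frame; 'keep' stands in for whatever filter produced the working frame
df = pd.DataFrame({'col': list('ABCDEBJZACA'), 'keep': [True] * 10 + [False]})

# Filtering may return a copy of df; .copy() makes subset explicitly independent,
# so adding a column to it is unambiguous
subset = df[df['keep']].copy()
subset['factor'] = subset['col'].factorize()[0]

# If the column should live on the original frame instead, assign on df directly
df['factor'] = df['col'].factorize()[0]
```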
