1

I have a data frame with one column denoting ranges of ages. The data type of the Age column is shown as string. I am trying to convert the string values to numeric so that the model can interpret the features.


I tried the following to convert to 'int'.

df.Age = pd.to_numeric(df.Age)

I get the following error:

ValueError: Unable to parse string "0-17" at position 0

I also tried using the errors='coerce' parameter, but it gave me a different error:

df.Age = pd.to_numeric(df.Age, errors='coerce').astype(int)

Error:

ValueError: Cannot convert non-finite values (NA or inf) to integer

But there are no NA values in any column of my df.

2
  • It is because NaN is a float, which cannot be cast to int. You can fill the NaN values with 0 and try again. Commented May 9, 2019 at 15:14
  • Actual data instead of images would be nice. Commented May 9, 2019 at 15:29

3 Answers

1

Age seems to be a categorical variable, so you should treat it as such. pandas has a neat category dtype which converts your labels to integers under the hood:

df['Age'] = df['Age'].astype('category')

Then you can access the underlying integers using the cat accessor:

codes = df['Age'].cat.codes # This returns integers

You probably also want to make Age an ordered categorical variable, for which the docs have a neat recipe:

from pandas.api.types import CategoricalDtype

age_category = CategoricalDtype([...your labels in order...], ordered=True)

df['Age'] = df['Age'].astype(age_category)

Then you can access the underlying codes in the same way and be sure that they reflect the order you entered for your labels.
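To make the recipe above concrete, here is a minimal sketch. The age labels and the sample frame are made up for illustration (the question only shows "0-17"), so substitute the labels that actually occur in your data.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical age-range labels, listed from youngest to oldest.
age_labels = ["0-17", "18-25", "26-35", "36-45", "46-50", "51-55", "55+"]
df = pd.DataFrame({"Age": ["26-35", "0-17", "55+", "18-25"]})

# Ordered categorical dtype: the order of age_labels defines the order of the codes.
age_category = CategoricalDtype(categories=age_labels, ordered=True)
df["Age"] = df["Age"].astype(age_category)

# Each code is the position of the row's label in age_labels.
df["Age_code"] = df["Age"].cat.codes
print(df)
```

Any value that is not in the label list becomes NaN after the cast and gets the code -1, so it is worth checking for -1 before feeding the codes to a model.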


0

At first glance, I would say it is because you are attempting to convert a string that contains more than just an integer. Your string is "0-17", which is not an integer. If it had been "17" or "0", the conversion would have worked.

    val = int("0")
    val = int("17")

I have no idea what your to_numeric method is, so I am not sure if I am answering your question.
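For reference, the to_numeric in the question appears to be the pandas function pd.to_numeric. The sketch below, using made-up sample values, shows how it treats a string like "0-17" when errors='coerce' is passed, which is likely where the second error in the question comes from. The nullable "Int64" dtype at the end is one possible workaround, not something the question used.

```python
import pandas as pd

ages = pd.Series(["0-17", "17", "0"])  # made-up sample values

# "0-17" cannot be parsed as a number, so errors="coerce" replaces it with NaN.
converted = pd.to_numeric(ages, errors="coerce")
print(converted.isna().tolist())

# NaN is a float, so astype(int) raises; the nullable "Int64" dtype keeps
# the missing value as <NA> instead.
nullable = converted.astype("Int64")
print(nullable)
```

So even when the original column has no missing values, the coercion itself can introduce them.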


0

Why don't you split the column on the hyphen?

# Split each range once on the hyphen into two columns
a = df["age"].str.split("-", n=1, expand=True)
df["age_from"] = a[0]
df["age_to"] = a[1]

Here is what I got at the end!

         date    age
0  2018-04-15  12-20
1  2018-04-15   2-30
2  2018-04-18  5-46+
         date    age age_from age_to
0  2018-04-15  12-20       12     20
1  2018-04-15   2-30        2     30
2  2018-04-18  5-46+        5    46+
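Note that the split columns are still strings, and "46+" is not numeric. As a rough sketch of one way to finish the conversion, the code below strips a trailing "+" before calling pd.to_numeric. Treating "+" as "and above" and simply dropping it is an assumption about the data, and the date column is omitted for brevity.

```python
import pandas as pd

df = pd.DataFrame({"age": ["12-20", "2-30", "5-46+"]})

# Split once on the hyphen into lower and upper bounds (both still strings).
parts = df["age"].str.split("-", n=1, expand=True)

df["age_from"] = pd.to_numeric(parts[0])
# Assumption: a trailing "+" means "and above", so it is dropped before converting.
df["age_to"] = pd.to_numeric(parts[1].str.rstrip("+"))

print(df.dtypes)
```

A label with no hyphen at all (for example "55+") would leave the second part empty and would need separate handling.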
