I have data like this, loaded from an Excel file:

import numpy as np
import pandas as pd

dminerals = pd.read_excel(datafile)
print(dminerals.head(5))

Then I replace the 'Tr' and NaN values with a for loop, using this script:

# .items() replaces .iteritems(), which was removed in pandas 2.0
for key, value in dminerals.items():
    dminerals[key] = dminerals[key].replace(to_replace='Tr', value=1)
    dminerals[key] = dminerals[key].replace(to_replace=np.nan, value=0)

When I print the dataframe again, the replacement seems to have worked. But when I print the data type of a column, it is still object.

print(dminerals.head(5))
print(dminerals['C'].dtypes)

I tried using .astype to convert one of the columns, ['C'], to integer, but the result is a ValueError:

dminerals['C'].astype(int)
ValueError: invalid literal for int() with base 10: 'tr'

I thought I had already changed every 'Tr' in the dataframe into an integer value. Is there anything I missed in the process above? Please help, thank you in advance!

1 Answer

You are replacing Tr with 1, but there is also a lowercase tr that is not being replaced (this is what your ValueError is saying). Remember that Python is case sensitive, so 'Tr' and 'tr' are different strings. Also, looping over the columns is unnecessary and slow, because replace works on the whole dataframe at once. You might want to try the following line of code:

dminerals = dminerals.replace({'Tr': 1, 'tr': 1}).fillna(0)

I'm using fillna() here, which is the better way to fill null values with a specified value (0 in this case) than using replace.
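If other spellings might be hiding in the data (for example 'TR' or ' tr ' with stray spaces), it can help to list the leftover strings first and then match case-insensitively. Here is a minimal sketch of that idea. The sample dataframe, its column names and values, and the helper clean_cell are made up for illustration and are not from your Excel file. It also assumes every remaining value is a whole number, since astype(int) would truncate any decimals.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Excel data; columns and values are invented.
dminerals = pd.DataFrame({
    "C": [3, "Tr", np.nan, "tr", 7],
    "Q": ["TR", 2, 5, np.nan, " tr "],
})

# Step 1: diagnose. Collect the distinct string values left in each column,
# so that unexpected spellings such as 'tr' become visible.
leftovers = {
    col: sorted({v for v in dminerals[col] if isinstance(v, str)})
    for col in dminerals.columns
}
print(leftovers)


def clean_cell(value):
    """Return 1 for any spelling of 'tr' (ignoring case and spaces), else the value unchanged."""
    if isinstance(value, str) and value.strip().lower() == "tr":
        return 1
    return value


# Step 2: clean. Map every cell, fill the missing values with 0,
# then convert the whole dataframe to an integer dtype.
cleaned = dminerals.apply(lambda col: col.map(clean_cell)).fillna(0).astype(int)
print(cleaned.dtypes)
print(cleaned)
```

If astype(int) still raises a ValueError after this, the leftovers printout should show which other string is in the way.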
