Issue

I have an Excel file in German format, where numbers use a comma as the decimal separator. [screenshot of the Excel file]

I want to read the first column as numbers into pandas using the following code:

import pandas as pd
import numpy as np
tmp = pd.read_excel("test.xlsx", dtype = {"col1": np.float64})

It gives me the error

ValueError: Unable to convert column col1 to type <class 'numpy.float64'>

The issue is in Excel. If I manually change col1 to number format, the problem goes away. See this new Excel file: [screenshot of the modified Excel file]

Approach

I can first read col1 into pandas as object, then replace each "," with ".", and finally convert the strings to float.
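For reference, the three steps above could look roughly like the sketch below. The helper name comma_to_float is made up for illustration, and the small DataFrame stands in for the result of reading test.xlsx with dtype={"col1": object}, so the snippet runs without the file. It assumes the values contain no thousands separators.

```python
import pandas as pd

def comma_to_float(series: pd.Series) -> pd.Series:
    # Hypothetical helper: treat every value as text, swap the decimal
    # comma for a point, then let pandas parse the result as numbers.
    # Assumes there are no thousands separators such as "1.234,5".
    return pd.to_numeric(series.astype(str).str.replace(",", ".", regex=False))

# Stand-in for: pd.read_excel("test.xlsx", dtype={"col1": object})
tmp = pd.DataFrame({"col1": ["1,5", "2,25", "3"]})
tmp["col1"] = comma_to_float(tmp["col1"])
```

After this, tmp["col1"] holds floats instead of strings.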

However

The approach is tedious. How can I solve this problem more efficiently?

1 Answer

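Unfortunately, in older pandas versions there is no way to tell read_excel which decimal separator is being used. (pandas 1.4 and later accept a decimal argument, e.g. decimal=",", which applies to columns stored as text in Excel.)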

What you could do though is create a function to do the conversion and pass it to read_excel as part of the converters argument.

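import pandas as pd

def fix_decimal(num):
    # Convert a value that uses a comma as the decimal separator to float.
    # Cells stored as real numbers arrive as int/float, so only strings are edited.
    if isinstance(num, str):
        num = num.replace(',', '.')
    return float(num) if num else 0  # empty cells become 0

tmp = pd.read_excel("test.xlsx", converters={0: fix_decimal})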
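
A converter receives each cell as whatever type Excel stored, so a more defensive variant may be worth having if the column mixes text and real numbers or uses "." as a thousands separator. The following is only a sketch: parse_german_number is a hypothetical name, and it assumes every text value is in German format (so the text "1.5" would be read as 15).

```python
import math

def parse_german_number(value):
    """Hypothetical helper: parse '1.234,56'-style text; pass real numbers through."""
    # Missing cells (None or NaN) stay missing instead of becoming 0.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return float("nan")
    # Cells that Excel already stores as numbers need no text handling.
    if isinstance(value, (int, float)):
        return float(value)
    text = str(value).strip()
    if not text:
        return float("nan")
    # Assumption: "." is only ever a thousands separator in text cells.
    return float(text.replace(".", "").replace(",", "."))
```

It could be passed in the same way, for example converters={"col1": parse_german_number}. Whether empty cells should become NaN or 0 depends on the data.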