1

I commented a closed issue in Pandas github repo:

Treating an empty value in Excel as nan has another side effect: integer will be converted to float. Subsequent operation on that columns will have other effects again.

Also read_excel() does not honor empty value handling offered by function in converters:

I have an Excel file temp.xlsx with the following data:

enter image description here

There are white spaces surrounding values in Key3 column.

Key1,Key2,Key3,Key4
0,11,  Apple  ,1.12
1,12,,1.02
2,13,  Orange,
3,  ,Banana  ,0.01

This is the code:

import numpy as np
import pandas as pd

def handle_string(value):
    return value.replace(' ', '')

def handle_integer(value):
    if value == '':
        return 0
    else:
        int(value)

def handle_float(value):
    if value == '':
        return 0.0
    else:
        float(value)

df = pd.read_excel(
        'temp.xlsx',
)
print(df)
print(f"type(df.loc[3,'Key2']) = {type(df.loc[3,'Key2'])}")
print(f"type(df.loc[1,'Key3']) = {type(df.loc[1,'Key3'])}")
print(f"type(df.loc[2,'Key4']) = {type(df.loc[2,'Key4'])}")

print('')

df = pd.read_excel(
        'temp.xlsx',
        converters={\
            'Key1' : handle_integer,
            'Key2' : handle_integer,
            'Key3' : handle_string,
            'Key4' : handle_float,
        }
)
print(df)
print(f"type(df.loc[3,'Key2']) = {type(df.loc[3,'Key2'])}")
print(f"type(df.loc[1,'Key3']) = {type(df.loc[1,'Key3'])}")
print(f"type(df.loc[2,'Key4']) = {type(df.loc[2,'Key4'])}")

The output:

   Key1  Key2        Key3  Key4
0     0  11.0     Apple    1.12
1     1  12.0         NaN  1.02
2     2  13.0    Orange     NaN
3     3   NaN    Banana    0.01
type(df.loc[3,'Key2']) = <class 'numpy.float64'>
type(df.loc[1,'Key3']) = <class 'float'>
type(df.loc[2,'Key4']) = <class 'numpy.float64'>

   Key1  Key2    Key3  Key4
0  None   NaN   Apple   NaN
1  None   NaN     NaN   NaN
2  None   NaN  Orange   0.0
3  None   0.0  Banana   NaN
type(df.loc[3,'Key2']) = <class 'numpy.float64'>
type(df.loc[1,'Key3']) = <class 'float'>
type(df.loc[2,'Key4']) = <class 'numpy.float64'>

dtype parameters have lower priority than converters.

1 Answer 1

1

I could be wrong, but it looks to me like the problem is related to your return values for these functions. In two places, you return None without apparently intending to. See below:

def handle_string(value):
    return value.replace(' ', '')

def handle_integer(value):
    if value == '':
        return 0
    else:
        int(value) # Returns none!!!

def handle_float(value):
    if value == '':
        return 0.0
    else:
        float(value) # Returns none!!!
Sign up to request clarification or add additional context in comments.

1 Comment

@yoonghm great! If that addresses your original question please mark the post as answered

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.