I'm using pandas to transform some sporting data. One column holds the home team stats and the second column holds the away team stats.

The stats are read from an Excel file. When I print a dictionary built from the dataframe, all of the away team stats are floats, although many should be integers. When I print the type of individual values, the first column contains Python ints and floats, while every value in the second column is a numpy.float64.

How can I get both columns to hold a mix of integer and float values?

Here is the python script and output:

import pandas as pd
import numpy as np
pd.options.mode.chained_assignment = None  # Remove warning. default='warn'
    
teams_df = pd.read_excel("STATS.xlsm", skiprows=8, nrows=12, usecols=[0,2])  
new_teams_df = teams_df.rename(columns={"Unnamed: 0": "HOME", "Unnamed: 2": "AWAY"})
new_teams_df = new_teams_df.dropna()
       
print("\n********************\n Data Frame as dict \n********************")
print(new_teams_df.to_dict())    
print("\nHome Column Row 1 Type:   " + str(type(new_teams_df.at[1,'HOME'])))
print("Away Column Row 1 Type:   " + str(type(new_teams_df.at[1,'AWAY'])))   
print("\nHome Column Row 10 Type:   " + str(type(new_teams_df.at[10,'HOME'])))
print("Away Column Row 10 Type:   " + str(type(new_teams_df.at[10,'AWAY'])))

Outputs

********************
 Data Frame as dict 
********************
{'HOME': {0: 342, 1: 232, 2: 110, 3: 23, 4: 27, 7: 23, 8: 0.5652, 9: 26.3, 10: 14.9, 11: 44}, 'AWAY': {0: 339.0, 1: 214.0, 2: 125.0, 3: 45.0, 4: 25.0, 7: 18.0, 8: 0.5, 9: 37.7, 10: 18.8, 11: 43.0}}

Home Column Row 1 Type:   <class 'int'>
Away Column Row 1 Type:   <class 'numpy.float64'>

Home Column Row 10 Type:   <class 'float'>
Away Column Row 10 Type:   <class 'numpy.float64'>

This is strange because the data comes straight from a stats website into the Excel file, so both columns should behave exactly the same. Is there a way to convert the away column back to Python objects? Some rows would need to be floats and the rest integers.

Thanks!

  • What is the problem you are having? All of these types are convertible to the other types. numpy.float64 is a subclass of Python's float. Commented Apr 27, 2021 at 2:59
  • You should still be able to work with these types without converting them. If you still want to convert, try df['col_name'] = df['col_name'].astype('float') Commented Apr 27, 2021 at 3:06
  • @TimRoberts The problem is that the dataframe has numbers that should be ints but are floats. I need to be able to convert certain rows into integers, and I can't do this with the numpy.float64 values for some reason. For example, trying to convert column "AWAY", row 0 from numpy.float64 to numpy.int64 with scoring_df.at[0,'HOME'] = scoring_df.at[0,'HOME'].astype(np.int16) still returns a numpy.float64. Commented Apr 27, 2021 at 4:15

1 Answer

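The issue is that the default integer dtype cannot hold NaN values. Some of the away cells are probably blank when the sheet is read, so pandas stores the whole column as float64, and calling dropna() afterwards does not change the dtype back. Nullable integers are one resolution.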
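As a small illustration of that cause (the values here are sample numbers, not read from the original workbook), a single missing value is enough to make pandas infer float64, and dropping the missing row later leaves the dtype unchanged:

```python
import numpy as np
import pandas as pd

# Whole numbers with no gaps: pandas infers an integer dtype.
complete = pd.Series([339, 214, 125])
print(complete.dtype)

# One missing value forces the whole column to float64,
# because the default integer dtype has no way to store NaN.
with_gap = pd.Series([339, 214, np.nan, 125])
print(with_gap.dtype)

# Dropping the missing row removes the NaN but keeps the float64 dtype,
# so the remaining whole numbers still print as 339.0, 214.0, 125.0.
cleaned = with_gap.dropna()
print(cleaned.dtype)
print(cleaned.tolist())
```

This would also explain why a single cell cannot be turned into an integer in place: the column dtype is float64, so any value written back into it is stored as a float again.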

Since version 0.24, pandas has been able to hold integer dtypes with missing values.

Nullable Integer Data Type.

Pandas can represent integer data with possibly missing values using arrays.IntegerArray. This is an extension type implemented within pandas. It is not the default dtype for integers and will not be inferred; you must explicitly pass the dtype into array() or Series:

arr = pd.array([1, 2, np.nan], dtype=pd.Int64Dtype())
pd.Series(arr)

0       1
1       2
2    <NA>
dtype: Int64

To convert a column to nullable integers, use:

df['myCol'] = df['myCol'].astype('Int64')
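Note that the AWAY column in the question also contains fractional values such as 0.5 and 37.7, and as far as I know astype('Int64') refuses to cast floats that have a fractional part. If the goal is a column that mixes Python ints and floats the way HOME does, one possible approach is to rebuild the column with object dtype. The sketch below is only an illustration: to_python_numbers is a hypothetical helper, not a pandas function, and the sample values are copied from the printed output in the question rather than read from the workbook.

```python
import pandas as pd


def to_python_numbers(column):
    """Return an object-dtype copy: int for whole values, float otherwise.

    Hypothetical helper for illustration; not part of pandas.
    """
    values = [int(v) if float(v).is_integer() else float(v) for v in column]
    # dtype=object stops pandas from converting the ints back to floats.
    return pd.Series(values, index=column.index, name=column.name, dtype=object)


# Sample values taken from the AWAY column shown in the question.
df = pd.DataFrame({"AWAY": [339.0, 214.0, 0.5, 37.7, 43.0]},
                  index=[0, 1, 8, 9, 11])
df["AWAY"] = to_python_numbers(df["AWAY"])

print(df["AWAY"].dtype)
print(type(df.at[0, "AWAY"]))   # whole number stored as a Python int
print(type(df.at[8, "AWAY"]))   # fractional value stays a Python float
print(df["AWAY"].to_dict())
```

The trade-off is that object columns lose the speed of vectorised float64 operations, so this is probably best kept for the final export or display step.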