
I am trying to import a dataframe (df_model) from an Excel file. The first column of this dataframe in the Excel file contains the integers 1, 2, 3, 4, 5, and I want to read them as integers instead of decimal or float values. But whenever I read the file with pandas, the values in the first column are converted to decimals: 1.0, 2.0, 3.0, 4.0, 5.0. The values in the rest of the columns, however, stay the way I want. Here is the dataframe that pandas read:

    Std S_Ultra S_Classic  ... SMV34_Ultra SMV34_Classic SMV34_Ultra for Flow
0    1.0      1A        1A  ...         1.0           1.0                  2.0
1    2.0      2A        2A  ...         2.0           2.0               2 SP=5
2    3.0      3A        3A  ...      2 SP=5        2 SP=5                  3.0
3    4.0      4A        4A  ...         3.0           3.0               3 SP=5
4    5.0      5A        5A  ...      3 SP=5        3 SP=5                  NaN
..   ...     ...       ...  ...         ...           ...                  ...
100  NaN     NaN       NaN  ...         NaN           NaN                  NaN

Is it possible to stop pandas from converting the first column to decimal values by default?
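A minimal reproduction, assuming the cause is the empty rows at the bottom of the sheet (row 100 above is all NaN): a NumPy int64 column has no way to store a missing value, so pandas upcasts an integer column that contains NaN to float64. The in-memory CSV below is made-up data standing in for the real file.

```python
import io

import pandas as pd

# Made-up data: an integer column whose last row is empty.
csv_text = "Std,S_Ultra\n1,1A\n2,2A\n3,3A\n,\n"

df = pd.read_csv(io.StringIO(csv_text))

# The empty cell is read as NaN, which forces the whole column to float64.
print(df["Std"].dtype)     # float64
print(df["Std"].tolist())  # [1.0, 2.0, 3.0, nan]
```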


2 Answers


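Yes. The column most likely becomes float because the empty rows at the end of the sheet (such as row 100) are read as NaN, and a regular int column cannot hold NaN. You can specify a nullable integer type for the column while reading, for example with pandas read_csv (read_excel accepts the same dtype argument):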

import pandas as pd
df = pd.read_csv('filename.csv', dtype={'Std': 'Int32'})

Pandas will then represent the missing values as <NA> instead of turning the column into floats.
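A small sketch of the effect, using an in-memory string as a stand-in for 'filename.csv' (the data is made up). Note the capital I in 'Int32': that is pandas' nullable integer dtype, which is what allows the missing value to survive as <NA>.

```python
import io

import pandas as pd

# Made-up CSV content standing in for 'filename.csv'; the last row is empty.
csv_text = "Std,S_Ultra\n1,1A\n2,2A\n3,3A\n,\n"

# 'Int32' (nullable) keeps the column integer; the empty cell becomes <NA>.
df = pd.read_csv(io.StringIO(csv_text), dtype={"Std": "Int32"})

print(df["Std"].dtype)  # Int32
print(df)
```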

EDIT: As discussed in the comments, the column names are not known beforehand. What is known is that the first (or nth) column will contain int, float, or string data.

While reading the data, we can specify the data type by column number, and that column will be read with the type we give. To do this, we skip the header row, read it separately, and assign it as the column names afterwards.

Here, 0 refers to the first column.

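import pandas as pd
# header=None labels the columns by position (0, 1, ...), so the dtype key must be the integer 0, not '0';
# the nullable 'Int32' is used instead of 'int' so that empty cells do not raise a ValueError.
df = pd.read_csv(r'filename.csv', skiprows=1, dtype={0: 'Int32'}, header=None)
headers = pd.read_csv(r'filename.csv', nrows=0).columns
df.columns = headers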

The above code will give you the expected output
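A variant of the same idea, sketched under the assumption that the file has a header row: read only the header (nrows=0), take the name at position 0, and key dtype by that name. This avoids skipping and reassigning the header, and still works if someone renames the first column. An in-memory string with made-up data stands in for 'filename.csv'.

```python
import io

import pandas as pd

# Made-up file content; in practice pass the path 'filename.csv' to both calls.
csv_text = "Std,S_Ultra,S_Classic\n1,1A,1A\n2,2A,2A\n,,\n"

# Read only the header row to learn the column names.
headers = pd.read_csv(io.StringIO(csv_text), nrows=0).columns

# Key the dtype mapping by whatever name sits at position 0.
first_col = headers[0]
df = pd.read_csv(io.StringIO(csv_text), dtype={first_col: "Int32"})

print(df.dtypes)
```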

EDIT2: It's not possible to know beforehand which columns are integer, float, or string without making one pass over the CSV to check. You need this information up front if you don't want pandas to read an int column as a float or object data type. And if you are going to make that pass anyway, you might as well convert the columns after reading. Either way, you will have to either make one pass over the data or know in advance which column numbers contain which data type.
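Converting after reading can be sketched as follows. The helper name ints_where_possible is hypothetical, and the rule it applies (a float column becomes nullable 'Int64' only if every non-missing value is a whole number) is one reasonable choice, not the only one. The CSV content is made up.

```python
import io

import pandas as pd


def ints_where_possible(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: cast each float column whose non-missing values are
    all whole numbers to pandas' nullable 'Int64'; other columns are left as-is."""
    out = df.copy()
    for col in out.columns:
        if not pd.api.types.is_float_dtype(out[col]):
            continue  # text/object and existing integer columns are skipped
        values = out[col].dropna()
        # x % 1 == 0 holds only for whole numbers (2.0 yes, 2.5 no, inf no)
        if (values % 1 == 0).all():
            out[col] = out[col].astype("Int64")  # NaN becomes <NA>
    return out


# Made-up data: 'Std' is whole numbers plus an empty row, 'Score' has fractions.
csv_text = "Std,Score,Label\n1,1.5,1A\n2,2.0,2A\n3,3.5,3A\n,,\n"
df = ints_where_possible(pd.read_csv(io.StringIO(csv_text)))
print(df.dtypes)
```

This is still the single extra pass over the data described above, just done on the loaded dataframe instead of on the file.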




With the pandas read_excel() or read_csv() functions, you can pass the dtype parameter to specify the type you want any column to have. In your case, you can add that parameter like this:

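# Plain int raises an error if 'Std' has empty cells; the nullable 'Int64' stores them as <NA>.
df_model = pd.read_excel('filename.xlsx', dtype={'Std': 'Int64'})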

5 Comments

Hi Geisson, I understand this, but can you please let me know how to make this more general than naming the 'Std' column? Is there a way to just say that the first column of the dataframe should be read as int? That way, if someone changes the first column's name, your solution still works.
Hi @MuhammadFarzanBashir, in that case you can read the data with header=None and then specify the column number in dtype. Let me know if you need the code for this.
Hi @HimanshuPoddar, can you please provide the code for the case where every column containing only numeric values should show integer values (as your solution does for the Std column)? My dataframe is dynamic, and I am not sure which columns will contain only numeric values in the future, so I want something that converts all numeric-only columns to int.
Hi @MuhammadFarzanBashir, can you try my updated answer and let me know if it works for you?
@MuhammadFarzanBashir, which columns contain only numeric data (and likewise for other types) cannot be known without one pass over your data: you would have to go through the data to determine each column's type, and then read the dataframe again with the types you found.
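For the "convert all numeric-only columns after reading" case, pandas also has a built-in, DataFrame.convert_dtypes() (added in pandas 1.0), which picks a nullable dtype per column: float columns holding only whole numbers become Int64, pure-text columns become string, and columns mixing numbers with text stay object. The frame below is made-up data that only mimics the question's columns.

```python
import numpy as np
import pandas as pd

# Made-up frame mimicking what read_excel returned in the question:
# 'Std' is float only because of the empty last row, and 'SMV34_Ultra'
# mixes numbers with text such as "2 SP=5".
df = pd.DataFrame(
    {
        "Std": [1.0, 2.0, 3.0, np.nan],
        "S_Ultra": ["1A", "2A", "3A", np.nan],
        "SMV34_Ultra": [1.0, 2.0, "2 SP=5", np.nan],
    }
)

# Whole-number floats become Int64, text becomes string, mixed stays object.
converted = df.convert_dtypes()
print(converted.dtypes)
```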
