1

Update: so I'm messing around with the two files, I copied the data from DOB column from the 2nd file to the 1st file to make the files look visually identical. However I notice some really interesting behaviour when using Ctrl+F in Microsoft Excel. When I leave the search box blank in the first file it finds no matches. However when I repeat the same operation with the 2nd file it finds 21 matches from each cell between E1 to G7. I suppose somehow there are some blank/invisible cells in the 2nd file - and that's what's causing read_excel to behave differently.

My goal is to simply execute the pandas read_excel function. However, I'm running into a very strange situation in which I am trying to run pandas.read_excel on two very similar excel files but am getting substantially different results.

Code

import pandas

data1 = pandas.read_excel(r'D:\User Files\Downloads\Programming\stackoverflow\test_1.xlsx')
print(data1)
data2 = pandas.read_excel(r'D:\User Files\Downloads\Programming\stackoverflow\test_2.xlsx')
print(data2)

Output

        Name        DOB Class Year  GPA
0  Redeye438 2008-09-22      Fresh    1
1  Redeye439 2009-09-20       Soph    2
2  Redeye440 2010-09-22     Junior    3
3  Redeye441 2011-09-20     Senior    4
4  Redeye442 2008-09-20      Fresh    4
5  Redeye443 2009-09-22       Soph    3
                             Name  DOB  Class Year  GPA
Redeye438 2011-09-20 Fresh      1  NaN         NaN  NaN
Redeye439 2010-09-22 Soph       2  NaN         NaN  NaN
Redeye440 2009-09-20 Junior     3  NaN         NaN  NaN
Redeye441 2008-09-22 Senior     4  NaN         NaN  NaN
Redeye442 2011-09-22 Fresh      4  NaN         NaN  NaN
Redeye443 2010-09-20 Soph       3  NaN         NaN  NaN

Why are the columns mapped incorrectly for data2?

The excel files in question (the only difference is the data in the DOB column): Excel file downloads Screenshot of test_1.xlsx Screenshot of test_2.xlsx

4
  • 1
    try to instal openpyxl and xlrd, these are pandas dependencies helping to read excel files Commented Sep 21, 2022 at 18:55
  • Thanks for the suggestion @NoobVB but this didn't work. I think it's something with the excel files. Commented Sep 21, 2022 at 19:00
  • 1
    no, they work fine on my pc Commented Sep 21, 2022 at 19:00
  • I installed/updated the dependencies incorrectly. You were correct, thank you. Commented Sep 21, 2022 at 19:29

1 Answer 1

1

It looks like you're using an older version of Pandas because I can't reproduce the issue on the latest version.

import pandas as pd 

pd.show_versions()

INSTALLED VERSIONS
------------------
python           : 3.10.5.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
pandas           : 1.5.0

As you can see below, the columns of data2 are mapped correctly.

import pandas as pd

data1 = pd.read_excel(r"C:\Users\abokey\Downloads\test_1.xlsx")
print(data1)
data2 = pd.read_excel(r"C:\Users\abokey\Downloads\test_2.xlsx")
print(data2)

        Name        DOB Class Year  GPA
0  Redeye438 2008-09-22      Fresh    1
1  Redeye439 2009-09-20       Soph    2
2  Redeye440 2010-09-22     Junior    3
3  Redeye441 2011-09-20     Senior    4
4  Redeye442 2008-09-20      Fresh    4
5  Redeye443 2009-09-22       Soph    3
        Name        DOB Class Year  GPA
0  Redeye438 2011-09-20      Fresh    1
1  Redeye439 2010-09-22       Soph    2
2  Redeye440 2009-09-20     Junior    3
3  Redeye441 2008-09-22     Senior    4
4  Redeye442 2011-09-22      Fresh    4
5  Redeye443 2010-09-20       Soph    3

However, you're right because the format of the two Excel files are not the same. In fact, when looking at test2.xlsx more carefully, this one seems to carry 7x3 blank cells (rows/cols). The latest version of pandas seems to handle this kind of Excel file since the empty cells are ignored when calling pandas.read_excel.

So in principle, upgrading your pandas version should fix the problem :

pip install --upgrade pandas

If the problem persists, clean your Excel file test2.xlsx by doing this :

  1. Open up the file in Excel
  2. Click on Find & Select
  3. Choose Go To Special and Blanks
  4. Click on Clear All
  5. Save your changes

enter image description here

Sign up to request clarification or add additional context in comments.

1 Comment

Thanks @abokey, upgrading pandas from version 1.2.1 to 1.5.0 solved the issue. Additionally removing the blank cells was a good second solution. Thanks very much for your thorough help.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.