Using python pandas to read excel column that has a formula extracting year from date and getting NaN for all rows but Header

Question

I have an Excel sheet where column A holds date/time values and column N extracts just the year from the date, e.g. "=YEAR(A2)". I am trying to use Python (openpyxl, pandas, whatever works) to read column N and then fill column O with the unique years from column N. Right now my issue is that when I read the sheet with pandas, I get NaN for every row except the header.

This is my Python code:

import pandas as pd

files = 'A_data.xlsx'
sheetName = "Sheet1"  # defined here but not passed to read_excel below
# generate path plus files for workbook.
print(files)
df = pd.read_excel(files, usecols='N')
print(df)

And this is my data that I get back from printing the df:

0     Year
1      NaN
2      NaN
3      NaN
4      NaN
...
...
6285   NaN
6286   NaN
6287   NaN
6288   NaN
6289   NaN

[6290 rows x 1 columns]

I tried replacing the formulas with actual data, and interestingly that seemed to solve my problem, but that isn't really how I want to go about it if I don't have to. Any help would be greatly appreciated.

I think I have listed what I tried already: I have posted my code and a sample spreadsheet. I tried replacing the formula with a number, which oddly enough seemed to fix it. I also tried telling pandas to ignore headers, but that did not solve the problem. Instead of using code, I tried the Excel formula UNIQUE, but when I opened the sheet after doing that, Excel complained of issues; those went away when I commented out that one line.

If you have found a solution to your question in the meantime, you can also write an answer to it yourself if you want. SO explicitly encourages that.
– nick, Commented Oct 15, 2024 at 7:39

Shane S · Accepted Answer · 2024-10-14 22:03:29Z


It appears that you omitted the sheet that you want to use. You don't have to enter a worksheet name when the first worksheet is the one you want, but it is good practice to enter the name.

df = pd.read_excel(files, usecols='N', sheet_name="Sheet1")

If you want to use a variable for the sheet name:

df = pd.read_excel(files, usecols='N', sheet_name=sheetName)
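
One possible explanation for the NaN values, which this thread does not confirm: pandas reads .xlsx files through openpyxl, which returns the value Excel last cached for a formula cell rather than evaluating the formula. If the workbook was written by openpyxl and never recalculated and saved in Excel, no cached value exists and the cell comes back empty. The sketch below reproduces that situation with a small throwaway workbook and then derives the years from the dates in column A instead of relying on the formula column. The file name demo.xlsx, the sample dates, and the helpers build_demo_workbook and unique_years are hypothetical names made up for illustration.

```python
import datetime as dt
import tempfile
from pathlib import Path

import pandas as pd
from openpyxl import Workbook


def build_demo_workbook(path):
    """Write a small workbook whose column N holds =YEAR(A<row>) formulas.

    The workbook is saved by openpyxl only, so the formulas are stored
    without any cached results (Excel never calculated them).
    """
    wb = Workbook()
    ws = wb.active
    ws.title = "Sheet1"
    ws["A1"] = "Date"
    ws["N1"] = "Year"
    # Made-up sample dates, purely for illustration.
    dates = [dt.datetime(2021, 5, 1), dt.datetime(2022, 6, 2), dt.datetime(2022, 7, 3)]
    for row, value in enumerate(dates, start=2):
        ws.cell(row=row, column=1, value=value)
        ws.cell(row=row, column=14, value=f"=YEAR(A{row})")  # column 14 is N
    wb.save(path)


def unique_years(path, sheet_name="Sheet1"):
    """Derive the years from the dates in column A, not from the formula column."""
    dates = pd.read_excel(path, sheet_name=sheet_name, usecols="A").iloc[:, 0]
    years = pd.to_datetime(dates, errors="coerce").dt.year.dropna().astype(int)
    return sorted(years.unique().tolist())


with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.xlsx"
    build_demo_workbook(demo_path)

    # Reading the formula column directly: no cached values, so every row is NaN.
    formula_col = pd.read_excel(demo_path, sheet_name="Sheet1", usecols="N")
    all_nan = bool(formula_col["Year"].isna().all())

    # Reading the underlying dates and extracting the year in pandas.
    years = unique_years(demo_path)

print(all_nan)
print(years)
```

If this is what is happening, it would also fit the observation in the question that replacing the formulas with plain values made the NaN go away. Opening the workbook in Excel and saving it should store cached results as well, but computing the year in pandas avoids depending on that step.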

edited Oct 14, 2024 at 22:03

answered Oct 14, 2024 at 21:57


1 Comment

Asmodeuss Over a year ago

Thanks for the help. It is possible that some of my data had embedded @'s from openpyxl causing Excel to barf. I have since completely refactored my code, thus solving the problem. I would show deltas, but it really isn't the same code any more. Greatly appreciate the help. Re specifying the worksheet: you are correct, however there is only one sheet. I think that the "sheet1" was left over from some openpyxl work I was doing in the test.py file.
