
I have an Excel sheet where column A is filled with date/time values and column N extracts just the year from the date, e.g. "=YEAR(A2)". I am trying to use Python (openpyxl, pandas, whatever works) to read column N and then fill column O with the unique years from column N. Right now, my issue is that when I read the sheet with pandas, I get NaN for every row except the header.

This is my Python code:

import pandas as pd

files = 'A_data.xlsx'
sheetName = "Sheet1"  # defined here but not passed to read_excel below
print(files)
# Read only column N, which holds the =YEAR(A2) formulas.
df = pd.read_excel(files, usecols='N')
print(df)

And this is my data that I get back from printing the df:

0     Year
1      NaN
2      NaN
3      NaN
4      NaN
...
...
6285   NaN
6286   NaN
6287   NaN
6288   NaN
6289   NaN

[6290 rows x 1 columns]

I tried replacing the formulas with the actual values, and interestingly that seemed to solve my problem, but that isn't really how I want to go about it if I don't have to. Any help would be greatly appreciated.

I think I have already listed what I tried, and I have posted my code and a sample spreadsheet. Replacing the formulas with numbers seemed to fix it, oddly enough. I also tried telling pandas to ignore headers, but that did not solve the problem. I also tried the Excel UNIQUE formula instead of doing it in code, but then Excel complained of issues when opening the sheet; those went away when I commented out that one line.

  • If you have found a solution to your question in the meantime, you can also write an answer to it yourself if you want. SO explicitly encourages that. Commented Oct 15, 2024 at 7:39

1 Answer


It appears that you omitted the sheet that you want to use. You don't have to pass a worksheet name when the first worksheet is the one you want, but it is good practice to pass it anyway.

df = pd.read_excel(files, usecols='N', sheet_name="Sheet1")

If you want to use a variable for the sheet name:

df = pd.read_excel(files, usecols='N', sheet_name=sheetName)
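
Separately, pandas reads the values stored in the workbook rather than evaluating formulas. If the =YEAR(...) cells carry no stored result (for example, when the file was written by openpyxl and never recalculated and saved by Excel), they come back as NaN. The question does not confirm that this is the cause, so treat it as an assumption. A minimal sketch that avoids relying on column N at all by deriving the years from the dates themselves; the helper name unique_years and the sample values are hypothetical:

```python
import pandas as pd


def unique_years(dates: pd.Series) -> list:
    """Return the sorted unique years found in a column of dates."""
    # Coerce to datetime; cells that cannot be parsed become NaT and are dropped.
    parsed = pd.to_datetime(dates, errors="coerce").dropna()
    return sorted(int(year) for year in parsed.dt.year.unique())


if __name__ == "__main__":
    # Small in-memory stand-in for column A of the workbook (made-up values).
    sample = pd.Series(["2021-03-01 08:00", "2021-07-15 12:30", "2023-01-02 09:45", None])
    print(unique_years(sample))
```

With the real file, the input could be the first column of pd.read_excel(files, usecols='A'), selected with .iloc[:, 0] since the header name of column A is not shown in the question.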

1 Comment

Thanks for the help. It is possible that some of my data had embedded @ characters from openpyxl, causing Excel to choke. I have since completely refactored my code, which solved the problem. I would show the differences, but it really isn't the same code any more. Regarding specifying the worksheet: you are correct; however, there is only one sheet. I think that "Sheet1" was left over from some openpyxl work I was doing in the test.py file. I greatly appreciate the help.
