
I'm using read_excel() to convert an xlsx spreadsheet into a dataframe. It usually works fine, but when I import this particular spreadsheet and list the values with df["Date"], every date seems to be repeated many, many times. Any idea why this is?

    import pandas as pd

    df = pd.read_excel("data.xlsx", sheet_name="Data")
    dates = df['Date']

The result:

    0        2014-12-22
    1        2014-12-22
    2        2014-12-22
    3        2014-12-22
    4        2014-12-22
                ...
    419532   2020-01-10
    419533   2020-01-10
    419534   2020-01-10
    419535   2020-01-10
    419536   2020-01-10

edit: A bit of the spreadsheet:

[screenshot of the spreadsheet]

As you can see, the dates are repeated in the dataframe; they definitely do not look like that in my spreadsheet. Any idea why? Thanks

  • Please show what happens between the definition of df and unbranded_df. Commented Jan 13, 2020 at 3:10
  • Oh sorry, that's a typo, I just made an edit. Commented Jan 13, 2020 at 3:13
  • Could you also provide info on the data contained in the spreadsheet? Particularly the N/A values you're trying to control. Commented Jan 13, 2020 at 3:21
  • Yes, I'll give you some of it. Commented Jan 13, 2020 at 3:41
  • If the sheet is filtered on CALGARY or some field like that, and clearing the filter shows a bunch of rows with the same date, then yes. If that's the case, you probably want to set df = df[df['city'] == 'CALGARY']. Commented Jan 13, 2020 at 3:49

1 Answer


pd.read_excel() will read the entire sheet unless you constrain it to a limited range. Excel has a number of features that alter how the sheet is displayed without changing what data exists in it (filtering, hiding rows, etc.). So the extra rows are likely data that exists in the sheet but has been hidden from display.

Some telltale signs are a filter button showing the filtered symbol, blue row numbers, and hidden rows, as seen below.

[screenshot: Excel sheet with an active filter, blue row numbers, and hidden rows]

Clear any filters and unhide any hidden rows so that you can see all the data and know what is actually in the worksheet, not just what is currently visible.
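
The same check can be made from the pandas side. The sketch below counts rows per date and tallies the values of a column that might be filtered in Excel. The toy dataframe is made up purely for illustration, and the column names `city` and `utility` are assumptions borrowed from the filter example in this answer; substitute whatever columns your sheet has.

```python
import pandas as pd

# Toy stand-in for the imported sheet (made-up data, assumed column names).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2014-12-22"] * 3 + ["2014-12-23"] * 3),
    "city": ["CALGARY", "EDMONTON", "RED DEER"] * 2,
    "utility": ["Suncor", "Suncor", "Other"] * 2,
})

# More than one row per date suggests the sheet holds rows you cannot see in Excel.
rows_per_date = df.groupby("Date").size()

# Tally each candidate filter column to see which values are mixed together.
city_counts = df["city"].value_counts()

print(rows_per_date)
print(city_counts)
```

If a column shows many more distinct values in pandas than in the Excel view, that column is a good candidate for the filter that is active in the sheet.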

To reproduce the view you have in Excel, filter your dataframe with the same conditions:

    df = df[df['city'] == 'CALGARY']

or multiple conditions:

    df = df[(df['city'] == 'CALGARY') & (df['utility'] == 'Suncor')]
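
If you do not know which conditions the Excel filter uses, a different approach (not the one described above) is to ask openpyxl which rows are flagged as hidden and drop them after reading. This is only a sketch: `read_visible_rows` is a hypothetical helper name, and it assumes the header is in row 1 with data starting in row 2, so that Excel row `r` corresponds to dataframe index `r - 2`.

```python
import pandas as pd
from openpyxl import load_workbook


def read_visible_rows(path, sheet_name):
    """Read a sheet and drop rows that Excel marks as hidden (hypothetical helper)."""
    # Load normally (not read_only), since row_dimensions is needed.
    ws = load_workbook(path)[sheet_name]

    # Excel rows flagged hidden, whether by a filter or by manual hiding.
    hidden_excel_rows = [r for r, dim in ws.row_dimensions.items() if dim.hidden]

    df = pd.read_excel(path, sheet_name=sheet_name)

    # Assumption: header in row 1, so Excel row r maps to index r - 2.
    hidden_idx = [r - 2 for r in hidden_excel_rows if (r - 2) in df.index]
    return df.drop(index=hidden_idx)


# Usage, with the file and sheet names from the question:
# df = read_visible_rows("data.xlsx", "Data")
```

Filtering on explicit conditions, as shown above, is usually easier to reason about, because the result does not depend on the state the workbook was last saved in.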

1 Comment

Perfect! thank you, yes those little icons are indeed there. Cheers
