Hi, I have a very large Excel file whose tabs do not have their headers in a consistent position (i.e. the header row falls somewhere between rows 18 and 23, depending on the tab). I am trying to write a for loop that iterates through each tab and sets the headers correctly.

I imported the file with pd.read_excel, so all the data is in a dictionary where the key is the tab name and the value is that tab's DataFrame. I got this far, but I cannot get the row number where the condition is true (that row then needs to be set as the header).

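import pandas as pd

# Passing a list of sheet names returns a dict of {tab name: DataFrame}
df_GRID = pd.read_excel(input_GRID, sheet_name=df_GRID_tabs, header=0)

for key, values in df_GRID.items():
    # This prints a boolean Series per tab, not the matching row number
    print(key, values.iloc[:, 0] == '8760 Time Series')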

2 Answers


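You could use np.where. It returns a tuple of arrays holding the positions where the condition is true, so [0][0] gives the position of the first matching row. Note that it raises an IndexError if a tab has no matching row:

import numpy as np

print(key, np.where(values.iloc[:, 0] == '8760 Time Series')[0][0])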
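To go from the row number to an actual header, something along these lines might work. This is only a sketch: `promote_header_row` is a hypothetical helper name, the small `sheet` DataFrame stands in for one tab, and it assumes the tabs were read with header=None so that the header row is still an ordinary data row.

```python
import pandas as pd


def promote_header_row(df, marker, column=0):
    """Return a new DataFrame whose header is the first row where
    `column` equals `marker`; everything above that row is dropped.

    Hypothetical helper. Assumes the sheet was read with header=None.
    """
    mask = (df.iloc[:, column] == marker).to_numpy()
    if not mask.any():
        raise ValueError(f"marker {marker!r} not found in column {column}")
    pos = int(mask.argmax())  # position of the first matching row
    body = df.iloc[pos + 1:].reset_index(drop=True)
    body.columns = df.iloc[pos].tolist()
    return body


# Stand-in for one tab: two junk rows, then the real header row.
sheet = pd.DataFrame([
    ["report", None],
    ["generated", None],
    ["8760 Time Series", "Load"],
    [1, 10.5],
    [2, 11.0],
])

# Same shape as the dict that pd.read_excel returns for several sheets.
sheets = {"Tab1": sheet}
fixed = {name: promote_header_row(df, "8760 Time Series")
         for name, df in sheets.items()}
```

Raising on a missing marker is a design choice here; for a workbook where some tabs legitimately lack the marker, catching the ValueError and skipping the tab may be more convenient.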

Maybe this is not that useful to you, but in my case I use a script called xls2csv.py. It does a lot of useful things, such as stripping empty lines.

I make a pipeline of XLSX -> CSV -> Edited_CSV -> Processing...

To turn CSV -> Edited_CSV, I just loop through the rows looking for an indication of headers, something like this:

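import csv

header_seen = False
outlines = []
# csvfile is the path to the CSV produced from the XLSX
with open(csvfile, 'r', newline='', encoding='utf-8') as infile:
  reader = csv.reader(infile, delimiter=',')
  for line in reader:
    # Skip rows until the header; 'PATTERN' stands for its first cell.
    # The "line and" check avoids an IndexError on empty rows.
    if not header_seen and line and line[0] == 'PATTERN':
      header_seen = True
    if header_seen:
      outlines.append(line)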
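For the CSV -> Edited_CSV step, the kept rows still have to be written back out. A minimal sketch of that, using only the standard library: `trim_to_header` is a hypothetical helper name, and the in-memory `raw` string stands in for a real file so the example runs on its own.

```python
import csv
import io


def trim_to_header(rows, marker):
    """Yield rows starting at the first one whose first cell equals `marker`.

    Hypothetical helper; rows before the header (including empty ones)
    are dropped.
    """
    header_seen = False
    for row in rows:
        if not header_seen and row and row[0] == marker:
            header_seen = True
        if header_seen:
            yield row


# Stand-in for the CSV produced from one tab: junk, a blank line, then data.
raw = "junk,,\n\nPATTERN,a,b\n1,2,3\n"

reader = csv.reader(io.StringIO(raw))
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerows(trim_to_header(reader, "PATTERN"))

edited_csv = out.getvalue()
```

With real files, the two io.StringIO objects would be replaced by open() calls using newline='' as in the loop above.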
