I have an Excel workbook with 4 worksheets, each with a different name. I want to read a worksheet into a pandas DataFrame only if its name is listed in the variable sheet_names. For example, the workbook's sheet names could be ['banana','orange','apple','grape']. Each sheet has 5 columns that I want to read into Python.

import pandas as pd

sheet_names = ['grape','orange'] # sheet_names is what I control; it can contain anywhere from 1 to 4 sheet names.

xlsx = pd.ExcelFile('C:\\Users\\Ken\\Desktop\\Df.xlsx')

df = []

for x in sheet_names:
    df.append(xlsx.parse(sheet_name=x, index_col=0, usecols='B:F'))

However, the code returns a list with len = 2, one DataFrame per sheet.

The desired output is a dataframe with 10 columns. Any help please?

1 Answer

Each call to xlsx.parse() returns a DataFrame, which you append to the df list, so in your code df is a list of DataFrames. To merge the selected sheets side by side into one DataFrame, use pd.concat():

# Recent pandas versions use sheet_name/usecols in place of sheetname/parse_cols.
df = pd.concat([xlsx.parse(sheet_name=x, index_col=0, usecols='B:F') for x in sheet_names],
               axis=1,
               ignore_index=True)

PS: with axis=1, ignore_index=True replaces the column labels with 0, 1, 2, ... If you want to preserve the original column names, change ignore_index=True to ignore_index=False.
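
To see what ignore_index does along axis=1 without needing the workbook, here is a minimal sketch. The two small DataFrames are made-up stand-ins for parsed sheets (their column names, values and row labels are assumptions, not the asker's data). It also shows the keys argument of pd.concat(), which is one possible way to keep track of which sheet each column came from.

```python
import pandas as pd

# Hypothetical stand-ins for two parsed sheets; with the real workbook these
# would come from xlsx.parse(...) for each name in sheet_names.
sheets = {
    "grape": pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["r1", "r2"]),
    "orange": pd.DataFrame({"a": [5, 6], "b": [7, 8]}, index=["r1", "r2"]),
}
sheet_names = ["grape", "orange"]
frames = [sheets[name] for name in sheet_names]

# ignore_index=True along axis=1 discards the column names and renumbers them.
renumbered = pd.concat(frames, axis=1, ignore_index=True)

# keys= keeps the column names and adds the sheet name as an outer column level.
labelled = pd.concat(frames, axis=1, keys=sheet_names)

print(list(renumbered.columns))
print(list(labelled.columns))
```

With keys, the columns of one sheet can be selected again with labelled["orange"], which may be handy when several sheets share the same column names.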
