
I am trying to extract selected columns from 19 Excel files and combine them into a single Excel file. I am able to extract the required columns from a single file with the code below.

import pandas as pd
import openpyxl

file = pd.read_excel("Shift Handover To A - 05-25-2021.xlsx", "25th May")

dataframe=pd.DataFrame(file[["S No", "Issue Reported By", "Shift", "Severity", "ServiceDesk Ticket #", "Issue Description", "Issue Type", "System Component", "Server Type", "Date and Time of the occurrence", "DT Observed", "Action Taken", "Worked By", "DT Action Taken", "Date and Time Resolution", "Current Stus"]])

# selecting rows based on condition
rslt_df = dataframe.loc[dataframe['Current Stus'] == 'In-Progress' ]

rslt_df.to_excel('output.xlsx')
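The same column selection and row filter can be tried without an Excel file. This is a minimal sketch, assuming a small in-memory DataFrame stands in for what read_excel returns; the column subset and sample values are hypothetical. Indexing with a list of column names already returns a DataFrame, so the pd.DataFrame(...) wrapper is not needed.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel(...); only a few of the real columns are used.
sheet = pd.DataFrame({
    "S No": [1, 2, 3],
    "Severity": ["High", "Low", "Medium"],
    "Current Stus": ["In-Progress", "Closed", "In-Progress"],
    "Notes": ["a", "b", "c"],  # a column we do not want to keep
})

wanted = ["S No", "Severity", "Current Stus"]

# Selecting with a list of names returns a new DataFrame; no pd.DataFrame() call needed.
selected = sheet[wanted]

# Boolean mask keeps only rows whose status is exactly 'In-Progress'.
rslt_df = selected.loc[selected["Current Stus"] == "In-Progress"]

print(rslt_df)
```

In the real script, replace sheet with the read_excel call; passing index=False to to_excel leaves the DataFrame index out of the output file.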

I am trying to apply it to all the files with the code below:

import os
import pandas as pd
cwd = os.path.abspath('')
import openpyxl
files = os.listdir(cwd)

for file in files:
    if file.startswith('Shift'):
        file = pd.read_excel(os.path.join(cwd, file))
dataframe=pd.DataFrame(file[["S No", "Issue Reported By", "Shift", "Severity", "ServiceDesk Ticket #", "Issue Description", "Issue Type", "System Component", "Server Type", "Date and Time of the occurrence", "DT Observed", "Action Taken", "Worked By", "DT Action Taken", "Date and Time Resolution", "Current Stus"]])

# selecting rows based on condition
rslt_df = dataframe.loc[dataframe['Current Stus'] == 'In-Progress' ]

#print(rslt_df)
rslt_df.to_excel('output.xlsx')

But I am receiving a TypeError on the dataframe=pd.DataFrame(file[[...]]) line: "TypeError: string indices must be integers". What could be wrong?

  • read_excel itself produces a DataFrame; there is no need to convert it to a DataFrame again. Commented Jun 19, 2021 at 6:43
  • You use 'file' both as the loop variable (for file in files) and as the DataFrame inside the loop. Use a different name for the DataFrame. Commented Jun 19, 2021 at 6:46

2 Answers


You can try amending your code as follows:

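Collect the DataFrame read in each loop iteration in a list, then combine them once with pd.concat() after the loop. DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so pd.concat() is the way to accumulate frames in current pandas.

There is no need to call pd.DataFrame after the loop; just select the columns you want and assign the result back with dataframe = dataframe[["S No", ...]]

import os
import pandas as pd

cwd = os.path.abspath('')
files = os.listdir(cwd)

frames = []
for file in files:
    if file.startswith('Shift'):
        file_read = pd.read_excel(os.path.join(cwd, file))
        frames.append(file_read)

# Combine all the collected frames into one DataFrame
dataframe = pd.concat(frames, ignore_index=True)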

dataframe = dataframe[["S No", "Issue Reported By", "Shift", "Severity", "ServiceDesk Ticket #", "Issue Description", "Issue Type", "System Component", "Server Type", "Date and Time of the occurrence", "DT Observed", "Action Taken", "Worked By", "DT Action Taken", "Date and Time Resolution", "Current Stus"]]

# selecting rows based on condition
rslt_df = dataframe.loc[dataframe['Current Stus'] == 'In-Progress' ]

#print(rslt_df)
rslt_df.to_excel('output.xlsx')
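If you also want to know which handover file each row came from, one option is to tag each frame before concatenating. This is a minimal sketch, assuming the frames are already loaded into a dict keyed by file name (in-memory stand-ins below instead of pd.read_excel); the function name combine_in_progress, the 'Source File' column and the second file name are hypothetical. Selecting the columns per file before pd.concat also keeps stray extra columns out of the result.

```python
import pandas as pd


def combine_in_progress(frames, columns, status="In-Progress"):
    """Select `columns` from each frame, tag rows with their file name,
    concatenate, and keep only rows whose 'Current Stus' equals `status`."""
    parts = []
    for name, frame in frames.items():
        part = frame[columns].copy()
        part["Source File"] = name  # hypothetical column recording the origin file
        parts.append(part)
    combined = pd.concat(parts, ignore_index=True)
    return combined.loc[combined["Current Stus"] == status].reset_index(drop=True)


# In-memory stand-ins for pd.read_excel(...) results (hypothetical data).
frames = {
    "Shift Handover To A - 05-25-2021.xlsx": pd.DataFrame(
        {"S No": [1, 2], "Current Stus": ["In-Progress", "Closed"]}),
    "Shift Handover To B - 05-26-2021.xlsx": pd.DataFrame(
        {"S No": [1], "Current Stus": ["In-Progress"], "Extra": ["x"]}),
}

result = combine_in_progress(frames, ["S No", "Current Stus"])
print(result)
```

Note that pd.concat raises a ValueError when given an empty list, so it is worth checking that at least one 'Shift' file matched before combining.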

The problem with your code is in these lines:

for file in files:
    if file.startswith('Shift'):
        file = pd.read_excel(os.path.join(cwd, file))
dataframe=pd.DataFrame(file[["S No", ... "Current Stus"]])

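You use 'file' both as the loop variable (for file in files) and as the name of the loaded DataFrame. After the loop ends, file holds whatever the last iteration left in it: if the last entry in files does not start with 'Shift', file is just that file-name string, so file[["S No", ... "Current Stus"]] raises "TypeError: string indices must be integers". Note also that the dataframe=... line is outside the loop, so even when it does not fail, it only processes the last file that was read.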

Just use another name for the DataFrame.
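The failure can be reproduced without pandas. This is a minimal sketch, assuming a hypothetical directory listing whose last entry does not start with 'Shift'; a plain dict stands in for the DataFrame. After the loop, file still holds that last string, and indexing a string with a list raises the same TypeError. Storing the loaded data under a separate (hypothetical) name, sheet, keeps the two apart.

```python
# Hypothetical directory listing; the last entry does not start with 'Shift'.
files = ["Shift Handover To A - 05-25-2021.xlsx", "output.xlsx"]

# Buggy pattern: the loaded data is stored under the loop variable's name.
for file in files:
    if file.startswith("Shift"):
        file = {"S No": [1]}  # stand-in for pd.read_excel(...)

# The next iteration rebinds `file`, so after the loop it is the string "output.xlsx".
error_message = None
try:
    file[["S No"]]
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # exact wording varies slightly between Python versions

# Fixed pattern: a separate name for the loaded data, collected inside the loop.
loaded = []
for file in files:
    if file.startswith("Shift"):
        sheet = {"S No": [1]}  # stand-in for pd.read_excel(...)
        loaded.append(sheet)
```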
