Locate dataframe and concatenate based on specific headers in Python

Question

If I have lots of excel files as follows (here are just two examples):

data1.xlsx

data2.xlsx

Is it possible to take only the part with the columns id, a, b, c, ignore the rest, and concatenate all those files into a new Excel file in Python? Thanks.

Here is what I have tried:

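import os

import pandas as pd

src = '.'  # placeholder: top-level folder to search
columns_selected = ['id', 'a', 'b', 'c']

frames = []
for root, dirs, files in os.walk(src, topdown=False):
    for file in files:
        if file.endswith(('.xlsx', '.xls')):
            try:
                df0 = pd.read_excel(os.path.join(root, file))
            except Exception:
                # skip files that cannot be read
                continue
            # keep only the wanted columns (missing ones become NaN)
            frames.append(df0.reindex(columns=columns_selected))

if frames:
    # concatenate once and write a single output file
    df1 = pd.concat(frames, ignore_index=True)
    print(df1)
    df1.to_excel('test.xlsx', index=False)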

Use iloc to take only the data after the specific columns from each .xlsx, then concatenate both sheets to make a new one, and poof, done.
– DirtyBit, Commented Jan 24, 2019 at 12:58
Thanks. In fact my real data is literally a mess, quite difficult to deal with. :(
– ah bon, Commented Jan 24, 2019 at 14:04
ahbon: You can probably adapt the code in my answer to your other question to do this (as I already told you I thought was likely). Instead of the destination being a single directory somewhere, for this it's a single dataframe, and instead of copying files to the destination directory, you'll want to extract and concatenate data from all the files which have one of the wanted file extensions.
– martineau, Commented Jan 25, 2019 at 11:09
Yeah, I agree. I mean: is it possible to concatenate data1.xlsx and data2.xlsx, ignoring the contents before and after the empty rows, to get a final Excel file with the columns id, a, b, c?
– ah bon, Commented Jan 25, 2019 at 11:36

Charles R · Accepted Answer · 2019-01-24 13:01:46Z

1

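Use skiprows and nrows: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html

import pandas as pd

# skip the rows above the header, then read only the table rows
df1 = pd.read_excel('data1.xlsx', skiprows=3, nrows=5)
df2 = pd.read_excel('data2.xlsx', skiprows=4, nrows=5)

# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
dfFinal = pd.concat([df1, df2])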
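
When the number of rows to skip differs from file to file, hard-coding skiprows and nrows gets tedious. One possible alternative, sketched below, is to read each sheet with header=None and search for the row that holds the wanted headers. This assumes the layout described in the comments on the question: a header row containing id, a, b, c, followed by the data rows, ending at the first empty row. The helper name extract_block and the WANTED list are illustrative, not part of pandas.

```python
import pandas as pd

WANTED = ["id", "a", "b", "c"]  # headers named in the question


def extract_block(raw, wanted=WANTED):
    """Return the rows under the header row holding `wanted`, up to the first empty row.

    `raw` is expected to be a sheet read with header=None (hypothetical helper).
    """
    if raw.empty:
        return pd.DataFrame(columns=wanted)
    # Mark rows that contain every wanted header (compared as stripped strings).
    is_header = raw.apply(
        lambda row: set(wanted) <= {str(v).strip() for v in row.dropna()},
        axis=1,
    ).to_numpy(dtype=bool)
    if not is_header.any():
        return pd.DataFrame(columns=wanted)
    header_pos = int(is_header.argmax())  # position of the first matching row
    header = raw.iloc[header_pos].astype(str).str.strip()
    # Column labels of the wanted headers, in the wanted order.
    cols = [header[header == name].index[0] for name in wanted]
    body = raw.iloc[header_pos + 1:][cols].copy()
    # Cut the block at the first row where all wanted columns are empty.
    empty = body.isna().all(axis=1).to_numpy()
    if empty.any():
        body = body.iloc[:int(empty.argmax())]
    body.columns = wanted
    return body.reset_index(drop=True)
```

A call might look like extract_block(pd.read_excel('data1.xlsx', header=None)); the results for several files can then be passed to pd.concat. Values come back with object dtype because the raw sheet mixes text and numbers, so a later conversion step may be needed.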

edited Jan 24, 2019 at 13:01

answered Jan 24, 2019 at 12:59


6 Comments

Charles R Over a year ago

My mistake... I deleted it.

ah bon Over a year ago

Thanks for your help. In fact I have lots of Excel files of this type under a folder, and I don't want to call read_excel on them one by one. Are there other solutions?

Charles R Over a year ago

If all files have the same pattern, you can have one file with many sheets in it. Then you can set sheet_name=None to read and parse all the data.

DirtyBit Over a year ago

@ahbon are those files, or sheets within a file?

ah bon Over a year ago

Thanks. The problem is that they are located in different folders and subfolders, and on top of that those files have multiple sheets as well. Maybe the first thing I need to do is copy all the Excel files into one folder via shutil.copytree(), then combine them into one file with multiple sheets, and as the last step apply your solution.
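
For reference, pd.read_excel with sheet_name=None returns a dict that maps each sheet name to a DataFrame. A minimal sketch of stacking such a dict into one frame follows; the helper name combine_sheets, the added sheet column, and the workbook path are illustrative assumptions, not something taken from the thread.

```python
import pandas as pd


def combine_sheets(sheets):
    """Stack a {sheet_name: DataFrame} mapping into one DataFrame (hypothetical helper)."""
    # Tag each frame with the sheet it came from so rows stay traceable.
    frames = [df.assign(sheet=name) for name, df in sheets.items()]
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# With a real workbook (the path is a placeholder):
# sheets = pd.read_excel("workbook.xlsx", sheet_name=None)  # dict of DataFrames
# combined = combine_sheets(sheets)
```

This only helps when every sheet shares the same layout; sheets with extra rows above or below the table would still need trimming first.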


DirtyBit · Accepted Answer · 2019-01-24 13:14:50Z

1

Extending @Charles R's answer to cover your requirement of multiple Excel files.

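import glob
import os

import pandas as pd

# get all the files
os.chdir(r'C:\ExcelWorkbooksFolder')  # raw string so the backslash is kept literally
FileList = glob.glob('*.xlsx')
print(FileList)

and then:

for File in FileList:
    # File is already a file name; looping over it again would yield single characters
    df = pd.read_excel(File)
    # the rest of the code for reading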

answered Jan 24, 2019 at 13:14


2 Comments

ah bon Over a year ago

Thanks. Please check the post I linked here. The structure of folders and files is similar to this one: stackoverflow.com/questions/54346748/…

ah bon Over a year ago

I think we need to use os.walk to iterate over all Excel files ending with xlsx or xls, no?
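
A rough sketch of that os.walk idea, collecting Excel files from a folder and all of its subfolders without copying anything first. The function name find_excel_files and its extensions parameter are made up for illustration.

```python
import os


def find_excel_files(src, extensions=(".xlsx", ".xls")):
    """Return sorted paths of all Excel files under `src`, including subfolders."""
    found = []
    for root, _dirs, files in os.walk(src):
        for name in files:
            # str.endswith accepts a tuple, so both extensions are checked at once
            if name.lower().endswith(extensions):
                found.append(os.path.join(root, name))
    return sorted(found)
```

Each returned path could then be passed to pd.read_excel, with the resulting frames collected in a list and joined once at the end with pd.concat.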
