Using pandas Combining/merging 2 different Excel files/sheets

Question

I am trying to combine 2 different Excel files. (thanks to the post Import multiple excel files into python pandas and concatenate them into one dataframe)

What I have worked out so far is:

import os
import pandas as pd

df = pd.DataFrame()

for f in ['c:\\file1.xls', 'c:\\file2.xls']:
    data = pd.read_excel(f, 'Sheet1')
    df = df.append(data)

df.to_excel("c:\\all.xls")

Here is what they look like:

[image: File1.xls and File2.xls]

However I want to:

1. Exclude the last rows of each file (i.e. row4 and row5 in File1.xls; row7 and row8 in File2.xls).
2. Add a column (or overwrite Column A) to indicate which file the data came from.

For example:

[image: desired combined output]

Is it possible? Thanks.

behzad.nouri · Accepted Answer · 2014-08-20 11:18:05Z

15

For item 1, you can pass skipfooter to read_excel (older pandas versions spell it skip_footer); or, alternatively, do

data = data.iloc[:-2]

once you have read the data.
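
A minimal sketch of the slicing approach, using a small in-memory frame in place of the Excel sheet (the column names and values are made up for illustration):

```python
import pandas as pd

# Stand-in for a sheet read with pd.read_excel; the last two rows play
# the role of the footer rows to be dropped (hypothetical data).
data = pd.DataFrame({
    "A": ["row1", "row2", "row3", "total", "notes"],
    "B": [10, 20, 30, 60, 0],
})

# Keep everything except the last two rows.
trimmed = data.iloc[:-2]

print(trimmed)
```

Because this is a plain slice, a frame with fewer than two rows yields an empty frame rather than an error.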

For item 2, you can overwrite the index with the file name:

from os.path import basename
data.index = [basename(f)] * len(data)
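
If you would rather keep the original index and record the source in a regular column, which the question also allows, something along these lines may work. The column name `source` and the sample data are assumptions for illustration, and `ntpath` (the Windows flavor of `os.path`) is used so the sketch splits the Windows-style path the same way on any OS:

```python
import ntpath  # Windows path rules, regardless of the OS running the code

import pandas as pd

f = "c:\\file1.xls"  # path from the question; no file is actually read here
data = pd.DataFrame({"A": ["row1", "row2"], "B": [10, 20]})  # made-up data

# Insert the file name as the first column; insert() modifies data in place.
data.insert(0, "source", ntpath.basename(f))

print(data)
```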

Also, it would probably be better to put all the data frames in a list and concatenate them once at the end; something like:

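import os
import pandas as pd

frames = []
for f in ['c:\\file1.xls', 'c:\\file2.xls']:
    # read the sheet and drop the last two rows
    data = pd.read_excel(f, 'Sheet1').iloc[:-2]
    # label every row with the name of its source file
    data.index = [os.path.basename(f)] * len(data)
    frames.append(data)

df = pd.concat(frames)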
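
As a variation on the same idea, `pd.concat` can label each frame itself through its `keys` argument, and that label can then be turned into a column. This is only a sketch using in-memory frames rather than the Excel files; the level names `source` and `row` and the sample data are assumptions:

```python
import pandas as pd

# Stand-ins for the two sheets after their footer rows were dropped.
frames = [
    pd.DataFrame({"A": ["a1", "a2"], "B": [1, 2]}),
    pd.DataFrame({"A": ["b1", "b2", "b3"], "B": [3, 4, 5]}),
]
names = ["file1.xls", "file2.xls"]

# keys= adds an outer index level holding the file name for each row.
combined = pd.concat(frames, keys=names, names=["source", "row"])

# Move the file name out of the index into an ordinary first column,
# then discard the per-file row numbers.
combined = combined.reset_index(level="source").reset_index(drop=True)

print(combined)
```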

edited Aug 20, 2014 at 11:18

answered Aug 20, 2014 at 11:04

behzad.nouri


1 Comment

Mark K Over a year ago

Magnificent, I have to say. behzad.nouri, you are gorgeous!

Amit Sharma · Accepted Answer · 2015-10-15 06:21:31Z

import os
import xlrd  # note: xlrd 2.0+ reads only .xls; reading .xlsx needs xlrd < 2.0
import xlsxwriter

file_name = input("Enter the destination file name (without extension): ")
merged_file_name = file_name + ".xlsx"
dest_book = xlsxwriter.Workbook(merged_file_name)
dest_sheet_1 = dest_book.add_worksheet()
dest_row = 1  # row 0 is reserved for the header
header_written = False
path = input("Enter the path of the folder to scan: ")
for root, dirs, files in os.walk(path):
    files = [name for name in files if name.endswith('.xlsx')]
    for xlsfile in files:
        print("File in mentioned folder is: " + xlsfile)
        temp_book = xlrd.open_workbook(os.path.join(root, xlsfile))
        temp_sheet = temp_book.sheet_by_index(0)
        # copy the header row once, from the first file found
        if not header_written:
            for col_index in range(temp_sheet.ncols):
                value = temp_sheet.cell_value(0, col_index)
                dest_sheet_1.write(0, col_index, value)
            header_written = True
        # append the data rows of every file below the header
        for row_index in range(1, temp_sheet.nrows):
            for col_index in range(temp_sheet.ncols):
                value = temp_sheet.cell_value(row_index, col_index)
                dest_sheet_1.write(dest_row, col_index, value)
            dest_row = dest_row + 1
dest_book.close()
book = xlrd.open_workbook(merged_file_name)
sheet = book.sheet_by_index(0)
print("number of rows in destination file are: ", sheet.nrows)
print("number of columns in destination file are: ", sheet.ncols)
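
The file-discovery part of the script above can be separated from the copying logic so it is easier to check on its own. A sketch, where the helper name `find_excel_files` and its `extension` parameter are hypothetical and not part of the original answer:

```python
import os


def find_excel_files(path, extension=".xlsx"):
    """Return full paths of files under `path` whose names end with `extension`."""
    matches = []
    for root, dirs, files in os.walk(path):
        # sort names so the order within each folder is predictable
        for name in sorted(files):
            if name.endswith(extension):
                matches.append(os.path.join(root, name))
    return matches
```

The returned list can then be looped over in place of the nested `os.walk` loop.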

ZygD · Accepted Answer · 2021-09-23 14:26:32Z

0

Change

df.to_excel("c:\\all.xls")

to

df.to_excel("c:\\all.xls", index=False)

You may need to play around with the double quotes, but I think that will work.
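
To see what `index=False` changes without needing an Excel writer installed, the sketch below uses `to_csv`, which accepts the same `index` parameter as `to_excel`. The sample frame is made up:

```python
import io

import pandas as pd

df = pd.DataFrame({"A": ["x", "y"], "B": [1, 2]})

# Default: the index is written as an extra, unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)

# index=False: only the data columns are written.
without_index = io.StringIO()
df.to_csv(without_index, index=False)

print(with_index.getvalue())
print(without_index.getvalue())
```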

edited Sep 23, 2021 at 14:26

ZygD


answered Sep 23, 2021 at 14:25

user16985645
