How can I use Python to combine data from multiple Excel files into one Excel file?

Question

This is my code so far:

import glob
import pandas as pd
import numpy as np
import openpyxl

log = r'G:\Data\Hotels\hotel.txt'  # text file with my long list of hotels (raw string so the backslashes are kept literally)
file = open(log, 'r')
hotels = []
line = file.readlines()
for a in line:
    hotels.append(a.rstrip('\n'))


for hotel in hotels :
    path = "G:\\Data\\Hotels\\"+hotel+"\\"+hotel+" - Meetings"
    file = hotel+"_Action_Log.xlsx" 
    print(file)

So far, all this code does is print the name (a string, I guess?) of each hotel file. I now want to copy the contents of those files into one "Master" Excel file. I only need one sheet from each Excel file, and I don't need the headers (which are in row 5, because the first 4 rows contain fancy formatting).

What would my next steps be? I am new to Python.

It seems you are not combining the files; here you are only printing the file name for each hotel. But your goal is to combine data from multiple Excel files into one.
– Saurabh kukade, Commented Feb 22, 2018 at 15:32
@Saurabhkukade Oh right, I am new to Python. So how do I do it?
– Sorath, Commented Feb 22, 2018 at 15:33

Keith Dowd · Accepted Answer · 2018-02-26 12:21:01Z

Based on your description of your problem, I'm assuming you mean to open and append multiple files together that have the same format and structure (i.e., have the same columns and the columns are in the same order).

In other words, you want to do something like this:

Excel worksheet 1

Col1 Col2
a    b

Excel worksheet 2

Col1 Col2
c    d

Merged (appended) Excel worksheet

Col1 Col2
a    b
c    d
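
A minimal sketch of that appending behaviour, using two small in-memory dataframes in place of real worksheets (the names `ws1`, `ws2` and `merged` are illustrative only):

```python
import pandas as pd

# Two tiny dataframes standing in for the two worksheets above
ws1 = pd.DataFrame({"Col1": ["a"], "Col2": ["b"]})
ws2 = pd.DataFrame({"Col1": ["c"], "Col2": ["d"]})

# Stack the rows of both frames; ignore_index=True renumbers the rows from 0
merged = pd.concat([ws1, ws2], ignore_index=True)
print(merged)
```

This only works cleanly when both frames share the same column names, which is the assumption stated above.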

If my assumptions about your problem are true, then you could try the following:

import glob
import pandas as pd
import numpy as np
import openpyxl

# This is your code
log = r'G:\Data\Hotels\hotel.txt'  # text file with my long list of hotels (raw string so the backslashes are kept literally)
file = open(log, 'r')
hotels = []
line = file.readlines()
for a in line:
    hotels.append(a.rstrip('\n'))

# We'll use this list to keep track of all your filepaths
filepaths = []

# I merged your 'path' and 'file' vars into a single variable ('fp')
for hotel in hotels :
    # path = "G:\\Data\\Hotels\\"+hotel+"\\"+hotel+" - Meetings"
    # file = hotel+"_Action_Log.xlsx"
    fp = "G:\\Data\\Hotels\\"+hotel+"\\"+hotel+" - Meetings\\"+hotel+"_Action_Log.xlsx"
    # print(file)
    filepaths.append(fp)

# This list stores all of your worksheets (as dataframes)
worksheets = []

# Open all of your Excel worksheets as Pandas dataframes and store them in 'worksheets' to concatenate later
for filepath in filepaths:
    # The question says the headers are in row 5, below 4 rows of formatting, so skip the first 4 rows;
    # row 5 is then read as the column header. Adjust `skiprows` if your layout differs.
    df = pd.read_excel(filepath, skiprows=4)
    worksheets.append(df)

# Append all worksheets together
append = pd.concat(worksheets)

# Change 'header' to True if you want to write out column headers;
# index=False keeps the dataframe's row index out of the output file
append.to_excel('G:\\Data\\Hotels\\merged.xlsx', header=False, index=False)

You can learn more about the pd.concat() function here: https://pandas.pydata.org/pandas-docs/stable/merging.html
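
As a side note, the path-building step could also be sketched with `pathlib` instead of string concatenation. This is only an illustration: the helper names `read_hotel_names` and `action_log_path` are hypothetical, and the folder layout is assumed to follow the pattern shown in the question (`<hotel>\<hotel> - Meetings\<hotel>_Action_Log.xlsx`).

```python
from pathlib import PureWindowsPath

def read_hotel_names(lines):
    """Strip whitespace from each line and drop blank lines."""
    return [line.strip() for line in lines if line.strip()]

def action_log_path(hotel, root=r"G:\Data\Hotels"):
    """Build the path to one hotel's action log (layout assumed from the question)."""
    return PureWindowsPath(root) / hotel / f"{hotel} - Meetings" / f"{hotel}_Action_Log.xlsx"

# Example with made-up hotel names; in practice the lines would come from hotel.txt
hotels = read_hotel_names(["Ritz\n", "\n", "Savoy\n"])
filepaths = [action_log_path(hotel) for hotel in hotels]
for fp in filepaths:
    print(fp)
```

`PureWindowsPath` is used here so the sketch produces Windows-style paths on any operating system; on a Windows machine, `pathlib.Path` would work the same way.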

This certainly makes sense. But is there a way I can loop through all the hotel files first and then append?
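
One possible way to do this, as a rough sketch rather than a definitive implementation: read every file into a list first, then call `pd.concat()` once at the end. The function name `combine_workbooks` and its `reader` parameter are hypothetical; `reader` defaults to `pd.read_excel` and exists only so the loop can be tried out without real Excel files.

```python
import pandas as pd

def combine_workbooks(filepaths, reader=pd.read_excel, **read_kwargs):
    """Read every workbook first, then concatenate all of them in one step."""
    # Loop through all files, collecting one dataframe per file
    frames = [reader(fp, **read_kwargs) for fp in filepaths]
    # Append everything together only after the loop has finished
    return pd.concat(frames, ignore_index=True)
```

If the headers sit in row 5 as the question describes, a call might look like `combine_workbooks(filepaths, skiprows=4)`, followed by `to_excel(..., header=False, index=False)` on the result. Note that `pd.concat()` raises a `ValueError` when the list of files is empty.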
