This is my code so far:

import glob
import pandas as pd
import numpy as np
import openpyxl

log = r'G:\Data\Hotels\hotel.txt'  # text file with my long list of hotels
hotels = []
with open(log, 'r') as file:
    for a in file.readlines():
        hotels.append(a.rstrip('\n'))


for hotel in hotels:
    path = "G:\\Data\\Hotels\\"+hotel+"\\"+hotel+" - Meetings"
    file = hotel+"_Action_Log.xlsx"
    print(file)

So far, this code only prints the file name (a string, I guess?) of each hotel's action log. I now want to copy the contents of all these files into one "Master" Excel file. I only need one sheet from each Excel file, and I don't need the headers (which are in row 5, because the first 4 rows contain fancy formatting).

What would my next steps be? I am new to Python.

  • It seems you are not combining the files; here you are only printing the file name for each hotel. But your goal is to combine data from multiple Excel files into one. Commented Feb 22, 2018 at 15:32
  • @Saurabhkukade Oh right, I am new to Python. So how do I do it? Commented Feb 22, 2018 at 15:33
  • Hi friend. I updated my answer per your feedback. Commented Feb 22, 2018 at 21:43

1 Answer

Based on your description, I'm assuming you want to open multiple files that have the same format and structure (i.e., the same columns in the same order) and append them together.

In other words, you want to do something like this:

Excel worksheet 1

Col1 Col2
a    b

Excel worksheet 2

Col1 Col2
c    d

Merged (appended) Excel worksheet

Col1 Col2
a    b
c    d
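
The merge pictured above is what `pd.concat` does with DataFrames that share the same columns. A minimal sketch that rebuilds the two example worksheets in memory, so no Excel files are involved:

```python
import pandas as pd

# The two example worksheets from above, built directly as DataFrames.
sheet1 = pd.DataFrame({"Col1": ["a"], "Col2": ["b"]})
sheet2 = pd.DataFrame({"Col1": ["c"], "Col2": ["d"]})

# Stack them vertically. ignore_index=True renumbers the rows 0..n-1
# instead of keeping each sheet's own index (which would repeat 0).
merged = pd.concat([sheet1, sheet2], ignore_index=True)

print(merged)  # rows "a b" and "c d" under the columns Col1 and Col2
```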

If my assumptions about your problem are true, then you could try the following:

import pandas as pd  # pd.read_excel also needs the openpyxl package installed to read .xlsx files

# This is your code, with the path written as a raw string (r'...') so the backslashes
# are not treated as escape sequences, and 'with' so the file is closed afterwards
log = r'G:\Data\Hotels\hotel.txt'  # text file with my long list of hotels
hotels = []
with open(log, 'r') as file:
    for a in file.readlines():
        if a.strip():  # skip blank lines, which would otherwise become bogus paths
            hotels.append(a.strip())

# We'll use this list to keep track of all your filepaths
filepaths = []

# I merged your 'path' and 'file' vars into a single variable ('fp')
for hotel in hotels:
    # path = "G:\\Data\\Hotels\\"+hotel+"\\"+hotel+" - Meetings"
    # file = hotel+"_Action_Log.xlsx"
    fp = "G:\\Data\\Hotels\\"+hotel+"\\"+hotel+" - Meetings\\"+hotel+"_Action_Log.xlsx"
    filepaths.append(fp)

# This list stores all of your worksheets (as dataframes)
worksheets = []

# Open all of your Excel worksheets as pandas DataFrames and store them in 'worksheets' to concatenate later
for filepath in filepaths:
    # sheet_name=0 (the default) reads only the first sheet of each workbook.
    # header=4 uses row 5 as the column names and discards the 4 formatting rows above it;
    # adjust it if the header sits in a different row in some files
    df = pd.read_excel(filepath, sheet_name=0, header=4)
    worksheets.append(df)

# Append all worksheets together; ignore_index=True renumbers the rows 0..n-1
append = pd.concat(worksheets, ignore_index=True)

# Change 'header' to True if you want to write out column headers;
# index=False stops pandas from writing the row numbers as an extra first column
append.to_excel(r'G:\Data\Hotels\merged.xlsx', header=False, index=False)
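
The right `header`/`skiprows` values depend on the layout described in the question: four formatting rows, then the headers in row 5. A minimal sketch of how pandas counts those rows. It uses `pd.read_csv` on made-up text so it runs without an Excel file; `pd.read_excel` accepts `header` and `skiprows` with the same meaning:

```python
import io

import pandas as pd

# Made-up sheet contents: four formatting rows, the header in row 5,
# then the data rows. Stored as CSV text so no .xlsx file is needed.
raw = """Hotel Action Log,,
Prepared by: Ops team,,
Status: draft,,
Last updated: 2018-02-22,,
Date,Action,Owner
2018-01-05,Call supplier,Ann
2018-01-09,Book room,Bob
"""

# header=4: use the 5th line (0-based index 4) as the column names
# and discard everything above it.
with_header = pd.read_csv(io.StringIO(raw), header=4)

# header=None with skiprows=5: drop the first five lines, including the
# header row, and label the columns 0, 1, 2 instead.
no_header = pd.read_csv(io.StringIO(raw), header=None, skiprows=5)

print(with_header.columns.tolist())
print(no_header.shape)
```

For the real workbooks the equivalents would be `pd.read_excel(filepath, header=4)`, which keeps the row-5 names that `pd.concat` then uses to line up columns, or `pd.read_excel(filepath, header=None, skiprows=5)`, which gives every file the column labels 0, 1, 2 and so on.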

You can learn more about the pd.concat() function here: https://pandas.pydata.org/pandas-docs/stable/merging.html
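
That page also covers the `keys` argument of `pd.concat`. Since the master file is written without headers, you may want to record which hotel each row came from. A minimal sketch with made-up data; the hotel names and the `Hotel`/`Row` level names are illustrative choices, not something your files require:

```python
import pandas as pd

# Two made-up per-hotel action logs with identical columns.
frames = {
    "Grand": pd.DataFrame({"Action": ["Call supplier", "Book room"]}),
    "Seaside": pd.DataFrame({"Action": ["Order menus"]}),
}

# Passing a dict makes its keys the outer level of a MultiIndex,
# so every row remembers which hotel it came from.
combined = pd.concat(frames, names=["Hotel", "Row"])

# Move the "Hotel" level into an ordinary column, then drop the
# per-file row numbers so the rows are numbered 0..n-1.
combined = combined.reset_index(level="Hotel").reset_index(drop=True)

print(combined)
```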

1 Comment

This certainly makes sense. But is there a way I can loop through all the hotel files first and then append?
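
The code in the answer already works in that order: the first loop only builds the file paths, the second only reads each file into a list, and `pd.concat` runs once at the end. A stripped-down sketch of the same pattern, using `pathlib` to build the paths instead of string concatenation. The hotel names and the `action_log_path`/`load_sheet` helpers are hypothetical, and `load_sheet` returns a made-up DataFrame so the sketch runs without the real workbooks:

```python
from pathlib import PureWindowsPath

import pandas as pd

# Root folder from the question; PureWindowsPath joins the parts with "\"
# (no doubled backslashes needed) and behaves the same on any OS.
ROOT = PureWindowsPath(r"G:\Data\Hotels")


def action_log_path(hotel: str) -> PureWindowsPath:
    # Assumed layout: <root>\<hotel>\<hotel> - Meetings\<hotel>_Action_Log.xlsx
    return ROOT / hotel / f"{hotel} - Meetings" / f"{hotel}_Action_Log.xlsx"


def load_sheet(path: PureWindowsPath) -> pd.DataFrame:
    # Hypothetical stand-in for pd.read_excel(path, header=4): returns a
    # one-row frame tagged with the file name instead of reading a workbook.
    return pd.DataFrame({"File": [path.name], "Action": ["placeholder"]})


hotels = ["Grand", "Seaside", "Harbour"]  # would come from hotel.txt

# Step 1: loop over every hotel first, only collecting one DataFrame per file.
worksheets = []
for hotel in hotels:
    worksheets.append(load_sheet(action_log_path(hotel)))

# Step 2: after the loop, append everything in a single pd.concat call.
master = pd.concat(worksheets, ignore_index=True)
print(master)
```

On the Windows machine from the question, `pathlib.Path` can replace `PureWindowsPath` so the files can actually be opened, and `Path` objects can be passed to `pd.read_excel` directly. Concatenating once after the loop is also cheaper than calling `pd.concat` inside it, which copies the growing result on every iteration.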
