Combining data from multiple Excel files in Python using the pandas package

Question

I'm trying to combine Excel data files with different dates into one file so I can analyze the data with pandas. I'm having difficulty because the files are named by date and contain multiple sheets.

This is for an assignment to analyze the data and plot various parameters (temp, atm, GHI, etc.) against the number of days/hours/minutes.

import pandas as pd
import glob

all_data = pd.DataFrame() #Creating an empty dataframe
for f in glob.glob("/Data-Concentrated Solar Power-NamPower/Arandis 2016/2016 01 January/*.xlsx"): #path to datafiles and using glob to select all files with .xlsx extension
    df = pd.read_excel(f)
    all_data = all_data.append(df,ignore_index=True)

Chris Adams · Accepted Answer · 2019-04-17 15:15:54Z

2

Append each file's DataFrame to a list, then use pandas.concat to combine them all into one DataFrame:

import pandas as pd
import glob

frames = []

for f in glob.glob("/home/humblefool/Dropbox/MSc/MSc Project/Data-Concentrated Solar Power-NamPower/Arandis 2016/2016 01 January/*.xlsx"): #path to datafiles and using glob to select all files with .xlsx extension
    df = pd.read_excel(f).assign(file_name=f)
    # Add date column for sorting later
    df['date'] = pd.to_datetime(df.file_name.str.extract(r'(\d{4}-\d{2}-\d{2})', expand=False), errors='coerce')
    frames.append(df)

all_data = pd.concat(frames, ignore_index=True).sort_values('date')
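
The question also mentions that each workbook has several sheets. By default `pd.read_excel(f)` reads only the first sheet, so the loop above would miss the rest. A minimal sketch, assuming every sheet shares the same columns: passing `sheet_name=None` makes `read_excel` return a dict mapping sheet names to DataFrames, and that dict can be flattened with the same list-then-concat pattern. The helper name `combine_sheets` and the `sheet` column are hypothetical, and the demo uses small in-memory frames in place of real workbooks.

```python
import pandas as pd


def combine_sheets(sheets, file_name):
    """Stack a {sheet_name: DataFrame} dict into one frame, tagging each row."""
    frames = [
        df.assign(file_name=file_name, sheet=sheet_name)
        for sheet_name, df in sheets.items()
    ]
    return pd.concat(frames, ignore_index=True)


# In real use the dict would come from: pd.read_excel(path, sheet_name=None)
demo_sheets = {
    "Sheet1": pd.DataFrame({"temp": [21.5, 22.0]}),
    "Sheet2": pd.DataFrame({"temp": [19.0]}),
}
combined = combine_sheets(demo_sheets, "demo.xlsx")
print(combined)
```

Inside the loop, the result of `combine_sheets(...)` would take the place of the single `df` that gets appended to `frames`.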

edited Apr 17, 2019 at 15:15

answered Apr 17, 2019 at 12:43

Chris Adams


7 Comments

Tonikami04 Over a year ago

Is it possible to check that the files were added in order of their dates? And could pandas be told to start reading at line 17 in every file, perhaps with the header argument?

Chris Adams Over a year ago

I've updated my answer, this will include a column with the file that the dataframe came from

Tonikami04 Over a year ago

Also, I realised the merged files are not in chronological order by date, and it's messing up the data when I convert the generated file to a CSV file. Any idea how to fix this?

Chris Adams Over a year ago

I’m not at my desk just now, but will be able to look into this for you in about 30 minutes or so

Chris Adams Over a year ago

@Tonikami04 apologies for the delay in getting back to you. IIUC, you want to extract the date part from the filename so that you can sort by that date? I have updated my answer to add a date column, using .str.extract and pd.to_datetime. Hope this is what you're looking for.
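
The date extraction described here hinges on how the files are named, which the thread never shows. Below is a rough sketch using only the standard library, under the assumption that each file name contains a `YYYY-MM-DD` date. The example names and the helpers `file_date` and `sort_by_file_date` are made up for illustration. Names without a date sort last rather than raising an error.

```python
import re
from datetime import datetime
from pathlib import Path

# Assumed naming pattern: a YYYY-MM-DD date somewhere in the file name.
DATE_RE = re.compile(r"(\d{4}-\d{2}-\d{2})")


def file_date(path):
    """Return the date embedded in the file name, or None if no date is found."""
    match = DATE_RE.search(Path(path).name)
    return datetime.strptime(match.group(1), "%Y-%m-%d") if match else None


def sort_by_file_date(paths):
    """Sort paths chronologically; names without a date go to the end."""
    return sorted(
        paths,
        key=lambda p: (file_date(p) is None, file_date(p) or datetime.max),
    )


paths = [
    "data_2016-01-10.xlsx",
    "data_2016-01-02.xlsx",
    "notes.xlsx",
    "data_2016-01-05.xlsx",
]
ordered = sort_by_file_date(paths)
print(ordered)
```

Feeding the sorted list to the loop instead of the raw `glob.glob(...)` result reads the files in date order, since `glob` itself makes no promise about ordering.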


Jeril · Accepted Answer · 2019-04-18 03:54:34Z

2

Can you try the following:

import glob
import os

import pandas as pd

all_data = pd.DataFrame() #Creating an empty dataframe
for f in glob.glob("/home/humblefool/Dropbox/MSc/MSc Project/Data-Concentrated Solar Power-NamPower/Arandis 2016/2016 01 January/*.xlsx"): #path to datafiles and using glob to select all files with .xlsx extension
    # Read 'Sheet1', skipping the first 16 rows so the header is taken from row 17
    df = pd.ExcelFile(f).parse('Sheet1', skiprows=16)
    # Take the part of the file name after the first underscore as the date
    file_date = os.path.splitext(os.path.basename(f))[0].split('_')[1]
    df['file_date'] = pd.to_datetime(file_date)
    all_data = pd.concat([all_data, df])
# Index by date and sort so the rows are in chronological order
all_data = all_data.set_index('file_date').sort_index()
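
To check that the combined frame really is in date order, one option is to inspect the index after `sort_index()`. Here is a small sketch with made-up in-memory data standing in for the Excel files. The `GHI` values and dates are invented, while `is_monotonic_increasing` is a standard pandas `Index` property.

```python
import pandas as pd

# Made-up frames standing in for three daily files, deliberately out of order.
frames = []
for day, value in [("2016-01-03", 3.0), ("2016-01-01", 1.0), ("2016-01-02", 2.0)]:
    df = pd.DataFrame({"GHI": [value, value + 0.5]})
    df["file_date"] = pd.to_datetime(day)
    frames.append(df)

all_data = pd.concat(frames).set_index("file_date").sort_index()

# True when every index value is >= the one before it.
in_order = all_data.index.is_monotonic_increasing
print(in_order)
print(all_data.index.min(), all_data.index.max())
```

Running the same check before writing the CSV is a cheap way to confirm the sort took effect.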

edited Apr 18, 2019 at 3:54

answered Apr 17, 2019 at 12:43

Jeril


3 Comments

Tonikami04 Over a year ago

This is actually working. But how can I be sure that the files are combined in order of their dates?

Jeril Over a year ago

I have revised the solution to skip the first 16 rows. You can check it now.

Tonikami04 Over a year ago

Also, I realised the merged files are not in chronological order by date, and it's messing up the data when I convert the generated file to a CSV file. Any idea how to fix this?
