Appending Pandas DataFrame to existing Excel document

Question

Per https://github.com/pandas-dev/pandas/pull/21251/files/09e5b456e1af5cde55f18f903ab90c761643b05a, we should be able to append DataFrames to new XLSX sheets.

Based on the documentation, I tried the following:

>>> import pandas as pd
>>>
>>> d1 = pd.DataFrame({"A": ['Bob', 'Joe', 'Mark'],
...                    "B": ['5', '10', '20']})
>>> d2 = pd.DataFrame({"A": ['Jeffrey', 'Ann', 'Sue'],
...                    "B": ['1', '2', '3']})
>>>
>>> # Create XLSX document for ticker
>>> writer = pd.ExcelWriter('test.xlsx', engine='openpyxl')
>>> d1.to_excel(writer, sheet_name='d1')
>>> writer.save()
>>>
>>> writer = pd.ExcelWriter('test.xlsx', engine='openpyxl', mode='a')
>>> d2.to_excel(writer, sheet_name='d2')
>>> writer.save()
>>>
>>> pd.__version__  # Just updated this per a comment
'0.23.4'

The result is a single workbook named 'test.xlsx' with a single tab 'd2'.

How can I prevent the workbook/sheet from being overwritten?

Possible duplicate of How to write to an existing excel file without overwriting data (using pandas)?
– Poojan, Commented Jan 14, 2019 at 17:39
Sorry, a bit too quick to mark as duplicate. This was introduced in pandas version 0.23.1 so you need to upgrade.
– user3471881, Commented Jan 14, 2019 at 17:41
Thanks, I upgraded per your suggestion. I was using "pip3 install pandas" instead of adding "--upgrade" so I thought I had the latest version. Per your suggestion, I updated but am still getting the same result.
– enter_display_name_here, Commented Jan 14, 2019 at 17:48
Also, the other referenced <stackoverflow.com/questions/20219254/…> is for the function 'df.to_excel()', not 'pd.ExcelWriter'.
– enter_display_name_here, Commented Jan 14, 2019 at 17:51
Are you running this on a REPL? Make sure you restart your shell since the pandas module is loaded in memory with the older version.
– r.ook, Commented Jan 14, 2019 at 17:51

It_is_Chris · Accepted Answer · 2019-01-14 19:48:03Z

11

You can use a with statement. The writer is saved and closed when the block exits, so no explicit writer.save() or writer.close() is needed:

with pd.ExcelWriter('test.xlsx', engine='openpyxl', mode='a') as writer:
    d1.to_excel(writer, sheet_name='d1')
    d2.to_excel(writer, sheet_name='d2')

update

This should work; just note that a blank file needs to be created beforehand. You can create the blank file with Python if you want. I wrote a simple loop that mimics, in essence, what you are trying to accomplish:

import pandas as pd
from openpyxl import load_workbook

d1 = pd.DataFrame({"A": ['Bob', 'Joe', 'Mark'],
                   "B": ['5', '10', '20']})
d2 = pd.DataFrame({"A": ['Jeffrey', 'Ann', 'Sue'],
                   "B": ['1', '2', '3']})

dfs = [d1, d2]

# Write each DataFrame to its own sheet: d1, d2, ...
for i, data in enumerate(dfs, start=1):
    sheet = 'd' + str(i)
    writer = pd.ExcelWriter('atest.xlsx', engine='openpyxl', mode='a')
    writer.book = load_workbook('atest.xlsx')  # here is the difference
    data.to_excel(writer, sheet_name=sheet)
    writer.save()
    writer.close()

or here is the modified first example:

d1 = pd.DataFrame({"A":['Bob','Joe', 'Mark'], 
               "B":['5', '10', '20']})
d2 = pd.DataFrame({"A":['Jeffrey','Ann', 'Sue'], 
                "B":['1', '2', '3']})

writer = pd.ExcelWriter('atest.xlsx', engine='openpyxl', mode='w')
d1.to_excel(writer,sheet_name='d1')
writer.save()
writer.close()

writer = pd.ExcelWriter('atest.xlsx', engine='openpyxl', mode='a')
writer.book = load_workbook('atest.xlsx')
d2.to_excel(writer,sheet_name='d2')
writer.save()
writer.close()
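
For readers on a newer pandas, the same two-step write can be sketched with context managers. This is a minimal sketch, not the answerer's code: it assumes a pandas version where mode='a' is available (0.24 or later, per the thread) and uses a temporary directory so no existing file is touched. Note that writer.save() was removed in pandas 2.0, so the with form is the one that runs there.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

d1 = pd.DataFrame({"A": ["Bob", "Joe", "Mark"], "B": ["5", "10", "20"]})
d2 = pd.DataFrame({"A": ["Jeffrey", "Ann", "Sue"], "B": ["1", "2", "3"]})

# Temporary location (an assumption of this sketch) instead of 'atest.xlsx'.
path = os.path.join(tempfile.mkdtemp(), "atest.xlsx")

# First pass: mode="w" creates the workbook with sheet d1.
with pd.ExcelWriter(path, engine="openpyxl", mode="w") as writer:
    d1.to_excel(writer, sheet_name="d1")

# Second pass: mode="a" reopens the existing workbook and adds sheet d2.
with pd.ExcelWriter(path, engine="openpyxl", mode="a") as writer:
    d2.to_excel(writer, sheet_name="d2")

# Read the sheet names back to confirm both tabs survived.
sheet_names = load_workbook(path).sheetnames
print(sheet_names)
```

Each with block saves and closes the file on exit, which gives the open/write/close cycle asked about in the question.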

edited Jan 14, 2019 at 19:48

answered Jan 14, 2019 at 17:57

It_is_Chris


4 Comments

enter_display_name_here Over a year ago

I tried this, but the result is comparable to xlsxwriter.readthedocs.io/example_pandas_multiple.html. It's a different engine, but the workbook is opened only once and then both tabs are written. I need to open/write/close, then open/write/close again. I tried your suggestion by putting d1.to_excel() and d2.to_excel() in separate with blocks, but the result was a single XLSX with a single tab. I also tried your suggestion with mode='w'. It worked the same as mode='a'.

It_is_Chris Over a year ago

@enter_display_name_here why do you need to open/write/close multiple times?

enter_display_name_here Over a year ago

I am building an extremely large dictionary of DataFrames and would like to write out each DataFrame as a tab as they get completed. I would hate to get 100+ DataFrames in and then the program experiences an issue and nothing is written out. Basically, I am scraping 4 websites, compiling different tables of data into a single DF, then doing that process over and over again. I'd like to generate an individual tab per DF as it goes through the program.

enter_display_name_here Over a year ago

Thanks for taking the time to put that together. Your update is actually the workaround to what was done prior to the addition of mode='a'. If you refer to line 1009 (here: github.com/pandas-dev/pandas/pull/21251/files/…), you'll notice that they added book = load_workbook(self.path) to their code. Your code works the same as with mode='w'. If I am unable to get mode='a' to work, then I will use this. I am hoping that their new code will check for an existing tab and then overwrite that.
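
The use case described in these comments (writing each DataFrame to its own tab as soon as it is finished) could be sketched roughly as follows. The helper name write_sheet, the frames dictionary, and the file name are hypothetical, and the sketch assumes a pandas version where mode='a' works with the openpyxl engine. Because mode='a' needs an existing file, the helper falls back to mode='w' on the first call.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook


def write_sheet(df, path, sheet_name):
    """Write df to its own sheet, creating the workbook on first use."""
    # mode="a" requires an existing file, so use "w" the first time.
    mode = "a" if os.path.exists(path) else "w"
    with pd.ExcelWriter(path, engine="openpyxl", mode=mode) as writer:
        df.to_excel(writer, sheet_name=sheet_name)


# Hypothetical stand-ins for the tables scraped from each site.
frames = {
    "site1": pd.DataFrame({"A": ["Bob", "Joe"], "B": [5, 10]}),
    "site2": pd.DataFrame({"A": ["Ann", "Sue"], "B": [2, 3]}),
}

path = os.path.join(tempfile.mkdtemp(), "scrape.xlsx")
for name, frame in frames.items():
    # The file is saved and closed after every frame, so a later failure
    # does not lose the tabs already written.
    write_sheet(frame, path, name)

saved_sheets = load_workbook(path).sheetnames
print(saved_sheets)
```

One trade-off to keep in mind: every call reloads and rewrites the whole workbook, so this gets slower as the file grows. What happens when sheet_name already exists depends on the pandas version, so unique sheet names are the safe choice here.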

Rodney Souza · Accepted Answer · 2023-01-24 16:02:52Z

This worked for me. It creates the file if it does not exist, and appends to the end of the file if it already exists.

You may need to install openpyxl.

import os

import pandas as pd
from openpyxl import load_workbook

def append_xlsx(df, file='results.xlsx'):
    # Make sure the file name ends with the .xlsx extension
    ext = '.xlsx'
    if not file.endswith(ext):
        file += ext

    if os.path.exists(file):
        # Existing file: append below the rows already on the first sheet
        mode = "a"
        if_sheet_exists = "overlay"
        header = False

        wb = load_workbook(file)
        sheet = wb.worksheets[0]
        startrow = sheet.max_row
    else:
        # New file: create it and write the header row
        mode = 'w'
        if_sheet_exists = None
        header = True
        startrow = 0

    with pd.ExcelWriter(
        file,
        mode=mode,
        engine="openpyxl",
        if_sheet_exists=if_sheet_exists,
    ) as writer:

        df.to_excel(
            writer,
            sheet_name="Sheet1",
            startrow=startrow,
            header=header,
            index=False,
        )

Adam Safier · Accepted Answer · 2019-02-28 05:59:56Z

1

import pandas as pd

# wk_path and save_file are assumed to be defined earlier in the program
writer = pd.ExcelWriter(wk_path + save_file)
# ....
# build the sc_file DataFrame and save it. sc_file includes
# a column called OS.

sc_file.to_excel(writer, sheet_name='test')

# build a Series of OS counts out of sc_file
counts_os = sc_file.OS.value_counts()

# To append to the 'test' sheet, use startcol=x, startrow=y.
# Here counts_os goes just below the data already on 'test':
# row 0 holds the header, so the next free row is len(sc_file) + 1.
y = len(sc_file) + 1
counts_os.to_excel(writer, sheet_name='test',
    startcol=1, startrow=y)

# write counts_os to sheet test2
counts_os.to_excel(writer, sheet_name='test2')
writer.save()
writer.close()

edited Feb 28, 2019 at 5:59

answered Feb 28, 2019 at 5:47

Adam Safier


1 Comment

Adam Safier Over a year ago

Just re-read and see that I answered how to append to the same tab but the question is about multiple tabs. I have a program similar to yours that writes multiple tabs successfully. I don't specify engine='openpyxl' and I don't do a writer.save() until the very end of the program - last step. Works though I could see limits if my data was too large.
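
The single-writer approach described in this comment, with everything written through one writer and saved once at the end, might look like the sketch below. The sample data and file name are made up for illustration, and unlike the commenter's program the sketch names engine='openpyxl' explicitly so that it does not depend on which default engine is installed.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Hypothetical data standing in for sc_file; only the OS column matters.
sc_file = pd.DataFrame(
    {"host": ["a", "b", "c"], "OS": ["linux", "linux", "mac"]}
)
counts_os = sc_file["OS"].value_counts()

path = os.path.join(tempfile.mkdtemp(), "report.xlsx")

# One writer for the whole run; the file is saved once, when the block exits.
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    sc_file.to_excel(writer, sheet_name="test")
    # The header row plus len(sc_file) data rows are already used, so the
    # next free zero-based row index is len(sc_file) + 1.
    counts_os.to_excel(
        writer, sheet_name="test", startcol=1, startrow=len(sc_file) + 1
    )
    # The same counts again on a second tab.
    counts_os.to_excel(writer, sheet_name="test2")

wb = load_workbook(path)
print(wb.sheetnames)
```

Nothing reaches disk until the block exits, which is the limitation raised earlier in the thread: a crash partway through a long run loses every tab.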

enter_display_name_here · Accepted Answer · 2019-01-14 20:35:50Z

0

I submitted a post on GitHub and received a response from the contributors (quoted below). It turns out that this functionality won't be released until 0.24, so it is not available in 0.23.1. FYI: I downloaded the RC and successfully tried out the mode='a' option. However, there may be a bug with workbooks that do not exist; I receive FileNotFoundError: [Errno 2] No such file or directory: 'test.xlsx'.

"this feature is being released as part of 0.24 which we just issued a release candidate for over the past few days. You can try on the RC or here on master and if neither works open an issue per the contributing guide, but this wouldn't be expected to work on versions older than that"

answered Jan 14, 2019 at 20:35

enter_display_name_here


3 Comments

It_is_Chris Over a year ago

did you create the blank test.xlsx file first

enter_display_name_here Over a year ago

Yep, I tried that and it worked. I also just double-checked their code. At first, I thought a new workbook would be created if one doesn't exist, but I think that's incorrect. I think the code creates a new workbook if mode= is not 'a', not if a workbook does not exist.

It_is_Chris Over a year ago

After looking at the code, that is correct: a file is not created when mode == 'a'. They are just using load_workbook, which means the file needs to exist before running ExcelWriter with the param mode='a'. If mode != 'a', they create a new workbook: self.book = Workbook()
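
Given that behavior, one way to satisfy mode='a' is to create the blank workbook up front with openpyxl. This is a minimal sketch under that assumption; the temporary path is only for illustration. A workbook created this way carries openpyxl's default empty sheet in addition to whatever pandas appends.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook

path = os.path.join(tempfile.mkdtemp(), "test.xlsx")

# Create the blank workbook first; mode="a" only loads an existing file.
if not os.path.exists(path):
    Workbook().save(path)

d1 = pd.DataFrame({"A": ["Bob", "Joe", "Mark"], "B": ["5", "10", "20"]})

# The file now exists, so appending does not raise FileNotFoundError.
with pd.ExcelWriter(path, engine="openpyxl", mode="a") as writer:
    d1.to_excel(writer, sheet_name="d1")

# The default sheet from the blank workbook is still present next to d1.
sheet_names = load_workbook(path).sheetnames
print(sheet_names)
```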
