How to write to an existing excel file without over-writing existing data using pandas

Question

I know similar questions have been posted before, but I haven't found one that works for this case. I hope you can help.

Here is a summary of the issue:

I am writing web scraping code using Selenium (for an assignment).
The code uses a for-loop to go from one page to the next.
The output for each page is a dataframe (basically a table) that is exported to Excel.
The dataframes from all the web pages should be captured in a single Excel sheet (not multiple sheets within the Excel file).
Each web page has the same data format (i.e. the number of columns and the column headers are the same, but the row values vary).
For info, I am using pandas because it helps convert the output from the website to Excel.

The problem I'm facing is that when the dataframe is exported to Excel, it overwrites the data from the previous iteration. Hence, when I run the code and the scraping is completed, I only get the data from the last for-loop iteration.

Please advise which line(s) of code I need to add so that all the iterations are captured in the Excel sheet. More specifically, each iteration should export its data to Excel starting from the first empty row.

Here is an extract from the code:

for i in range(50, 60):
    url = urlA + str(i)  # this is the URL generator; urlA is the main link excluding pagination
    driver.get(url)
    time.sleep(random.randint(3, 7))
    text = driver.find_element_by_xpath('/html/body/pre').text
    data = pd.DataFrame(eval(text))
    export_excel = data.to_excel(xlpath)

Create only one dataframe before the for-loop, append data to it inside the for-loop, and save it only once after the for-loop.
– furas, Commented Oct 8, 2019 at 23:58
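The pattern described in this comment can be tried without a browser. The following is only a minimal sketch: the `combine_pages` helper and the sample payloads are hypothetical, and it assumes each page body is JSON text (so `json.loads` stands in for the `eval` call in the question). The frames are collected in a list and joined once with `pd.concat`.

```python
import json

import pandas as pd


def combine_pages(page_texts):
    """Parse each page's JSON text and stack all rows into one dataframe."""
    # One dataframe per page; every page is assumed to share the same columns.
    frames = [pd.DataFrame(json.loads(text)) for text in page_texts]
    # A single concat after the loop instead of growing a dataframe step by step.
    return pd.concat(frames, ignore_index=True)


# Hypothetical payloads standing in for the text scraped from each page.
pages = [
    '[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]',
    '[{"id": 3, "name": "c"}]',
]
combined = combine_pages(pages)
print(combined.shape)  # (3, 2)
```

Calling `combined.to_excel(xlpath, index=False)` once at the end would then write every page's rows into a single sheet.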

furas · Accepted Answer · 2019-10-09 15:05:52Z

Thanks Dijkgraaf. Your proposal worked.

Here is the full code for others (for future reference).

Apologies for the formatting, I couldn't set it properly. Anyway, I hope the code below is of some use to someone in the future.

import pandas as pd
from selenium.webdriver.common.by import By

xlpath = "c:/projects/excelfile.xlsx"
urlA = "www.your website.com"  # placeholder: the main link excluding pagination

df = pd.DataFrame()  # empty dataframe created before the for-loop

# driver is the Selenium WebDriver instance created earlier in the script
for i in range(1, 10):
    url = urlA + str(i)  # URL generator for pagination (loops through the pages)
    driver.get(url)
    text = driver.find_element(By.XPATH, '/html/body/pre').text  # gets the text from the site
    data = pd.DataFrame(eval(text))  # evaluates the extracted text and converts it to a pandas dataframe
    df = pd.concat([df, data])  # adds this page's rows to df; replaces df.append, which was removed in pandas 2.0

df.to_excel(xlpath)  # exports the consolidated dataframe (df) to Excel once, after the loop
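If the rows really do have to be added to a file that already exists on disk, as the question title asks, one possible approach is to open the workbook in append mode and start writing at the first empty row. This is a rough sketch rather than part of the original answer: `append_to_sheet` is a hypothetical helper, and it assumes pandas 1.4 or later with openpyxl installed, and that the named sheet already exists in the file.

```python
from pathlib import Path

import pandas as pd


def append_to_sheet(df, path, sheet_name="Sheet1"):
    """Write df below whatever rows the sheet already holds."""
    path = Path(path)
    if not path.exists():
        # First call: create the file and write the header row once.
        df.to_excel(path, sheet_name=sheet_name, index=False)
        return
    with pd.ExcelWriter(
        path, engine="openpyxl", mode="a", if_sheet_exists="overlay"
    ) as writer:
        # max_row is the last used row (1-based), which equals the
        # 0-based index of the first empty row that startrow expects.
        start = writer.sheets[sheet_name].max_row
        df.to_excel(
            writer,
            sheet_name=sheet_name,
            startrow=start,
            header=False,
            index=False,
        )
```

Inside the scraping loop this would be called as `append_to_sheet(data, xlpath)`. Reopening the workbook on every iteration is slower than concatenating in memory and saving once, so it mainly makes sense when the file must survive between separate runs.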

edited Oct 9, 2019 at 15:05

furas

answered Oct 9, 2019 at 13:51

The Oracle
