I know similar questions have been posted before, but I haven't found anything that works for this case. I hope you can help.
Here is a summary of the issue:
- I am writing web scraping code using Selenium (for an assignment).
- The code utilizes a for-loop to go from one page to another
- The output for each page is a dataframe (basically a table) that is exported to Excel.
- The dataframes from all the web pages should be captured in a single Excel sheet (not multiple sheets within the Excel file).
- Each web page has the same data format (i.e. the number of columns and the column headers are the same, but the row values vary).
- For info, I am using pandas, as it helps convert the output from the website to Excel.
The problem I'm facing is that when the dataframe is exported to Excel, it overwrites the data from the previous iteration. Hence, when I run the code and the scraping is completed, I only get the data from the last for-loop iteration.
Please advise which line(s) of code I need to add so that all the iterations are captured in the Excel sheet. More specifically, each iteration should export its data to Excel starting from the first empty row.
Here is an extract from the code:
for i in range(50, 60):
    url = urlA + str(i)  # URL generator; urlA is the main link excluding pagination
    driver.get(url)
    time.sleep(random.randint(3, 7))
    text = driver.find_element_by_xpath('/html/body/pre').text
    data = pd.DataFrame(eval(text))
    export_excel = data.to_excel(xlpath)
Create a dataframe before the for-loop, inside the for-loop append data to this dataframe, and save it only once after the for-loop.
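The idea of appending inside the loop and saving only once afterwards could be sketched roughly as below. This is a minimal sketch, not a drop-in fix: it assumes the text scraped from each page is valid JSON (the question's code uses eval, which is swapped here for json.loads), and the helper names combine_pages and export_once as well as the sample data are made up for illustration. Writing .xlsx files with pandas also assumes an Excel engine such as openpyxl is installed.

```python
import json

import pandas as pd


def combine_pages(page_texts):
    """Parse each page's JSON text into a DataFrame and stack them row-wise."""
    # Assumption: every page yields the same columns, as stated in the question.
    frames = [pd.DataFrame(json.loads(text)) for text in page_texts]
    # ignore_index=True renumbers the rows 0..n-1 across all pages.
    return pd.concat(frames, ignore_index=True)


def export_once(page_texts, xlpath):
    """Hypothetical helper: write all pages to one sheet with a single call."""
    combined = combine_pages(page_texts)
    combined.to_excel(xlpath, index=False)
    return combined


# Stand-in for the text collected from two pages (made-up sample data).
sample_pages = [
    '[{"name": "a", "value": 1}, {"name": "b", "value": 2}]',
    '[{"name": "c", "value": 3}]',
]

combined = combine_pages(sample_pages)
print(combined)
```

In the scraping loop from the question, this would mean starting with an empty list before the loop, replacing the to_excel call with something like page_texts.append(text), and calling export_once(page_texts, xlpath) once after the loop ends. Collecting the pieces in a list and concatenating once tends to be cheaper than growing a dataframe on every iteration.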