0

I need to write a program that scrapes a daily quote from a certain web page and collects the quotes into a single Excel file. I wrote something that finds the next empty row and starts writing new quotes there, but it also deletes the previous rows:

wb = openpyxl.load_workbook('gold_quote.xlsx')
sheet = wb.get_sheet_by_name('Sheet1')
.
.
.
z = 1
x = sheet['A{}'.format(z)].value

while x != None:
    x = sheet['A{}'.format(z)].value
    z += 1

writer = pd.ExcelWriter('quote.xlsx')
df.to_excel(writer, sheet_name='Sheet1', na_rep='', float_format=None,
            columns=['Date', 'Time', 'Price'], header=True, index=False,
            index_label=None, startrow=z-1, startcol=0, engine=None,
            merge_cells=True, encoding=None, inf_rep='inf', verbose=True,
            freeze_panes=None)
writer.save()
3 Comments
  • It sounds like you are compiling a list of strings. Why not have it be a line-delimited txt file instead? Each line could be its own entry. Commented Aug 12, 2017 at 17:40
  • I need an Excel file for further processing. I can write to the Excel file with this code, but on each daily update the previous rows are removed. Commented Aug 13, 2017 at 7:51
  • Sounds similar to when you open a file with write vs append status. Commented Aug 13, 2017 at 16:14

3 Answers

1

Question: How to write on existing excel files without losing previous information

In openpyxl, sheet.append() writes a new row after the last used row:

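import openpyxl

wb = openpyxl.load_workbook('gold_quote.xlsx')
sheet = wb['Sheet1']  # get_sheet_by_name() is deprecated in openpyxl

# One row of values: date, time, price
rowData = ['2017-08-01', '16:31', 1.23]
sheet.append(rowData)  # written below the last used row

wb.save('gold_quote.xlsx')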

4 Comments

Everything looks good, except that it cannot append a pandas DataFrame. It says it cannot convert it to Excel.
The type of the data to be appended is: <class 'pandas.core.frame.DataFrame'>
@Farhad: You have to convert the pandas.DataFrame to a list and append it row by row, or iterate over the pandas.DataFrame.
The data was appended after 13 empty rows. Why?
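
The row-by-row approach suggested in the comments could look roughly like the sketch below. It assumes the workbook and the sheet already exist, and the helper name append_dataframe is made up for illustration. dataframe_to_rows is an openpyxl utility that yields each DataFrame row as a list of values.

```python
import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows


def append_dataframe(path, df, sheet_name="Sheet1"):
    """Append every row of df below the last used row of an existing sheet.

    Hypothetical helper: assumes `path` already exists and contains `sheet_name`.
    """
    wb = load_workbook(path)
    sheet = wb[sheet_name]
    # index=False and header=False yield only the data values of each row
    for row in dataframe_to_rows(df, index=False, header=False):
        sheet.append(row)
    wb.save(path)
```

Calling append_dataframe('gold_quote.xlsx', df) once per day would then add only the new rows and leave the earlier ones in place.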
0
writer.book = wb
writer.sheets = dict((ws.title, ws) for ws in wb.worksheets)

3 Comments

Adding this code at the end does exactly the same thing. Should I remove anything from my code so that previous rows are kept on each run?
Add it after this line: writer = pd.ExcelWriter('quote.xlsx')
When I run it, this error comes up: xl_format = self.book.add_format() AttributeError: 'Workbook' object has no attribute 'add_format'
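
The AttributeError in the last comment suggests that the writer was created with the xlsxwriter engine, since add_format is a method of xlsxwriter workbooks and not of openpyxl ones. A possible alternative, sketched below, is to open the writer in append mode with the openpyxl engine. This assumes pandas 1.4 or later (where if_sheet_exists='overlay' is available) and an existing file that already has a header row; the helper name append_quotes is made up for illustration.

```python
import pandas as pd
from openpyxl import load_workbook


def append_quotes(path, df, sheet_name="Sheet1"):
    """Write df below the existing rows of sheet_name without touching them.

    Hypothetical helper: assumes pandas >= 1.4 and an existing, non-empty sheet.
    """
    # max_row is the 1-based index of the last used row; startrow is 0-based,
    # so passing max_row starts writing on the first free row.
    start = load_workbook(path)[sheet_name].max_row
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False,
                    header=False, startrow=start)
```

The with block saves and closes the file on exit, so no explicit writer.save() call is needed.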
0

I figured it out. First, read the existing data from the Excel file with a reader, then concatenate the newly scraped data with it. Duplicates must be dropped, otherwise every run of the program adds duplicated rows. Finally, write the previous and new data together with a writer:

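import pandas as pd

# df holds the newly scraped quotes (built earlier in the program)
excel_reader = pd.ExcelFile('gold_quote.xlsx')
to_update = {"Sheet1": df}

excel_writer = pd.ExcelWriter('gold_quote.xlsx')

for sheet in excel_reader.sheet_names:
    sheet_df = excel_reader.parse(sheet)
    append_df = to_update.get(sheet)

    # Only sheets listed in to_update receive new rows
    if append_df is not None:
        sheet_df = pd.concat([sheet_df, append_df]).drop_duplicates()

    sheet_df.to_excel(excel_writer, sheet_name=sheet, index=False)

excel_writer.close()  # save() was removed in pandas 2.0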
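
For a workbook with a single sheet, the same read, concatenate, de-duplicate, write cycle can be condensed into one function. The sketch below is only an illustration: the name merge_quotes is made up, and treating Date plus Time as the key that identifies a quote is an assumption. Note that it rewrites the file with just this one sheet, so it is not suitable when other sheets must be preserved.

```python
import pandas as pd


def merge_quotes(path, new_df, sheet_name="Sheet1"):
    """Merge new_df into the sheet stored at path and drop repeated quotes.

    Hypothetical helper: assumes a single-sheet workbook with the columns
    Date, Time and Price, and that (Date, Time) identifies a quote.
    """
    existing = pd.read_excel(path, sheet_name=sheet_name)
    combined = pd.concat([existing, new_df], ignore_index=True)
    # keep="last" prefers the freshly scraped value when a key repeats
    combined = combined.drop_duplicates(subset=["Date", "Time"], keep="last")
    combined.to_excel(path, sheet_name=sheet_name, index=False)
    return combined
```

Running it twice with the same new_df should leave the file unchanged after the first call. De-duplication only works when the values read back from Excel compare equal to the new ones, so dates stored as strings in one frame and as timestamps in the other would not match.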
