Problem saving HTML table into excel using Python

Question

This is my first time using Python and I am trying the scraping method and putting together codes available on the net and currently I'm stuck on saving the output into an Excel file.

Ok, so first I need to read an email from Outlook and get the data inside. But it's on table format, meaning the creator copy paste a data from Excel as a table so the best method that I found was converting it to an HTML file.

import win32com.client
import xlsxwriter
import pandas as pd
import requests
from bs4 import BeautifulSoup

outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6)
messages = inbox.Items
'''message = messages.GetLast()
body_content = message.Body
subject = message.Subject
categories = message.Categories
print(body_content)
print(subject)
print(categories)'''
string = "Monthly PPM Report"
for message in messages:
    if string in message.Subject:
        print(message.HTMLBody)
        Html_file= open("filename.html","w", encoding="utf-8")
        Html_file.write(message.HTMLBody)
        Html_file.close()

So, using the code above I managed to save the email as an HTML file. The next step is to find the table targeting the div class.

rfile  = open('filename.html')
rsoup  = BeautifulSoup(rfile)
nodes1  = rsoup.find('div',{'class':'MsoNormalTable'})

When I tried to print I managed to get the table that I needed but when I tried to save it as an Excel file using nodes1.to_excel('test.xlsx') I get this error.

nodes1.to_excel('test.xlsx') AttributeError: 'NoneType' object has no attribute 'to_excel'

Any suggestions about what step am I missing?

Jan F. · Accepted Answer · 2019-10-08 08:10:57Z

2

To use pandas to_excel() method you first need a pandas DataFrame

assuming nodes1 is a dictionary object:

data_frame = pd.DataFrame(data=nodes1)
data_frame.to_excel('label_name')

edited Oct 8, 2019 at 8:10

answered Oct 8, 2019 at 8:00

Jan F.

1088 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Sarotobi Over a year ago

Thanks so the problem is the nodes1 is not an dictionary so i needed to convert before appending.

kelidas · Accepted Answer · 2019-10-08 09:38:32Z

1

You can read the table using pandas function read_html:

import pandas as pd
rfile  = open('filename.html')
html = rfile.read()

# all tables in document
tab_list = pd.read_html(html)
# tables with header
tab_list = pd.read_html(html, header=0)
# table with attributes
tab_list = pd.read_html(html, attrs={'class':'xxx', 'id':'xxx', 'align':"center", 'cellspacing':"1", 'cellpadding':"4", 'border':"0"})

# your nodes1 from BeautifulSoup
tab_list = pd.read_html(str(nodes1))

# save first table
tab_list[0].to_excel('test.xlsx')

answered Oct 8, 2019 at 9:38

kelidas

813 bronze badges

Collectives™ on Stack Overflow

Problem saving HTML table into excel using Python

2 Answers 2

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related