1

This is my first time using Python and I am trying the scraping method and putting together codes available on the net and currently I'm stuck on saving the output into an Excel file.

Ok, so first I need to read an email from Outlook and get the data inside. But it's on table format, meaning the creator copy paste a data from Excel as a table so the best method that I found was converting it to an HTML file.

import win32com.client
import xlsxwriter
import pandas as pd
import requests
from bs4 import BeautifulSoup

outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6)
messages = inbox.Items
'''message = messages.GetLast()
body_content = message.Body
subject = message.Subject
categories = message.Categories
print(body_content)
print(subject)
print(categories)'''
string = "Monthly PPM Report"
for message in messages:
    if string in message.Subject:
        print(message.HTMLBody)
        Html_file= open("filename.html","w", encoding="utf-8")
        Html_file.write(message.HTMLBody)
        Html_file.close()

So, using the code above I managed to save the email as an HTML file. The next step is to find the table targeting the div class.

rfile  = open('filename.html')
rsoup  = BeautifulSoup(rfile)
nodes1  = rsoup.find('div',{'class':'MsoNormalTable'})

When I tried to print I managed to get the table that I needed but when I tried to save it as an Excel file using nodes1.to_excel('test.xlsx') I get this error.

nodes1.to_excel('test.xlsx') AttributeError: 'NoneType' object has no attribute 'to_excel'

Any suggestions about what step am I missing?

2 Answers 2

2

To use pandas to_excel() method you first need a pandas DataFrame

assuming nodes1 is a dictionary object:

data_frame = pd.DataFrame(data=nodes1)
data_frame.to_excel('label_name')
Sign up to request clarification or add additional context in comments.

1 Comment

Thanks so the problem is the nodes1 is not an dictionary so i needed to convert before appending.
1

You can read the table using pandas function read_html:

import pandas as pd
rfile  = open('filename.html')
html = rfile.read()

# all tables in document
tab_list = pd.read_html(html)
# tables with header
tab_list = pd.read_html(html, header=0)
# table with attributes
tab_list = pd.read_html(html, attrs={'class':'xxx', 'id':'xxx', 'align':"center", 'cellspacing':"1", 'cellpadding':"4", 'border':"0"})

# your nodes1 from BeautifulSoup
tab_list = pd.read_html(str(nodes1))

# save first table
tab_list[0].to_excel('test.xlsx')

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.