I tried to parse an HTML table into CSV using Python with the following script:
from bs4 import BeautifulSoup
import requests
import csv

csvFile = open('log.csv', 'w', newline='')
writer = csv.writer(csvFile)

def parse():
    html = requests.get('https://en.wikipedia.org/wiki/Comparison_of_text_editors')
    bs = BeautifulSoup(html.text, 'lxml')
    table = bs.select_one('table.wikitable')
    rows = table.select('tr')
    for row in rows:
        csvRow = []
        for cell in row.findAll(['th', 'td']):
            csvRow.append(cell.getText())
        writer.writerow(csvRow)
        print(csvRow)

parse()
csvFile.close()
This code output a cleanly formatted CSV file with no encoding issues.
Everything was fine until the row for Enrico Tröger's Geany. My script was unable to write the ö
into the CSV file, so I tried this:
csvRow.append(cell.text.encode('ascii', 'replace'))
instead of this:
csvRow.append(cell.getText())
That worked, except that each table cell was now wrapped in b''.
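As far as I can tell, the b'' wrapper appears because str.encode() returns a bytes object, and csv.writer converts anything that is not a string with str(). A minimal sketch that reproduces this in memory (the write_row helper and the sample name are only for illustration, not part of my script):

```python
import csv
import io

def write_row(cells):
    """Write one row to an in-memory CSV and return the raw text."""
    buffer = io.StringIO(newline='')
    csv.writer(buffer).writerow(cells)
    return buffer.getvalue()

name = 'Enrico Tr\u00f6ger'  # the cell that breaks the script

# encode() returns bytes, so the writer stores the bytes repr, b'' included
bytes_row = write_row([name.encode('ascii', 'replace')])
print(repr(bytes_row))
```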
So, how can I get a cleanly formatted CSV file without encoding issues (like in the first screenshot), with all
non-ASCII symbols replaced or ignored (like in the second screenshot), using my script?
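Two directions that might work, sketched here under assumptions: decoding the encoded bytes back to str so the cell is text again, or opening the file with an explicit encoding so the ö can be written as-is. The second one assumes the failure comes from the platform's default file encoding, which I have not confirmed. The helpers to_ascii and write_rows and the sample rows are hypothetical.

```python
import csv
import os
import tempfile

def to_ascii(text, errors='replace'):
    """Drop to ASCII and decode back, so the result is str rather than bytes."""
    return text.encode('ascii', errors).decode('ascii')

def write_rows(path, rows, encoding='utf-8'):
    """Write rows to a CSV file using an explicit encoding."""
    with open(path, 'w', newline='', encoding=encoding) as csv_file:
        csv.writer(csv_file).writerows(rows)

def read_rows(path, encoding='utf-8'):
    """Read the CSV file back as a list of rows."""
    with open(path, newline='', encoding=encoding) as csv_file:
        return list(csv.reader(csv_file))

sample_rows = [['Name', 'Creator'], ['Geany', 'Enrico Tr\u00f6ger']]

# Option 1: replace (or ignore) every non-ASCII character, keeping str cells
replaced = [[to_ascii(cell) for cell in row] for row in sample_rows]
ignored = [[to_ascii(cell, 'ignore') for cell in row] for row in sample_rows]

# Option 2: keep the original characters by writing the file as UTF-8
csv_path = os.path.join(tempfile.mkdtemp(), 'log.csv')
write_rows(csv_path, sample_rows)
round_tripped = read_rows(csv_path)
```

In my script the first option would mean appending to_ascii(cell.getText()) to csvRow; the second would mean passing encoding='utf-8' to the existing open() call.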