Avoiding UnicodeEncodeError in Python

Question

I tried to parse an HTML table into CSV using Python with the following script:

from bs4 import BeautifulSoup
import requests
import csv


csvFile = open('log.csv', 'w', newline='')
writer = csv.writer(csvFile)
def parse():
    html = requests.get('https://en.wikipedia.org/wiki/Comparison_of_text_editors')
    bs = BeautifulSoup(html.text, 'lxml')
    table = bs.select_one('table.wikitable')
    rows = table.select('tr')
    for row in rows:
        csvRow = []
        for cell in row.findAll(['th', 'td']):
            csvRow.append(cell.getText())
        writer.writerow(csvRow)
        print(csvRow)


parse()
csvFile.close()

This code produced a cleanly formatted CSV file with no encoding issues, until it reached Enrico Tröger's Geany: my script was unable to write ö to the CSV file. So I replaced this:

csvRow.append(cell.getText())

with this:

csvRow.append(cell.text.encode('ascii', 'replace'))

Everything then worked, except that each table cell was wrapped in b''. So how can I get a cleanly formatted CSV file without encoding issues (like in the first screenshot), with all non-ASCII characters replaced or ignored (like in the second screenshot), using my script?
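For anyone wondering where the b'' comes from, here is a small stdlib-only sketch (the sample string is a stand-in for a scraped cell, not real output from the script). In Python 3, str.encode() returns bytes, and csv.writer converts any non-string value with str(), which for a bytes object is its repr:

```python
import csv
import io

# Sample cell text standing in for a scraped table cell.
cell = "Enrico Tröger"

buf = io.StringIO()
writer = csv.writer(buf)

# encode() returns bytes; csv.writer stringifies it with str(),
# which produces the repr, including the b'' wrapper.
writer.writerow([cell.encode("ascii", "replace")])

print(buf.getvalue())  # b'Enrico Tr?ger'
```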

Can you add the full error traceback with the UnicodeDecodeError to the question? – nosklo, Jul 13, 2018 at 15:27

Igor S · Accepted Answer · 2018-07-13 15:35:07Z

6

Change this line:

csvFile = open('log.csv', 'w', newline='')

To this:

csvFile = open('log.csv', 'w', newline='', encoding='utf8')
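A minimal sketch of the fix in isolation, assuming a hard-coded sample row instead of the scraped Wikipedia table (so it runs without requests or BeautifulSoup). It also uses a with block, so the file is closed even if an error occurs part-way through:

```python
import csv
import os
import tempfile

# Sample rows (assumed data, not the real scraped table).
rows = [["Name", "Author"], ["Geany", "Enrico Tröger"]]
path = os.path.join(tempfile.mkdtemp(), "log.csv")

# Explicit encoding, so writing "ö" does not depend on the locale default.
with open(path, "w", newline="", encoding="utf8") as csv_file:
    writer = csv.writer(csv_file)
    for row in rows:
        writer.writerow(row)

# Read the file back with the same encoding to confirm the text round-trips.
with open(path, newline="", encoding="utf8") as csv_file:
    read_back = list(csv.reader(csv_file))

print(read_back)
```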

From the csv module documentation:

Since open() is used to open a CSV file for reading, the file will by default be decoded into unicode using the system default encoding (see locale.getpreferredencoding()). To decode a file using a different encoding, use the encoding argument of open:
import csv
with open('some.csv', newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
The same applies to writing in something other than the system default encoding: specify the encoding argument when opening the output file.
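To reproduce the failure from the question, this sketch forces an output encoding that cannot represent ö. Here "ascii" is only a stand-in (an assumption) for whatever non-UTF-8 locale encoding the asker's system actually uses:

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log.csv")

try:
    # "ascii" stands in for a locale default that cannot encode "ö";
    # the asker's real default encoding is not known.
    with open(path, "w", newline="", encoding="ascii") as csv_file:
        csv.writer(csv_file).writerow(["Geany", "Enrico Tröger"])
    error_name = None
except UnicodeEncodeError as exc:
    error_name = type(exc).__name__

print(error_name)  # UnicodeEncodeError
```

Passing encoding="utf8" instead, as in the answer, makes the same writerow() call succeed.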

I suspect your system default encoding is not UTF-8. You can check it like this:

import locale
locale.getpreferredencoding()
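A slightly more direct check, sketched with a hypothetical helper can_encode(), reports whether the preferred encoding can represent the problematic text at all. The printed result depends on the machine it runs on:

```python
import locale

# Roughly the encoding open() falls back to when encoding= is omitted.
default = locale.getpreferredencoding(False)

def can_encode(text, encoding):
    """Hypothetical helper: True if `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

print(default, can_encode("Tröger", default))
```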

Hope it helps!

edited Jul 13, 2018 at 15:35

answered Jul 13, 2018 at 15:16

Igor S

That worked, but I needed to change csvRow.append(cell.text.encode('ascii', 'replace')) back to csvRow.append(cell.getText()). – Yurii

Andomar · Accepted Answer · 2018-07-13 15:16:13Z

1

It looks like the csv module expects strings, not bytes, so you could decode your bytes back into a string before passing them:

cell.text.encode('ascii', 'replace').decode('ascii')
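A sketch wrapping that idea in a hypothetical helper, to_ascii(), which also covers the "ignored" option from the question: errors='replace' turns each non-ASCII character into ?, while errors='ignore' drops it. Either way the result is a str again, so csv.writer writes it without the b'' wrapper:

```python
import csv
import io

def to_ascii(text, errors="replace"):
    """Hypothetical helper: strip text down to ASCII, returning a str.

    errors="replace" turns each non-ASCII character into "?",
    errors="ignore" drops it entirely.
    """
    return text.encode("ascii", errors).decode("ascii")

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow([to_ascii("Enrico Tröger"), to_ascii("Enrico Tröger", "ignore")])

print(buf.getvalue())  # Enrico Tr?ger,Enrico Trger
```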

answered Jul 13, 2018 at 15:16

Andomar
