Encoding error when writing a file with Python

Question

Here is the full script:

import requests
import bs4


res = requests.get('https://example.com')
soup = bs4.BeautifulSoup(res.text, 'lxml')
page_HTML_code = soup.prettify()

multiline_code = """{}""".format(page_HTML_code)

# No encoding= argument, so open() uses the platform default encoding
f = open("testfile.txt","w+")
f.write(multiline_code)
f.close()

I'm trying to write the entire downloaded HTML to a file while keeping it neatly formatted.

I understand that Python can't save certain characters in the text, but I'm not sure how to encode it correctly.

Can anyone help?

This is the error message I get:

Traceback (most recent call last):
  File "C:\Location", line 16, in <module>
    f.write(multiline_code)
  File "C:\Anaconda3\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0421' in position 209: character maps to <undefined>
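A minimal sketch of why the write fails, assuming Python 3: when open() is called without encoding=, it falls back to the locale's preferred encoding (roughly what locale.getpreferredencoding(False) reports), which on this Windows setup was cp1252 according to the traceback. cp1252 has no mapping for U+0421 (Cyrillic capital letter Es), while UTF-8 can encode it. The default encoding printed below will differ from machine to machine.

```python
import locale

# Roughly the encoding open() uses when no encoding= argument is given;
# platform dependent (the traceback above shows cp1252)
default_encoding = locale.getpreferredencoding(False)
print("default encoding:", default_encoding)

# U+0421 is CYRILLIC CAPITAL LETTER ES, the character named in the error
char = "\u0421"

try:
    char.encode("cp1252")
except UnicodeEncodeError as exc:
    # Same failure that f.write() hit in the question
    print("cp1252 failed:", exc.reason)

# UTF-8 can encode every Unicode character, so this succeeds
utf8_bytes = char.encode("utf-8")
print(utf8_bytes)  # b'\xd0\xa1'
```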

Try open('testfile.txt', 'wb') for writing as a binary file. To read the file you will then need to open it with open('testfile.txt', 'rb').
– Engineero, Commented May 9, 2018 at 16:56
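A sketch of the binary-mode approach, assuming UTF-8 as the chosen encoding: in 'wb' mode write() only accepts bytes (passing a str raises TypeError), so the text has to be encoded explicitly, and the bytes read back in 'rb' mode have to be decoded with the same encoding. The sample string and temporary path are hypothetical stand-ins for the prettified HTML and testfile.txt.

```python
import os
import tempfile

# Hypothetical stand-in for soup.prettify() output
text = "<p>\u0421\u0430\u0439\u0442</p>\n"

# Hypothetical location for testfile.txt
path = os.path.join(tempfile.mkdtemp(), "testfile.txt")

# Binary mode: write() takes bytes, so encode the str explicitly
with open(path, "wb") as f:
    f.write(text.encode("utf-8"))

# Binary read returns bytes; decode with the same encoding used to write
with open(path, "rb") as f:
    round_trip = f.read().decode("utf-8")

print(round_trip == text)  # True
```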
Also, use with open('testfile.txt', 'wb') as a_file: followed by an indented a_file.write(...) instead of using explicit open and close statements. Context managers (the with ... as ...: syntax) are less likely to go wrong, because they close the file even if an exception is raised.
– Engineero, Commented May 9, 2018 at 16:57
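A small sketch of what the context manager buys you, using a hypothetical temporary path: the file is closed as soon as the with block exits, which is roughly equivalent to wrapping the write in try/finally with an explicit close().

```python
import os
import tempfile

# Hypothetical location for testfile.txt
path = os.path.join(tempfile.mkdtemp(), "testfile.txt")

# The with statement closes the file when the block exits,
# whether it finishes normally or raises
with open(path, "w", encoding="utf-8") as a_file:
    a_file.write("\u0421\n")

print(a_file.closed)  # True

# Roughly equivalent explicit form using try/finally
f = open(path, "a", encoding="utf-8")
try:
    f.write("more\n")
finally:
    f.close()

print(f.closed)  # True
```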
You could try encoding with .encode('utf-8'), although I think you might have the same problem. You can also choose to ignore errors with .encode('utf-8', errors='ignore') or one of several other options listed here.
– Engineero, Commented May 9, 2018 at 17:16
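A sketch of the error handlers, with one caveat: a handler only takes effect when the codec cannot represent a character. UTF-8 can represent any ordinary str, so errors='ignore' with UTF-8 leaves the text unchanged; the handlers make a visible difference with a limited codec such as cp1252. The sample text and temporary path are hypothetical. The same errors= values can also be passed to open().

```python
import os
import tempfile

# 'é' exists in cp1252, Cyrillic 'С' (U+0421) does not
text = "Caf\u00e9 \u0421"

for handler in ("ignore", "replace", "xmlcharrefreplace"):
    print(handler, text.encode("cp1252", errors=handler))
# ignore drops the character, replace writes '?',
# xmlcharrefreplace writes the HTML character reference &#1057;

# With UTF-8 nothing needs handling, so the text survives intact
print(text.encode("utf-8", errors="ignore").decode("utf-8") == text)  # True

# open() accepts the same errors= argument (hypothetical path)
path = os.path.join(tempfile.mkdtemp(), "testfile.html")
with open(path, "w", encoding="cp1252", errors="xmlcharrefreplace") as f:
    f.write(text)

with open(path, encoding="cp1252") as f:
    from_file = f.read()
print(from_file)
```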
For instance, I think .encode('utf-8', errors='backslashreplace') may replace the unknown character with the literal string '\u0421', so you wouldn't lose that information, but you may have to do something funky to decode it when you read it back.
– Engineero, Commented May 9, 2018 at 17:18
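A sketch of the backslashreplace round trip, assuming cp1252 as the target codec (with UTF-8 the character would simply be encoded, so no escape is produced). One possible way to reverse it is the unicode_escape codec; it decodes every other byte as Latin-1 and also reinterprets any backslashes that were already in the text, so this is only safe for ASCII-only surrounding text like the hypothetical sample here.

```python
# Hypothetical sample containing the character from the error message
text = "Site: \u0421"

# Characters cp1252 can't encode become literal escape sequences
encoded = text.encode("cp1252", errors="backslashreplace")
print(encoded)  # b'Site: \\u0421'

# The unicode_escape codec turns the escape sequences back into characters.
# Caveat: non-escape bytes are decoded as Latin-1, and any pre-existing
# backslashes in the text would be interpreted too.
restored = encoded.decode("unicode_escape")
print(restored == text)  # True
```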
@Engineero Thanks for your help. :) I just posted an answer to my own question that did the trick.
– TacoCat, Commented May 9, 2018 at 17:18

TacoCat · Accepted Answer · 2018-05-09 17:17:08Z


I did some digging around and this worked:

import requests
import bs4


res = requests.get('https://example.com')
soup = bs4.BeautifulSoup(res.text, 'lxml')
page_HTML_code = soup.prettify()

# Adding encoding='utf-8' when opening the file did the trick:
# UTF-8 can represent every character, unlike the Windows default (cp1252 here)
with open('testfile.html', 'w', encoding='utf-8') as fb:
    fb.write(page_HTML_code)
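A minimal sketch of the matching read, using a hypothetical string in place of the downloaded page (so no network request is made) and a temporary path: the file should be read back with the same encoding='utf-8', because relying on the platform default (e.g. cp1252) can raise UnicodeDecodeError or silently produce mojibake.

```python
import os
import tempfile

# Hypothetical stand-in for soup.prettify(); no network request is made here
page_HTML_code = "<html>\n <body>\n  \u0421\n </body>\n</html>\n"

# Hypothetical location for testfile.html
path = os.path.join(tempfile.mkdtemp(), "testfile.html")

with open(path, "w", encoding="utf-8") as fb:
    fb.write(page_HTML_code)

# Read it back with the same encoding that was used to write it
with open(path, encoding="utf-8") as fb:
    restored = fb.read()

print(restored == page_HTML_code)  # True
```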
