Encoding problem while web scraping with Python

Question

Do you know why I am getting "ID ÐÐ¾ÑÐ ÐµÐµÑÑÑÐ°" instead of "ID ГосРеестра"? I know it is some kind of encoding issue, because the text is Cyrillic, but I have no idea how to solve it.

The page being scraped is the one requested in the code below.

My code is:

import requests
from lxml import html

dfo_url = "https://opi.dfo.kz/p/ru/DfoObjects/objects/teaser-view/26730?OptionName=ExtraData"
r = requests.get(dfo_url)

tree = html.fromstring(r.content)
tr_elements = tree.xpath('//tr')
# Create empty list
col = []
i = 0
# For each cell of the third row, store its text (header) and an empty list
for t in tr_elements[2]:
    i += 1
    name = t.text_content()

    print('%d:"%s"' % (i, name))
    col.append((name, []))

peter123 · Accepted Answer · 2020-04-30 07:46:59Z

2

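This may fix it. Try doing this right before the print (note that str.encode returns a new bytes object, so use its result rather than expecting name to change in place):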

name.encode(encoding='UTF-8',errors='strict')

Or try this link.
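
For illustration, the garbled text in the question has the shape you get when UTF-8 bytes are decoded as Latin-1 (each Cyrillic letter becomes a pair starting with Ð or Ñ). A minimal sketch of reversing that, assuming this is what happened on the page; fix_mojibake is a hypothetical helper name, not part of requests or lxml:

```python
# Illustrative only: reproduce the garbling with a known string, then reverse it.
expected = "ID ГосРеестра"

# UTF-8 bytes misread as Latin-1 produce the "Ð..." mojibake seen in the question.
garbled = expected.encode("utf-8").decode("latin-1")


def fix_mojibake(text):
    """Hypothetical helper: undo a UTF-8-read-as-Latin-1 misdecoding."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not garbled this way, so return it unchanged.
        return text


repaired = fix_mojibake(garbled)
print(repaired)
```

In the question's loop this would mean calling fix_mojibake(name) before printing. It may be cleaner to avoid the wrong decoding in the first place by handing the parser already-decoded text, for example html.fromstring(r.content.decode("utf-8")), assuming the page really is served as UTF-8 (not verified here).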

edited Apr 30, 2020 at 7:46

answered Apr 30, 2020 at 7:26


2 Comments

peter123 Over a year ago

@Dias take a look at this

peter123 Over a year ago

No problem. If you want, you can accept my answer; I will update it with the link :) @Dias
