Getting text with accented characters using Python and Selenium

Question

I made a scraping script with Python and Selenium. It scrapes data from a Spanish-language website:

for i, line in enumerate(browser.find_elements_by_xpath(xpath)):
    tds = line.find_elements_by_tag_name('td')  # takes <td> tags from line
    print tds[0].text  # FIRST PRINT
    if len(tds)%2 == 0:  # takes data from lines with even quantity of cells only
        data.append([u"".join(tds[0].text), u"".join(tds[1].text), ])
    print data  # SECOND PRINT

The first print statement gives me a normal Spanish string. But the second print gives me a string like this: "Data de Distribui\u00e7\u00e3o". What's the reason for this?
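As a side check, here is a sketch in Python 3 (not the asker's Python 2 environment) showing that the escaped form and the readable text hold the same characters and differ only in how they are displayed. The exact `\u00e7` style matches what `json.dumps` produces with its default `ensure_ascii=True`. The question does not say which call produced its output, so treat that link as an assumption.

```python
import json

# Assumption: the escaped output came from an ASCII-only form such as json.dumps
text = "Data de Distribuição"   # the readable string, as the first print showed it

# json.dumps escapes every non-ASCII character as \uXXXX by default (ensure_ascii=True)
escaped = json.dumps(text)
print(escaped)                  # "Data de Distribui\u00e7\u00e3o"

# Reading the escaped form back gives the identical string: only the display differs
print(json.loads(escaped) == text)  # True

# ensure_ascii=False keeps the accented characters as they are
readable = json.dumps(text, ensure_ascii=False)   # '"Data de Distribuição"'

# print() on a list shows repr() of each element; Python 3's repr() keeps printable
# accented characters, while ascii() escapes them much like Python 2's repr() did
list_form = repr([text])        # "['Data de Distribuição']"
ascii_form = ascii(text)        # "'Data de Distribui\\xe7\\xe3o'"
print(ascii_form)               # 'Data de Distribui\xe7\xe3o'
```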

Could you show the original string and the data in tds, please? – tglaria, Commented Dec 2, 2015 at 13:26

Josi · Accepted Answer · 2015-12-02 11:25:38Z

3

You are mixing encodings:

u''  # unicode string
b''  # byte string (str in Python 2)

The text property of tds[0] is a byte string, which carries no encoding information of its own. In the second print you are working with unicode strings, so the two encodings get mixed.
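The unicode/byte distinction the answer draws can be seen directly in a short sketch. It is written for Python 3, where `u''` literals are plain `str` and `b''` literals are `bytes`; in the question's Python 2 the same split is `unicode` versus `str`. Using UTF-8 for the bytes is an illustrative assumption.

```python
text = "Distribuição"           # unicode string: u'' in Python 2, plain "" in Python 3
raw = text.encode("utf-8")      # byte string: b'', the same text as UTF-8 bytes

print(type(text).__name__, len(text))  # str 12
print(type(raw).__name__, len(raw))    # bytes 14  (ç and ã take two bytes each)

# Python 3 refuses to combine the two types implicitly
try:
    combined = text + raw
except TypeError as exc:
    print("TypeError:", exc)

# Decoding with the encoding the bytes were written in gives back the same text
decoded = raw.decode("utf-8")
print(decoded == text)  # True
```

Python 2 instead tries to decode the byte string implicitly with the ASCII codec, which raises UnicodeDecodeError as soon as the bytes contain non-ASCII characters.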

Akash zawar · Accepted Answer · 2021-07-27 10:15:11Z

0

Before using accented characters, you have to encode or decode them:

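# -*- coding: utf-8 -*-
# Python 2: a plain "..." literal is a byte string, so decode it to unicode first
# (in Python 3, str has no .decode method)
accent_char = "ôâ"
name = accent_char.decode('utf-8')
print(name)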

The code above decodes the UTF-8 bytes into a unicode string.
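The snippet above only runs on Python 2, where a plain literal is a byte string. In Python 3 string literals are already Unicode, and decoding applies to `bytes` values instead, for example data read from a file opened in binary mode (the source of the bytes here is an assumption). A short sketch, which also shows what a wrong codec does to the same bytes:

```python
# Python 3: str literals are already Unicode, so decode() applies to bytes values.
# Assumption: the bytes below stand in for UTF-8 data read from a file or network.
raw = b"\xc3\xb4\xc3\xa2"        # "ôâ" encoded as UTF-8, two bytes per character

name = raw.decode("utf-8")       # right codec: 2 characters
wrong = raw.decode("latin-1")    # wrong codec: every byte becomes its own character

print(len(name), ascii(name))    # 2 '\xf4\xe2'
print(len(wrong), ascii(wrong))  # 4 '\xc3\xb4\xc3\xa2'
print(name == "ôâ")              # True
```

The garbled `wrong` value is the kind of mojibake that appears when UTF-8 text is decoded as Latin-1.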
