How to handle unicode language in python

Question

I am writing a Python + Selenium script to scrape the LinkedIn site.
I read the profile summary using this statement, which works properly:

profileDescription = profile.find_element_by_xpath("div/div[1]").text

My problem is with the non-English data coming from the site.
I am writing the data scraped from the site to a CSV file for Excel using this code:

with open('search.csv', 'ab') as csvfile:
    self.liSearchOutWriter = csv.writer(csvfile, delimiter=',')
    self.liSearchOutWriter.writerow([profileDescription])

Whenever the description contains non-English data, it does not display properly in Excel. I read through Unicode and UTF-8 resources, but could not get a grip on them.

Can someone help me understand how I should modify my code in order to display non-English data properly?

Which version of Python are you using? (And, uh, if you're using Python 2, can you switch to using Python 3?)
– NightShadeQueen, Commented Jul 10, 2015 at 16:57

amza · Accepted Answer · 2015-07-10 17:04:15Z

In Python 3.X this is supported out of the box:

import csv
with open('search.csv', newline='', encoding='utf-8') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print(row)
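
The question is about writing rows rather than reading them, so here is a minimal sketch of the write side in Python 3. The helper name `append_row` is hypothetical, and the choice of `utf-8-sig` is an assumption: it adds a byte order mark at the start of a new file, which Excel generally relies on to recognise a CSV as UTF-8.

```python
import csv

def append_row(path, values):
    """Append one row of text values to a CSV file (hypothetical helper)."""
    # In Python 3 the csv module expects a text-mode file opened with
    # newline=''. The 'utf-8-sig' encoding is an assumption made here so
    # that a newly created file starts with a byte order mark.
    with open(path, 'a', newline='', encoding='utf-8-sig') as csvfile:
        writer = csv.writer(csvfile, delimiter=',')
        writer.writerow(values)
```

With this in place, the call from the question would become something like `append_row('search.csv', [profileDescription])`, with no manual encoding of the string.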

If you're on Python 2.X, there is a drop-in replacement for the csv module that supports Unicode: unicodecsv

import unicodecsv
# Python 2: unicodecsv works on a file opened in binary mode
with open('search.csv', 'rb') as csvfile:
    reader = unicodecsv.reader(csvfile, encoding='utf-8')
    for row in reader:
        print(row)


1 Comment

cppcoder Over a year ago

I am getting this error after using unicodecsv: UnicodeDecodeError: 'utf8' codec can't decode byte 0xd6 in position 0: invalid continuation byte
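
An error like this means the bytes being decoded are not valid UTF-8: 0xd6 starts a two-byte UTF-8 sequence, and the byte after it is not a continuation byte. One plausible cause, though the thread does not confirm it, is that the data was stored in a legacy single-byte encoding. A rough sketch for checking that, where the helper name `decode_with_fallback` and the list of candidate encodings are assumptions:

```python
def decode_with_fallback(raw, encodings=('utf-8', 'cp1252', 'latin-1')):
    """Return (text, encoding) for the first candidate that decodes raw bytes."""
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            # This candidate does not fit the bytes; try the next one.
            continue
    raise ValueError('none of the candidate encodings could decode the data')
```

A successful decode only shows that the bytes are valid in that encoding, not that the encoding is the right one, so the resulting text still needs to be checked by eye.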
