
I am writing a Python + Selenium script to scrape the LinkedIn site.
I read the profile summary using this statement, which works properly:

profileDescription = profile.find_element_by_xpath("div/div[1]").text  

My problem is with non-English data coming from the site.
I write the data scraped from the site to a CSV file, which I then open in Excel, using this code:

with open('search.csv', 'ab') as csvfile:
    self.liSearchOutWriter = csv.writer(csvfile, delimiter=',')
    self.liSearchOutWriter.writerow([profileDescription]) 

Whenever the description contains non-English text, it does not display properly in Excel. I have read through Unicode and UTF-8 resources, but could not get a grip on it.

Can someone help me understand how I should modify my code to display non-English data properly?
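To illustrate the symptom, here is a minimal Python 3 sketch that writes text as UTF-8 bytes and then reads those bytes back with a different encoding. It assumes Excel opens the CSV using the Windows-1252 code page, a common default on Western-locale Windows installations; the helper `simulate_misread` and the sample text are hypothetical.

```python
def simulate_misread(text, reader_encoding="cp1252"):
    """Encode text as UTF-8, then decode the bytes with reader_encoding.

    If reader_encoding is not UTF-8, non-ASCII characters come back garbled
    (mojibake), which is what a program assuming the wrong encoding shows.
    """
    utf8_bytes = text.encode("utf-8")
    return utf8_bytes.decode(reader_encoding)


print(simulate_misread("Müller"))           # wrong reader encoding: garbled
print(simulate_misread("Müller", "utf-8"))  # matching encoding: round-trips
```

The two bytes UTF-8 uses for "ü" are shown as two separate Windows-1252 characters, so the data itself is intact; only the reader's assumption about the encoding is wrong.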

  • Which version of Python are you using? (And, uh, if you're using Python 2, can you switch to using Python 3?) Commented Jul 10, 2015 at 16:57
  • I am using Python 2.7 and I cannot use Python 3 Commented Jul 10, 2015 at 17:05
  • Consider opening the file with codecs.open. Commented Jul 10, 2015 at 17:08

1 Answer


In Python 3.x this is supported out of the box:

import csv
with open('search.csv', newline='', encoding='utf-8') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print(row)
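The snippet above reads the file, while the question is about writing it so that Excel displays it correctly. A minimal Python 3 sketch for the writing side, assuming Excel on Windows, which typically recognises a CSV as UTF-8 only when the file starts with a byte order mark (BOM); the `utf-8-sig` codec writes that mark. The helper name `append_rows` is hypothetical.

```python
import csv


def append_rows(path, rows):
    """Append rows to a CSV file that Excel can recognise as UTF-8.

    encoding="utf-8-sig" puts a UTF-8 byte order mark at the start of the
    file; newline="" lets the csv module write its own line endings.
    """
    with open(path, "a", newline="", encoding="utf-8-sig") as csvfile:
        writer = csv.writer(csvfile, delimiter=",")
        writer.writerows(rows)
```

With this helper, the question's write becomes `append_rows('search.csv', [[profileDescription]])`. Because the file is opened in append mode, Python only emits the BOM when the file is still empty, so repeated appends do not scatter BOMs through the file.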

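If you're on Python 2.x, there is a drop-in replacement for the csv module that supports Unicode: unicodecsv. Note that Python 2's built-in open() does not accept the newline or encoding arguments; unicodecsv works on a file opened in binary mode and does the encoding itself:

import unicodecsv
# unicodecsv expects a byte stream, so open the file in binary mode
with open('search.csv', 'rb') as csvfile:
    reader = unicodecsv.reader(csvfile, encoding='utf-8')
    for row in reader:
        print(row)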

1 Comment

I am getting this error after using unicodecsv: UnicodeDecodeError: 'utf8' codec can't decode byte 0xd6 in position 0: invalid continuation byte
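The byte in this error is a clue: 0xd6 is "Ö" in Windows-1252 (and Latin-1), whereas in UTF-8 it must be followed by a continuation byte. That suggests the file being read was not saved as UTF-8, for example because it was written earlier in a legacy encoding or re-saved by Excel. A minimal Python 3 sketch of decoding with a fallback, assuming Windows-1252 as the fallback encoding (an assumption about where the file came from); `decode_bytes` is a hypothetical helper:

```python
def decode_bytes(data, encodings=("utf-8", "cp1252")):
    """Try each candidate encoding in order; return (text, encoding used).

    The fallback list is an assumption: cp1252 (Windows-1252) is a common
    legacy encoding for files produced on Western-locale Windows machines.
    """
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next one
    raise ValueError("none of the candidate encodings could decode the data")


# b"\xd6sterreich" fails as UTF-8 ("invalid continuation byte" at position 0)
# but decodes cleanly as Windows-1252.
text, used = decode_bytes(b"\xd6sterreich")
print(text, used)
```

Once the source encoding is known, it is usually better to re-save the file consistently as UTF-8 than to keep guessing at read time.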
