I'm able to export a MySQL table to a CSV file with Python's csv module, but the UTF-8 characters don't come through (for example, I get ???? instead of ąöę).

The table data is stored in UTF-8 (phpMyAdmin displays it correctly).
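The question does not confirm this, but the literal ? characters have a likely cause. When text is converted to a character set that cannot represent some of its characters, each such character is replaced by '?'. MySQL typically does this when it converts results to the connection character set. Python's 'replace' error handler shows the same effect. This is a Python 3 sketch using the sample characters from the question:

```python
# Sample text from the question
text = "ąöę"

as_latin1 = text.encode("latin-1", errors="replace")  # ą and ę are not in Latin-1
as_ascii = text.encode("ascii", errors="replace")     # none of the three is ASCII
as_utf8 = text.encode("utf-8")                        # UTF-8 can represent all of them

print(as_latin1)  # b'?\xf6?'
print(as_ascii)   # b'???'
print(as_utf8)    # b'\xc4\x85\xc3\xb6\xc4\x99'
```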

I found some information saying that in Python all data should be decoded from UTF-8 and then encoded into the CSV as UTF-8, for example with UnicodeWriter (because the native Python 2 csv module doesn't support Unicode correctly).

I have tried a lot of things, without success.

Question: is there an example script that exports a MySQL database in UTF-8 to a CSV file in UTF-8 with Python?

I use Ubuntu 14.04, where mysql.connector gives me problems, so I use MySQLdb with Gord Thompson's code:

# -*- coding: utf-8 -*-
import csv
import MySQLdb
from UnicodeSupportForCsv import UnicodeWriter
import sys
reload(sys)  
sys.setdefaultencoding('utf8')
#sys.setdefaultencoding('Cp1252')

conn = MySQLdb.Connection(db='sampledb', host='localhost',
                          user='sampleuser', passwd='samplepass')

crsr = conn.cursor()
crsr.execute("SELECT * FROM rfid")
with open(r'test.csv', 'wb') as csvfile:
    uw = UnicodeWriter(
        csvfile, delimiter=',',
        quotechar='"', quoting=csv.QUOTE_MINIMAL)
    # Write inside the with block, while the file is still open
    for row in crsr.fetchall():
        uw.writerow([unicode(col) for col in row])

The error still occurs: UnicodeDecodeError: 'utf8' codec can't decode byte 0xf3 in position 2: invalid continuation byte
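Here is one plausible reading of this error. It is an assumption, since the question does not show the connection's character set. Byte 0xf3 is 'ó' in both Latin-1 and cp1250. In UTF-8, however, 0xf3 can only start a four-byte sequence, so an ordinary letter after it is an "invalid continuation byte". This Python 3 sketch reproduces the message with a hypothetical column value, "Król":

```python
# Hypothetical value: "Król" as Latin-1 (or cp1250) bytes vs. UTF-8 bytes
text = "Król"
latin1_bytes = text.encode("latin-1")  # 'ó' becomes the single byte 0xf3
utf8_bytes = text.encode("utf-8")      # 'ó' becomes the two bytes 0xc3 0xb3


def try_decode(data, encoding):
    """Return the decoded string, or the error message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "error: " + str(exc)


print(latin1_bytes)                         # b'Kr\xf3l'
print(try_decode(latin1_bytes, "utf-8"))    # error: 'utf-8' codec can't decode byte 0xf3 in position 2: invalid continuation byte
print(try_decode(latin1_bytes, "latin-1"))  # Król
print(try_decode(utf8_bytes, "utf-8"))      # Król
```

If this reading is right, the bytes coming back from MySQL are not UTF-8. The fix then belongs on the connection side, not in the CSV writer.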

  • Have you tried using the UnicodeWriter class shown at the very bottom of the documentation page for the csv module? I've used it with Python 2.7 and it worked fine for me. Commented Jan 4, 2016 at 21:49
  • Thanks for the quick reply, Gord Thompson. I tried UnicodeWriter, but with no success with MySQL. It seems it needs some function to decode the UTF-8 SQL query result before writing it into the CSV. Could you tell me if you are using this class with a MySQL UTF-8 database? Commented Jan 4, 2016 at 21:56

4 Answers

MySQL is good at converting between character sets, but you need to tell it which character set to use for the connection.

Without that, the connection uses a default character set (often latin1), and the server converts the data to that character set before returning it. Add the required charset to the connection:

conn = MySQLdb.Connection(db='sampledb', host='localhost',
                          user='sampleuser', passwd='samplepass', charset='utf8')

Is this helpful?

This works for me with Python 2.7.5 and MySQL Connector/Python 2.0.4:

# -*- coding: utf-8 -*-
import csv
import mysql.connector
from UnicodeSupportForCsv import UnicodeWriter

conn = mysql.connector.connect(
    host='localhost', port=3307,
    user='root', password='whatever',
    database='mydb')
crsr = conn.cursor()
crsr.execute("SELECT * FROM vocabulary")
with open(r'C:\Users\Gord\Desktop\test.csv', 'wb') as csvfile:
    uw = UnicodeWriter(
        csvfile, delimiter=',',
        quotechar='"', quoting=csv.QUOTE_MINIMAL)
    for row in crsr.fetchall():
        uw.writerow([unicode(col) for col in row])

The UnicodeWriter class is taken directly from the last example on the documentation page for the csv module, which I stored in a file named "UnicodeSupportForCsv.py":

import csv, codecs, cStringIO

class UTF8Recoder:
    """
    Iterator that reads an encoded stream and reencodes the input to UTF-8
    """
    def __init__(self, f, encoding):
        self.reader = codecs.getreader(encoding)(f)

    def __iter__(self):
        return self

    def next(self):
        return self.reader.next().encode("utf-8")

class UnicodeReader:
    """
    A CSV reader which will iterate over lines in the CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        f = UTF8Recoder(f, encoding)
        self.reader = csv.reader(f, dialect=dialect, **kwds)

    def next(self):
        row = self.reader.next()
        return [unicode(s, "utf-8") for s in row]

    def __iter__(self):
        return self

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)
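The code in this thread is Python 2. For readers on Python 3 (an assumption about the reader's setup), the UnicodeWriter wrapper is unnecessary. csv.writer accepts str directly, and the file object does the encoding. Below is a minimal sketch with a hypothetical write_rows_csv helper and made-up sample rows standing in for crsr.fetchall():

```python
import csv
import os
import tempfile


def write_rows_csv(path, rows, encoding="utf-8"):
    """Hypothetical helper: write rows to a CSV file in `encoding` (Python 3).

    The text-mode file object encodes the output, so no UnicodeWriter is needed.
    newline="" lets the csv writer control line endings, as the csv docs advise.
    """
    with open(path, "w", encoding=encoding, newline="") as f:
        writer = csv.writer(f, delimiter=",", quotechar='"',
                            quoting=csv.QUOTE_MINIMAL)
        for row in rows:
            # Mirrors unicode(col) in the Python 2 code: every column becomes text
            writer.writerow([str(col) for col in row])


# Made-up rows standing in for crsr.fetchall()
rows = [(1, "ąöę"), (2, "Król, Józef")]
path = os.path.join(tempfile.mkdtemp(), "test.csv")
write_rows_csv(path, rows)

with open(path, "rb") as f:
    raw = f.read()
print(raw.decode("utf-8"))
```

Passing a different codec name, such as encoding="cp1250", plays the same role as the encoding argument of UnicodeWriter.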

1 Comment

I use Ubuntu 14.04, and mysql.connector gives me problems there, so I use MySQLdb with your code:

Finally it works! Thanks to Gord Thompson and Prikkeldraad. Thanks, guys!

# -*- coding: utf-8 -*-
import csv
import MySQLdb
from UnicodeSupportForCsv import UnicodeWriter
import sys
reload(sys)  
sys.setdefaultencoding('utf8')
#sys.setdefaultencoding('Cp1252')

conn = MySQLdb.Connection(db='testdb', host='localhost', user='testuser',
                          passwd='testpasswd', use_unicode=0, charset='utf8')

crsr = conn.cursor()
crsr.execute("SELECT * FROM rfid")

with open(r'test.csv', 'wb') as csvfile:
    uw = UnicodeWriter(
        csvfile, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)

    for row in crsr.fetchall():
        uw.writerow([unicode(col) for col in row])
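Here is a hedged reading of why this version works; the OP did not state it. charset='utf8' asks the server to send text as UTF-8 instead of the connection default. The MySQLdb documentation says that charset implies use_unicode=True, and use_unicode=0 overrides that. Text columns therefore arrive as UTF-8 byte strings. unicode(col) then decodes them with the default encoding that sys.setdefaultencoding('utf8') changed. Without charset, the bytes were presumably Latin-1, hence the 0xf3 error.

The same per-column conversion can be done explicitly instead of by changing the global default encoding. Below is a Python 3 sketch with a hypothetical row_to_text helper:

```python
def row_to_text(row, encoding="utf-8"):
    """Hypothetical helper: convert one fetched row to a list of str.

    bytes columns (what a driver returns when it does not decode text itself)
    are decoded explicitly with `encoding`; other values go through str().
    None becomes an empty string. That is a choice made here: unicode(None)
    in the Python 2 code would have written the text 'None'.
    """
    text_cols = []
    for col in row:
        if col is None:
            text_cols.append("")
        elif isinstance(col, (bytes, bytearray)):
            text_cols.append(bytes(col).decode(encoding))
        else:
            text_cols.append(str(col))
    return text_cols


# Hypothetical row: an int, undecoded UTF-8 text, a NULL, and a float
row = (7, "ąöę".encode("utf-8"), None, 3.5)
print(row_to_text(row))  # ['7', 'ąöę', '', '3.5']
```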

2 Comments

A code block alone does not make a good answer. Please add an explanation (why it solves the issue, where the mistake was, etc.).
@LouisBarranqueiro: Nah, this is a Thanks comment posted by OP. Please check Gord's answer.

Try this one; it makes things easy for you:

https://github.com/jdunck/python-unicodecsv

unicodecsv is a drop-in replacement for Python 2.7's csv module that supports Unicode strings without hassle. Supported versions are Python 2.6, 2.7, 3.3, 3.4, 3.5, and PyPy 2.4.0.

>>> import unicodecsv as csv
>>> from io import BytesIO
>>> f = BytesIO()
>>> w = csv.writer(f, encoding='utf-8')
>>> _ = w.writerow((u'é', u'ñ'))
>>> _ = f.seek(0)
>>> r = csv.reader(f, encoding='utf-8')
>>> next(r) == [u'é', u'ñ']
True
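For comparison, here is a sketch of the same round trip with only the Python 3 standard library; unicodecsv itself is not used. Wrapping io.BytesIO in io.TextIOWrapper takes over the job of unicodecsv's encoding= argument:

```python
import csv
import io

buf = io.BytesIO()
# The TextIOWrapper encodes the str written by csv.writer into UTF-8 bytes
text_out = io.TextIOWrapper(buf, encoding="utf-8", newline="")
csv.writer(text_out).writerow(("é", "ñ"))
text_out.flush()
raw = buf.getvalue()  # b'\xc3\xa9,\xc3\xb1\r\n'
text_out.detach()     # the wrapper no longer owns buf, so discarding it won't close buf

# Read the bytes back, decoding them as UTF-8
text_in = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline="")
row = next(csv.reader(text_in))
print(row == ["é", "ñ"])  # True
```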

6 Comments

While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - From Review
@Kevin, are you happy now?
Hey, thanks for the edit. It definitely improved your answer! Now people can actually start coding right away when reading your post rather than having to go to another website. Thanks!
@MuraliMopuru: Probably, yes. And I didn't downvote. Now this is a real answer. Thanks for the edit.
@MuraliMopuru: However, your edit only added an example of the module's basic usage. What about adding working code based on the OP's question and program? I think that would be more helpful.