python csv reader + special characters

Question

I am writing a script to read a CSV file and write the data into a graph using pygraphml.

The issue is that the first column of the file contains data like the following, and I am not able to read it.

Master Muppet ™ joèl b Kýrie, eléison

This is my Python script:

import csv
import sys
from pygraphml import Graph
from pygraphml import GraphMLParser

#reload(sys)
#sys.setdefaultencoding("utf8")

data = []  # network data to write
g = Graph() # graph for networks

# Open the file and retrieve the target rows
with open(r"C:\Users\csvlabuser\Downloads\test.csv","r") as fp:
    reader = csv.reader(fp)
    unread_count = 2
    completed_list = []

    try:
        for rows in reader:
            if "tweeter_id" == rows[2]:  # skip and check the header
                print("tweeter_id column found")
                continue
            #if rows[2] not in completed_list:                    
            n = g.add_node(rows[2].encode("utf8"))
            completed_list.append(rows[2])
            n['username'] = rows[0].encode("utf8")
            n['userid'] = rows[1]
            if rows[3] != "NULL":   # edges exist only when there is retweets id
                g.add_edge_by_label(rows[2], rows[3])


            print unread_count
            unread_count +=1

    except:
        pass

fp.close()
print unread_count

g.show()
# Write the graph into graphml file format
parser = GraphMLParser()
parser.write(g, "myGraph.graphml")

Kindly let me know where the issue is.

Thanks in advance.

The data is like this: Master Muppet ™ joèl b Kýrie, eléison
– Sagar Jha, Commented Sep 25, 2015 at 0:14
I doubt he's getting any errors, thanks to the bare except with a pass for a body... It would help to know whether this is Python 2 or Python 3, though. Python 3's native support for Unicode is much better and more seamless; in Python 2, you're going to have a harder time. In addition, we (and Python) need to know the encoding of the file being read: if the file is UTF-8 and you read it as latin-1 or UTF-16, or vice versa, you won't interpret the file correctly.
– ShadowRanger, Commented Sep 25, 2015 at 1:04
The problem is that it stops reading when a row with such data appears: it stops appending rows and goes on to execute the rest of the code.
– Sagar Jha, Commented Sep 25, 2015 at 1:48

ShadowRanger · Accepted Answer · 2015-09-25 02:09:13Z

The Python 2 csv module cannot handle Unicode input or input containing NUL bytes (see the note at the top of the module page). Since you're using print as a statement rather than a function, I'm guessing you're using Python 2. To use csv with Unicode in Python 2, you must convert to UTF-8 encoding.
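A likely reason the loop stops at the first such row, assuming Python 2 and a UTF-8 encoded file: csv.reader returns byte strings there, and calling .encode("utf8") on a byte string makes Python 2 decode it with the ASCII codec first, which fails on any non-ASCII byte. The bare except then swallows that error. The sketch below only mimics that implicit step with an explicit .decode("ascii") so it behaves the same on Python 2 and 3; explain_encode_failure is a hypothetical helper name, not part of csv or pygraphml.

```python
def explain_encode_failure(raw_cell):
    """Mimic what Python 2 does for raw_cell.encode("utf8") on a byte string."""
    try:
        # Python 2 first decodes the byte string with the ASCII codec ...
        text = raw_cell.decode("ascii")
    except UnicodeDecodeError as exc:
        # ... and this is the kind of error a bare `except: pass` hides.
        return "failed: %s" % exc.reason
    return "ok: %s" % text


# Bytes as Python 2's csv.reader would return them for a UTF-8 file
# (the UTF-8 encoding of the file is an assumption).
raw = u"Master Muppet \u2122 jo\xe8l".encode("utf-8")

print(explain_encode_failure(b"plain ascii"))  # pure ASCII cell: no error
print(explain_encode_failure(raw))             # non-ASCII cell: decode fails
print(repr(raw.decode("utf-8")))               # decoding with the real encoding works
```

If this is what is happening, the fix for such cells is to decode the bytes with the file's actual encoding rather than to encode them again, and to replace the bare except with one that at least prints the exception.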

The csv module's Examples section contains definitions for wrappers (UTF8Recoder, UnicodeReader, UnicodeWriter) that allow you to parse inputs in arbitrary encodings, seamlessly fixing up encodings so csv can process the inputs, then decoding back to Python unicode objects (that represent the text as "pure" Unicode text, not a specific byte encoding).
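The recode-then-decode idea behind those wrappers can be sketched roughly as follows. This is not the recipe from the documentation, just a minimal illustration of the same approach; read_unicode_rows is a hypothetical helper name, and the default of UTF-8 is an assumption about the file. On Python 3 the csv module reads text directly, so the helper simply opens the file with the right encoding.

```python
import csv
import io
import sys


def read_unicode_rows(path, encoding="utf-8"):
    """Yield each CSV row as a list of text (unicode) strings."""
    # io.open decodes the file with the given encoding on both Python 2 and 3;
    # newline="" leaves line endings for the csv module to handle.
    with io.open(path, "r", encoding=encoding, newline="") as fp:
        if sys.version_info[0] >= 3:
            # Python 3: csv.reader works on text directly.
            for row in csv.reader(fp):
                yield row
        else:
            # Python 2: csv.reader needs byte strings, so recode every line
            # to UTF-8, parse, then decode each cell back to unicode.
            utf8_lines = (line.encode("utf-8") for line in fp)
            for row in csv.reader(utf8_lines):
                yield [cell.decode("utf-8") for cell in row]
```

With something like this, the loop in the question could iterate over read_unicode_rows(path) and use rows[0] and rows[2] as they are, without the extra .encode("utf8") calls. Whether pygraphml then writes non-ASCII labels correctly is a separate question that this sketch does not cover.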

