I am writing a script to read a CSV file and write the data to a graph using pygraphml.

The issue is that the first column of the file contains data like this, and I am not able to read it.

Master Muppet ™ joèl b Kýrie, eléison

This is my python script

import csv
import sys
from pygraphml import Graph
from pygraphml import GraphMLParser

#reload(sys)
#sys.setdefaultencoding("utf8")

data = []  # network data to write
g = Graph() # graph for networks

#Open file and retrieve the target rows
with open(r"C:\Users\csvlabuser\Downloads\test.csv","r") as fp:
    reader = csv.reader(fp)
    unread_count = 2
    completed_list = []

    try:
        for rows in reader:
            if "tweeter_id" == rows[2]:  # skip and check the header
                print("tweeter_id column found")
                continue
            #if rows[2] not in completed_list:                    
            n = g.add_node(rows[2].encode("utf8"))
            completed_list.append(rows[2])
            n['username'] = rows[0].encode("utf8")
            n['userid'] = rows[1]
            if rows[3] != "NULL":   # edges exist only when there is retweets id
                g.add_edge_by_label(rows[2], rows[3])


            print unread_count
            unread_count +=1

    except:
        pass

fp.close()
print unread_count

g.show()
# Write the graph into graphml file format
parser = GraphMLParser()
parser.write(g, "myGraph.graphml")

Kindly let me know where the issue is.

Thanks in advance.

  • The data is like this Master Muppet ™ joèl b Kýrie, eléison Commented Sep 25, 2015 at 0:14
  • What's the error you're getting? Commented Sep 25, 2015 at 0:31
  • I doubt he's getting any errors, thanks to the bare except with a pass for a body... It would help to know if this is Python 2 or Python 3 though; Python 3's native support for Unicode is much better and more seamless, in Python 2, you're going to have a harder time. In addition, we (and Python) need to know the encoding of the file being read; if the file is utf-8, and you read as latin-1, or utf-16, or vice-versa, you won't interpret the file correctly. Commented Sep 25, 2015 at 1:04
  • The error is that it stops reading when a specific row with such data appears; it stops appending and executes the rest of the code. Commented Sep 25, 2015 at 1:48

1 Answer

The Python 2 csv module cannot handle unicode input or input containing NUL bytes (see the note at the top of the module page). Since you're using print as a keyword rather than a function, I'm guessing you're using Python 2. To use csv with Unicode in Python 2, you must convert to UTF-8 encoding.

The csv module's Examples section contains definitions for wrappers (UTF8Recoder, UnicodeReader, UnicodeWriter) that allow you to parse inputs in arbitrary encodings, seamlessly fixing up encodings so csv can process the inputs, then decoding back to Python unicode objects (that represent the text as "pure" Unicode text, not a specific byte encoding).
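
For comparison, here is a minimal sketch of the same read under Python 3, where the csv module works on text directly and no recoding wrapper is needed. It assumes the file is UTF-8 encoded and that switching interpreters is an option, neither of which the question confirms. The helper name read_rows is hypothetical.

```python
import csv
import io


def read_rows(path, encoding="utf-8"):
    """Return every row of a CSV file as a list of str cells (Python 3).

    The default encoding is an assumption; pass the file's real encoding
    if it differs.
    """
    # newline="" leaves line-ending handling to the csv module,
    # as its documentation recommends.
    with io.open(path, "r", encoding=encoding, newline="") as fp:
        return list(csv.reader(fp))
```

Because there is no bare except here, a wrong encoding guess raises UnicodeDecodeError at the offending row instead of silently ending the loop, which makes the failing input much easier to find.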
