7

Is there a way to parse CSV data in Python when the data is not in a file? I'm storing CSV data in my database and I'd like to parse it. I'm looking for something analogous to Ruby's CSV.parse. I know Python has a csv module, but everything I've seen in the docs seems to deal with files as opposed to in-memory CSV data.

(And it's not an option to parse the data before it goes into the database.)

(And please don't tell me not to store the CSV data in the database. I know what I'm doing as far as the database goes.)

3
  • "I'm storing CSV data in my database and I'd like to parse it." This is ambiguous. Are you storing an entire CSV file as a blob or string in the database? Or do you mean that you're storing all the pieces of information in a table in the database, where each column would correspond to a CSV field? Commented Jan 31, 2011 at 20:11
  • I'm storing the entire file as a BLOB. Commented Jan 31, 2011 at 20:13
  • 1
    What's the structure of the BLOB? Do you have the option to pickle the data instead? Commented Jan 31, 2011 at 20:16

4 Answers

9

The Python csv module makes no special distinction for files; it accepts any file-like object. You can use StringIO to wrap your strings as file-like objects.
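As a minimal sketch of that idea for Python 3, where StringIO lives in the io module: the helper name parse_csv_string and the sample data are made up for illustration, and the code assumes the CSV is already held in a str.

```python
import csv
import io


def parse_csv_string(text):
    """Parse CSV held in a string and return a list of rows."""
    # newline="" leaves line endings untouched so the csv module can
    # handle them itself, including line breaks inside quoted fields.
    buffer = io.StringIO(text, newline="")
    return list(csv.reader(buffer))


rows = parse_csv_string('name,qty\r\n"Smith, J.",3\r\n')
print(rows)
```

Each row comes back as a list of strings, so the quoted "Smith, J." stays a single field despite its comma.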

1 Comment

cStringIO is more appropriate in most cases.
3

Here is why you should use cStringIO.StringIO (io.StringIO in Python 3.x) instead of some DIY kludge:

>>> import csv
>>> from cStringIO import StringIO
>>> fromDB = '"Column\nheading1",hdng2\r\n"data1\rfoo","data2\r\nfoo"\r\n'
>>> sources = [StringIO(fromDB), fromDB.splitlines(True),
...     fromDB.splitlines(), fromDB.split("\n")]
>>> for i, source in enumerate(sources):
...     print i, list(csv.reader(source))
...
0 [['Column\nheading1', 'hdng2'], ['data1\rfoo', 'data2\r\nfoo']] # OK
1 [['Column\nheading1', 'hdng2'], ['data1\rfoo', 'data2\r\nfoo']] # OK
2 [['Columnheading1', 'hdng2'], ['data1foo', 'data2foo']]         # 3 errors
3 [['Columnheading1', 'hdng2'], ['data1\rfoo', 'data2\rfoo'], []] # 3 errors
>>>

Using fromDB.splitlines(True) gives the same result, but it is not recommended: whoever is reading your code is far less likely to have a clue what it does than with StringIO(fromDB).
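The session above is Python 2. A rough Python 3 sketch of the same comparison, reusing the same test string, might look like the following; the names from_db, wrapped and naive are illustrative only.

```python
import csv
import io

from_db = '"Column\nheading1",hdng2\r\n"data1\rfoo","data2\r\nfoo"\r\n'

# Wrapping the whole string keeps the line breaks that sit inside
# quoted fields, because the csv reader sees every character.
wrapped = list(csv.reader(io.StringIO(from_db, newline="")))

# Splitting without keeping the line endings throws them away before
# the csv reader ever sees them, so the embedded breaks are lost.
naive = list(csv.reader(from_db.splitlines()))

print(wrapped)
print(naive)
```

Comparing the two printed lists shows whether a given splitting shortcut is safe for data with multi-line fields.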

2

http://docs.python.org/library/csv.html

csv.reader(csvfile)

csvfile can be any object which supports the iterator protocol and returns a string each time its next() method is called — file objects and list objects are both suitable.

If you have the content from the database in a string, for example, you can parse it like this:

import csv

fromDB = "1,2,3\n4,5,6"

reader = csv.reader(fromDB.split("\n"))
for row in reader:
  print("New row")
  for col in row:
    print("  ", col)

5 Comments

-1 Because your answer is a simple RTFM quote from the docs with no further explanation of why or how this is helpful. And it doesn't answer the OP's question.
@Martin Thurau: added an example (skipping the DB part; according to the latest comment, the file content itself is saved in the database, not single rows).
-1 Try your code with fromDB = '"Column\nheading1",hdng2\r\ndata1,data2\r\n'
@Howard: see my answer for details.
I'd suggest that this is marked as the answer as it is the most likely to work 'out of the box'
1

Use the StringIO module (io.StringIO in Python 3), which allows you to dress strings up as file-like objects. That way you can pass a StringIO "file" to the csv module for parsing (or to any other parser you may be using).
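Since the data in the question is stored as a BLOB, a Python 3 database driver will most likely hand it back as bytes rather than str. A possible sketch for that case follows; the helper name parse_csv_blob and the sample data are hypothetical, and UTF-8 is only an assumed encoding.

```python
import csv
import io


def parse_csv_blob(blob, encoding="utf-8"):
    """Parse CSV stored as bytes (e.g. a BLOB column) into a list of rows."""
    # BytesIO makes the bytes file-like; TextIOWrapper decodes them.
    # newline="" keeps line endings intact for the csv module.
    text_stream = io.TextIOWrapper(
        io.BytesIO(blob), encoding=encoding, newline=""
    )
    return list(csv.reader(text_stream))


records = parse_csv_blob(b"id,name\r\n1,Ada\r\n2,Grace\r\n")
print(records)
```

If the BLOB was written in another encoding, pass it via the encoding argument instead of relying on the default.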
