
Is there an alternative to using the csv module to read a CSV file in Python 3 in a streaming way? Currently my data looks something like this:

"field1"::"field2"::"field3"\x02\n
"1"::"hi\n"::"3"\x02\n
"8"::"ok"::"3"\x02\n

The separator is two characters, :: (the csv module only accepts a single-character separator), and the line separator is also two characters, \x02\n. Are there any CSV readers for Python that can be used in streaming mode and support this?

Here is an example of what I'm trying to do:

>>> import csv
>>> s = '''"field1"::"field2"::"field3"\x02\n\n"1"::"hi\n"::"3"\x02\n\n"8"::"ok"::"3"\x02\n'''
>>> csvreader = csv.reader(s, delimiter='::', lineterminator='\x02\n')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
TypeError: "delimiter" must be a 1-character string

Loading pandas just to read this csv seems like overkill x 100, so I'd like to see what other options there are.

  • If you're able to control how this CSV is formatted, I would switch to a single-character field separator and a different line separator, but just open and re should suffice here, I believe (see the sketch after these comments). Commented Feb 14, 2019 at 4:13
  • Are you saying you would like to have the data separated by the two delimiters within the same process? Also, are you using csv.reader? Could you post the section of code you are currently using to clean this data? Commented Feb 14, 2019 at 4:13
  • Here's a related Q/A, but it requires pandas--seems like a giant dependency for such a small feature: stackoverflow.com/questions/31194669/… Commented Feb 14, 2019 at 4:15
  • @BrianPeterson agreed -- are there any other options? Commented Feb 14, 2019 at 4:46
  • @Jaba re gets really tricky -- with escape characters, quote characters, etc. I'd rather not go down that road. Commented Feb 14, 2019 at 4:48
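
For what it's worth, here is a minimal sketch of the "open + re" idea from the first comment. The file name is hypothetical, it assumes every field is double-quoted and contains no embedded double quotes, and it reads the whole file at once rather than streaming it:

import re

# Rough sketch of the "open + re" suggestion (hypothetical file name).
# Assumes every field is double-quoted and contains no '"' characters.
with open('data.csv', newline='') as f:
    for record in f.read().split('\x02\n'):
        if record.strip():
            # extract the quoted field values; [^"] also matches embedded newlines
            print(re.findall(r'"([^"]*)"', record))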

2 Answers


As you have discovered, the CSV library is not suitable for that data format. You could, though, pre-parse the data beforehand. For example, the following approach should work:

from io import StringIO
import csv

s = '''"field1"::"field2"::"field3"\x02\n\n"1"::"hi\n"::"3"\x02\n\n"8"::"ok"::"3"\x02\n'''

def csv_reader_alt(source):
    # Strip the '\x02' terminator and collapse the two-character '::' separator
    # to a single ':' so csv.reader's one-character delimiter limit is satisfied.
    return csv.reader((line.replace('\x02', '').replace('::', ':') for line in source), delimiter=':')

for row in csv_reader_alt(StringIO(s)):
    if row:
        print(row)

Giving you the following output:

['field1', 'field2', 'field3']
['1', 'hi\n', '3']
['8', 'ok', '3']
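
The same helper should also work when streaming from an actual file rather than a StringIO; a minimal usage sketch, assuming the file is opened in text mode (the file name is hypothetical):

# Hypothetical usage with a real file instead of an in-memory string.
with open('my_file.csv', newline='') as f:
    for row in csv_reader_alt(f):
        if row:
            print(row)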

2 Comments

Thanks for this. Please see the updated question, where reading rows line by line isn't as straightforward.
@DavidL it's a bit difficult to tell the exact format from your small example, but I have now shown how you could pre-parse your data before passing it to a normal csv.reader(). Maybe a link to the actual CSV file would help for testing.

@MartinEvans shows a nice way of doing it in his answer.

Here is the code for reading from a file (not from a string in memory) with proper file handling, using a custom line delimiter (implemented as a custom generator):

def get_line(file, delimiter='\n', bufsize=4096):
    # https://stackoverflow.com/a/19600562/9225671
    buf = ''
    while True:
        chunk = file.read(bufsize)
        if len(chunk) == 0:
            # end of file has been reached; serve the remaining data and exit
            yield buf
            return

        buf += chunk
        line_list = buf.split(delimiter)

        # don't serve the last part yet, first we need to read more chunks from the file
        buf = line_list.pop(-1)

        for line in line_list:
            yield line

if __name__ == '__main__':
    with open('my_file.csv') as f:
        for line in get_line(f, delimiter='\x02\n'):
            if len(line) > 0:
                parts = line.split('::')
                print(parts)
                print([
                    e.strip('"')
                    for e in parts])

Does that work for you?
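
If you also want the csv module's quote handling on top of this generator, one possible way to combine the two answers is sketched below (untested; stream_rows is a hypothetical name, and the '::' -> ':' replacement assumes no quoted field ever contains '::'):

import csv

def stream_rows(path):
    # Combine both answers: get_line() yields one logical record per '\x02\n'
    # terminator, and csv.reader handles the quoting once '::' is collapsed to ':'.
    with open(path, newline='') as f:
        records = (line.replace('::', ':') for line in get_line(f, delimiter='\x02\n') if line)
        for row in csv.reader(records, delimiter=':'):
            yield row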

