CSV reader behavior with None and empty string

Question

I'd like to distinguish between None and empty strings ('') when going back and forth between Python data structure and csv representation using Python's csv module.

My issue is that when I run:

import csv, cStringIO

data = [['NULL/None value',None],
        ['empty string','']]

f = cStringIO.StringIO()
csv.writer(f).writerows(data)

f = cStringIO.StringIO(f.getvalue())
data2 = [e for e in csv.reader(f)]

print "input : ", data
print "output: ", data2

I get the following output:

input :  [['NULL/None value', None], ['empty string', '']]
output:  [['NULL/None value', ''], ['empty string', '']]

Of course, I could play with data and data2 to distinguish None and empty strings with things like:

data = [[d if d is not None else 'None' for d in row] for row in data]
data2 = [[d if d != 'None' else None for d in row] for row in data2]

But that would partly defeat my reason for using the csv module (quick serialization/deserialization implemented in C, especially when you are dealing with large lists).

Is there a csv.Dialect or parameters to csv.writer and csv.reader that would enable them to distinguish between '' and None in this use-case?

If not, would there be an interest in implementing a patch to csv.writer to enable this kind of back and forth? (Possibly a Dialect.None_translate_to parameter defaulting to '' to ensure backward compatibility.)

BrenBarn · Accepted Answer · 2012-07-07 23:41:21Z

21

The documentation suggests that what you want is not possible:

To make it as easy as possible to interface with modules which implement the DB API, the value None is written as the empty string.

This is in the documentation for the writer class, suggesting it is true for all dialects and is an intrinsic limitation of the csv module.
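A minimal Python 3 sketch of that behavior, using the question's data (the helper name to_csv is only for illustration): both None and '' are written as an empty field, so the reader has nothing to tell them apart by.

```python
import csv
import io


def to_csv(rows):
    """Serialize rows with the default csv.writer settings."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


text = to_csv([['NULL/None value', None], ['empty string', '']])
rows_back = list(csv.reader(io.StringIO(text)))

print(repr(text))  # both rows end in an empty, unquoted field
print(rows_back)   # both second fields come back as ''
```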

I for one would support changing this (along with various other limitations of the csv module), but it may be that people would want to offload this sort of work into a different library, and keep the CSV module simple (or at least as simple as it is).

If you need more powerful file-reading capabilities, you might want to look at the CSV reading functions in numpy, scipy, and pandas, which as I recall have more options.

answered Jul 7, 2012 at 23:41

BrenBarn


3 Comments

user1509316 Over a year ago

Yep confirmed : looking at csv_writerow in Modules/_csv.c ( if (field == Py_None) ... ). There is no way to distinguish between a '' and a None. Really a shame, given the Dialect abstraction you would have hoped for a bit more flexibility. You mention other limitations of the csv module, would you mind elaborating (If there are other issues I really should start looking at other csv-reading writing) ?

BrenBarn Over a year ago

One limitation I've found annoying sometimes is that delimiters must be a single character. So you can't parse a file where columns are separated by, say, two tabs. Like the None thing you came across, this is fairly easy to work around, but still annoying.

Spencer Rathbun Over a year ago

Another is the hard coded ascii limitation within the module.

martineau · Accepted Answer · 2021-03-25 14:06:36Z

You could at least partially side-step what the csv module does by creating your own version of a singleton None-like class/value:

from __future__ import print_function
import csv


class NONE(object):
    ''' None-like class. '''
    def __repr__(self): # Method csv.writer class uses to write values.
        return 'NONE'   # Unique string value to represent None.
    def __len__(self):  # Method called to determine length and truthiness.
        return 0

NONE = NONE()  # Singleton instance of the class.


if __name__ == '__main__':

    try:
        from cStringIO import StringIO  # Python 2.
    except ImportError:  # Also covers ModuleNotFoundError (Python 3.6+).
        from io import StringIO  # Python 3.

    data = [['None value', None], ['NONE value', NONE], ['empty string', '']]
    f = StringIO()
    csv.writer(f).writerows(data)

    f = StringIO(f.getvalue())
    print(" input:", data)
    print("output:", [e for e in csv.reader(f)])

Results:

 input: [['None value', None], ['NONE value', NONE],   ['empty string', '']]
output: [['None value', ''],   ['NONE value', 'NONE'], ['empty string', '']]

Using NONE instead of None would preserve enough information for you to be able to differentiate between it and any actual empty-string data values.

Even better alternative…

You could use the same approach to implement a pair of relatively lightweight csv.reader and csv.writer “proxy” classes — necessary since you can't actually subclass the built-in csv classes which are written in C — without introducing a lot of overhead (since the majority of the processing would still be performed by the underlying built-ins). This would make what goes on completely transparent since it's all encapsulated within the proxies.

from __future__ import print_function
import csv


class csvProxyBase(object): _NONE = '<None>'  # Unique value representing None.


class csvWriter(csvProxyBase):
    def __init__(self, csvfile, *args, **kwargs):
        self.writer = csv.writer(csvfile, *args, **kwargs)
    def writerow(self, row):
        # Swap each None for the marker string before handing off to csv.writer.
        self.writer.writerow([self._NONE if val is None else val for val in row])
    def writerows(self, rows):
        for row in rows:
            self.writerow(row)


class csvReader(csvProxyBase):
    def __init__(self, csvfile, *args, **kwargs):
        self.reader = csv.reader(csvfile, *args, **kwargs)
    def __iter__(self):
        return self
    def __next__(self):
        return [None if val == self._NONE else val for val in next(self.reader)]
    next = __next__  # Python2.x compatibility.


if __name__ == '__main__':

    try:
        from cStringIO import StringIO  # Python 2.
    except ImportError:  # Also covers ModuleNotFoundError (Python 3.6+).
        from io import StringIO  # Python 3.

    data = [['None value', None], ['empty string', '']]
    f = StringIO()
    csvWriter(f).writerows(data)

    f = StringIO(f.getvalue())
    print(" input:", data)
    print("output:", [e for e in csvReader(f)])

Results:

 input: [['None value', None], ['empty string', '']]
output: [['None value', None], ['empty string', '']]

A variation of the first solution solved the write problem for me. I created a class NONE(int) with a repr which returns an empty string, and replaced all None values with NONE (I had to format my data anyway, so it was no extra work). Then I created the csv writer with QUOTE_NONNUMERIC. This is a bit hacky, but it means that in the output file you know that a quoted field is always a string, and an unquoted empty field is always a None.

@trelltron: Clever, but a potential downside is that it requires the use of QUOTE_NONNUMERIC, which you may not want otherwise and which has the potential to make the files much larger. My solution(s) don't require using any particular csv options in order to work. Also note that my second alternative doesn't require replacing all None values with anything.
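For reference, the QUOTE_NONNUMERIC variation described in the first comment might look roughly like this in Python 3. This is only a sketch of the write side: the class name NoneInt and the helper write_rows are made up for illustration, and it relies on csv.writer leaving numeric values unquoted and writing their str() form.

```python
import csv
import io


class NoneInt(int):
    """Hypothetical int subclass whose text form is the empty string."""
    def __repr__(self):
        return ''
    __str__ = __repr__


NONE = NoneInt()  # Stand-in for None; counts as a number, so it is not quoted.


def write_rows(rows):
    """Write rows with QUOTE_NONNUMERIC, swapping None for the NONE stand-in."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC)
    writer.writerows([[NONE if v is None else v for v in row] for row in rows])
    return buf.getvalue()


text = write_rows([['None value', None], ['empty string', '']])
print(repr(text))  # None becomes an unquoted empty field, '' becomes ""
```

Reading the file back with plain csv.reader still yields '' for both fields, so this only makes the distinction visible in the file itself.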

jryan14ify · Accepted Answer · 2024-05-09 22:27:56Z

5

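This has actually been addressed in Python 3.12, which added csv.QUOTE_STRINGS; you pass it as the quoting argument to your csv writer and reader. (Per the comment below, reader objects only honor it from Python 3.13.)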

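import csv

filename = 'quoting_demo.csv'  # Placeholder path (assumption, not in the original snippet).
l = [
    ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
    ['Example', '', None, 42, 3.5, r'\n', ' , ']]

for quoting in [csv.QUOTE_STRINGS, csv.QUOTE_MINIMAL, csv.QUOTE_ALL, csv.QUOTE_NONNUMERIC, csv.QUOTE_NONE]:
    # Assumed write step: save the sample rows with the same quoting mode.
    # An escapechar is set because QUOTE_NONE cannot write ' , ' without one.
    with open(filename, 'w', newline='') as f:
        csv.writer(f, quoting=quoting, escapechar='\\').writerows(l)
    with open(filename, 'r', newline='') as f:
        reader = csv.reader(f, quoting=quoting, escapechar='\\')
        print(f'Reading {filename}')
        for row in reader:
            print(row)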

answered May 9, 2024 at 22:27

jryan14ify


1 Comment

Bas Jan 10 at 15:44

The Python docs say: "Note: Due to a bug, constants QUOTE_NOTNULL and QUOTE_STRINGS do not affect behaviour of reader objects. This bug is fixed in Python 3.13."
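Putting the answer and the comment together, a round trip could be sketched as below. The function round_trip and the SENTINEL marker are hypothetical names; the version check assumes, as the docs note says, that the reader honors QUOTE_STRINGS only from Python 3.13 on, and falls back to a marker string on older interpreters.

```python
import csv
import io
import sys

SENTINEL = '<None>'  # Hypothetical marker, used only on the fallback path.


def round_trip(rows):
    """Write rows to CSV text and read them back, keeping None distinct from ''."""
    buf = io.StringIO()
    if sys.version_info >= (3, 13):
        # Writer quotes every string and leaves None as an unquoted empty field;
        # the reader turns unquoted empty fields back into None.
        csv.writer(buf, quoting=csv.QUOTE_STRINGS).writerows(rows)
        reader = csv.reader(io.StringIO(buf.getvalue()), quoting=csv.QUOTE_STRINGS)
        return list(reader)
    # Older Pythons: substitute a marker string on the way out and back in.
    csv.writer(buf).writerows(
        [[SENTINEL if v is None else v for v in row] for row in rows])
    reader = csv.reader(io.StringIO(buf.getvalue()))
    return [[None if v == SENTINEL else v for v in row] for row in reader]


print(round_trip([['Example', '', None]]))
```

Note that on the QUOTE_STRINGS path unquoted non-empty fields are read back as floats, as with QUOTE_NONNUMERIC, so the two branches only agree for rows made of strings and None.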

John Machin · Accepted Answer · 2012-07-07 23:29:11Z

2

As you have control over both the consumer and the creator of the serialised data, consider using a format that does support that distinction.

Example:

>>> import json
>>> json.dumps(['foo', '', None, 666])
'["foo", "", null, 666]'
>>>
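If you still want one record per line, as with CSV, a possible sketch is to dump each row as its own JSON document (the variable names here are only for illustration):

```python
import json

rows = [['NULL/None value', None], ['empty string', '']]

# One JSON array per line; json.dumps escapes newlines inside strings,
# so each row stays on a single line.
text = '\n'.join(json.dumps(row) for row in rows)

rows_back = [json.loads(line) for line in text.split('\n')]

print(text)
print(rows_back)  # None and '' survive the round trip as distinct values
```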

answered Jul 7, 2012 at 23:29

John Machin


1 Comment

jryan14ify Over a year ago

Or Python's pickle module would work too.

kojiro · Accepted Answer · 2012-07-07 23:19:45Z

1

I don't think it would be possible to do what you want with a mere dialect, but you could write your own csv.reader/writer subclass. On the other hand, I still think that is overkill for this use case. Even if you want to catch more than just None, you probably just want str():

>>> import csv, cStringIO
>>> data = [['NULL/None value',None],['empty string','']]
>>> i = cStringIO.StringIO()
>>> csv.writer(i).writerows(map(str,row) for row in data)
>>> print i.getvalue()
NULL/None value,None
empty string,

answered Jul 7, 2012 at 23:19

kojiro


jrc · Accepted Answer · 2019-11-14 14:27:01Z

1

As mentioned above, this is a limitation of the csv module. A solution is just to rewrite the rows inside a loop with a simple dictionary comprehension, as follows:

reader = csv.DictReader(csvfile)
for row in reader:
    # Interpret empty values as None (instead of '')
    row = {k: v if v else None for k, v in row.items()}
    :
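A self-contained version of that loop, with made-up sample data standing in for csvfile, might look like this. Keep in mind that it turns every empty field into None, so it suits files where '' never occurs as a real value rather than files that need both.

```python
import csv
import io

# Hypothetical sample input; in practice csvfile would be an open file.
csvfile = io.StringIO('name,value\r\nfirst,\r\nsecond,x\r\n')

rows = []
for row in csv.DictReader(csvfile):
    # Interpret empty values as None (instead of '')
    rows.append({k: v if v else None for k, v in row.items()})

print(rows)
```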

answered Nov 14, 2019 at 14:27

jrc


martineau · Accepted Answer · 2012-07-09 10:29:36Z

As others have pointed out, you can't really do this via csv.Dialect or parameters to csv.writer and/or csv.reader. However, as I said in one comment, you can implement it by effectively subclassing the latter two (which you apparently can't really do, because they're built-in). What the "subclasses" do on writing is simply intercept None values and change them into a unique string, then reverse the process when reading them back in. Here's a fully worked-out example:

import csv, cStringIO
NULL = '<NULL>'  # something unlikely to ever appear as a regular value in your csv files

class MyCsvWriter(object):
    def __init__(self, *args, **kwrds):
        self.csv_writer = csv.writer(*args, **kwrds)

    def __getattr__(self, name):
        return getattr(self.csv_writer, name)

    def writerow(self, row):
        self.csv_writer.writerow([item if item is not None else NULL
                                      for item in row])
    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

class MyCsvReader(object):
    def __init__(self, *args, **kwrds):
        self.csv_reader = csv.reader(*args, **kwrds)

    def __getattr__(self, name):
        return getattr(self.csv_reader, name)

    def __iter__(self):
        rows = iter(self.csv_reader)
        for row in rows:
            yield [item if item != NULL else None for item in row]

data = [['NULL/None value', None],
        ['empty string', '']]

f = cStringIO.StringIO()
MyCsvWriter(f).writerows(data)  # instead of csv.writer(f).writerows(data)

f = cStringIO.StringIO(f.getvalue())
data2 = [e for e in MyCsvReader(f)]  # instead of [e for e in csv.reader(f)]

print "input : ", data
print "output: ", data2

Output:

input :  [['NULL/None value', None], ['empty string', '']]
output:  [['NULL/None value', None], ['empty string', '']]

It's a tad verbose and probably slows the reading and writing of csv files a bit (since the built-in reader and writer are written in C and these wrappers are not), but that may make little difference since the process is likely low-level I/O bound anyway.

XiYu Wang · Accepted Answer · 2019-04-25 08:20:30Z

0

I ran into this problem too and found this issue: https://bugs.python.org/issue23041.

Solutions from the issue:

subclass csv.DictWriter, use dictionaries as your element type, and have its writerow method do the application-specific work.

define a writerow() function which does something similar (essentially wrapping csv.writerow()).
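The first suggestion could be sketched like this. csv.DictWriter is written in Python, so it can be subclassed directly; the class name SentinelDictWriter and the NULL marker are assumptions for illustration, not something taken from the issue.

```python
import csv
import io

NULL = '<NULL>'  # Hypothetical marker; pick something that never occurs in your data.


class SentinelDictWriter(csv.DictWriter):
    """DictWriter that writes None as a marker string instead of ''."""

    @staticmethod
    def _encode(rowdict):
        return {k: NULL if v is None else v for k, v in rowdict.items()}

    def writerow(self, rowdict):
        return super().writerow(self._encode(rowdict))

    def writerows(self, rowdicts):
        return super().writerows(self._encode(r) for r in rowdicts)


buf = io.StringIO()
writer = SentinelDictWriter(buf, fieldnames=['name', 'value'])
writer.writeheader()
writer.writerow({'name': 'NULL/None value', 'value': None})
writer.writerows([{'name': 'empty string', 'value': ''}])
text = buf.getvalue()
print(repr(text))
```

On the reading side you would map the marker back to None, for example in a loop over csv.DictReader.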

answered Apr 25, 2019 at 8:20

XiYu Wang
