0

The following script is erroring out:

import csv,time,string,os,requests, datetime

test = "\\\\network\\Shared\\test.csv"

fields = ["id", "Expiration Date", "Cost", "Resale" ]

with open(test) as infile, open("c:\\upload\\tested.csv", "wb") as outfile:
    r = csv.DictReader(infile)
    w = csv.DictWriter(outfile, fields, extrasaction="ignore")
    r = (dict((k, v.strip()) for k, v in row.items() if v) for row in r)

    wtr = csv.writer( outfile )                                                                                                                                                                                                                                                                                                                                         
    wtr.writerow(["id", "upload_date", "cost", "resale"])
    for i, row in enumerate(r, start=1):
        row['id'] = i
        print(row['Expiration Date']
        row['Expiration Date'] = datetime.datetime.strptime(row['Expiration Date'][:10], "%m/%d/%Y").strftime("%Y-%m-%d")
        w.writerow(row)


D:\Python\Scripts>python test.py
Traceback (most recent call last):
  File "test.py", line 18, in <module>
    print(row['Expiration Date'])
KeyError: 'Expiration Date'

So I think I understand what's going on - something like this from the original file:

Expiration Date     Cost     Resale
2016-01-01          1.00     2.00
                    1.42     2.42
2016-05-02          1.45     9.00

From what I can gather, there is a row where the expiration date column is NOT populated. How do I force DictWriter to skip over blanks - assuming that is the cause of my error?

2 Answers 2

1

Actually, the dict produced by the csv.DictReader just puts None into a field it does not find and thus you should not get that error. You are not using the functionality of the DictReader to produce a proper dict! As far as I can tell, you try to do the parsing yourself by use of the line r = (dict((k, v.strip()) for k, v in row.items() if v) for row in r). That does not actually work, though. If you print the rows afterwards you get:

{'Expiration Date     Cost     Resale': '2016-01-01          1.00     2.00'}
{'Expiration Date     Cost     Resale': '1.42     2.42'}
{'Expiration Date     Cost     Resale': '2016-05-02          1.45     9.00'}

So every dict contains only one key. A problem with your file is, that you don't have a valid delimiter between lines. It looks like you mean to use a whitespace, but you have a whitespace in Expiration Date, as well. You will have to get rid of that. If you do that, then you can use the DictReader like this:

import csv,time,string,os,requests, datetime

test = "test.csv"

with open(test) as infile:
    r = csv.DictReader(infile, delimiter=" ", skipinitialspace=True)
for row in r:
    print(row)

will now give you:

{'Resale': '2.00', 'Cost': '1.00', 'ExpirationDate': '2016-01-01'}
{'Resale': None, 'Cost': '2.42', 'ExpirationDate': '1.42'}
{'Resale': '9.00', 'Cost': '1.45', 'ExpirationDate': '2016-05-02'}

which is a proper dict (Notice that the reader has no way of telling, that the first element is the one missing, though). Now you only have to exclude lines that are not complete from writing. A nice way to do that is described here:

import csv,time,string,os,requests, datetime

test = "test.csv"

with open(test) as infile:
    r = csv.DictReader(infile, delimiter=" ", skipinitialspace=True)

    for row in r:
        if not any(val in (None, "") for val in row.itervalues()):
            print(row)

Finally, this will give you all valid lines as dicts:

{'Resale': '2.00', 'Cost': '1.00', 'ExpirationDate': '2016-01-01'}
{'Resale': '9.00', 'Cost': '1.45', 'ExpirationDate': '2016-05-02'}
Sign up to request clarification or add additional context in comments.

7 Comments

I didn't really understand the r = (dict((k, v.strip()) for k, v in row.items() if v) for row in r) line - the code itself came from an SO post I made about trimming white space on each value before it was written, but you're right, when I remove that line, it solves the problem of this particular thing happening - but the code WAS stripping white space, so now I need to find a new way to trim white space from my output file...
@BrianPowell The skipinitialspace=Trueparameter that I used in my sample strips all leading whitespaces. Isn't that what you need?
I have several values in the sheet (which is like 50 columns, not the 3 I showed here) which have white space at the beginning, or at the end: ` hello ` - with the code from my comments above, it did trim that white space out, but I guess it also screwed up my dictionary. My understanding is that skipinitialspace only trims white space from the beginning of a value.
The problem is not that line but that you don't have commas in you comma-separated-values file and you didn't specify any other delimiter. If you specify delimiter=" " then he is able to correctly produce your dictionary (as long as none of the keys/values conatain whitespaces). And then skipinitialspace=True skips all whitespaces, since there will be none after an entry (as the first whitespace after an entry is treated as delimiter).
Ok then you can actually use something like the line you had. First you will have to change delimiter="," and afterwards you can insert r = [{k.strip(): v.strip() for k, v in row.items()} for row in r]. That will work for python 2.7.2 and newer. Then you can use the if clause later to excempt all rows with a missing entry.
|
0

You got a KeyError accessing something not in the dict at x['Expiration Date'] so you could say x.get('Expiration Date') or possibly 'Expiration Date' in x instead to discover if it exists and conditionally discard that row.

1 Comment

Actually, his read in is flawed and he never has anything else but "Expiration Date Cost Resale" as a key in his dict.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.