I'm dealing with a CSV file that has missing values. Here is what I want my script to do:

#!/usr/bin/python

import csv
import sys

#1. Place each record of a file in a list.
#2. Iterate thru each element of the list and get its length.
#3. If the length is less than one replace with value x.


reader = csv.reader(open(sys.argv[1], "rb"))
for row in reader:
    for x in row[:]:
                if len(x)< 1:
                         x = 0
                print x
print row

Here is an example of the data I'm trying it on. Ideally it should work for any number of columns.

Before:
actnum,col2,col4
xxxxx ,    ,
xxxxx , 845   ,
xxxxx ,    ,545

After
actnum,col2,col4
xxxxx , 0  , 0
xxxxx , 845, 0
xxxxx , 0  ,545

Any guidance would be appreciated.

Update: Here is what I have now (thanks):

reader = csv.reader(open(sys.argv[1], "rb"))
for row in reader:
    for i, x in enumerate(row):
                if len(x)< 1:
                         x = row[i] = 0
print row

However, it only seems to output one record. I will be piping the output to a new file on the command line.

Update 3: OK, now I have the opposite problem: I'm outputting duplicates of each record. Why is that happening?

After
actnum,col2,col4
actnum,col2,col4
xxxxx , 0  , 0
xxxxx , 0  , 0
xxxxx , 845, 0
xxxxx , 845, 0
xxxxx , 0  ,545
xxxxx , 0  ,545

OK, I fixed it (below). Thank you all for your help.

#!/usr/bin/python

import csv
import sys

#1. Place each record of a file in a list.
#2. Iterate thru each element of the list and get its length.
#3. If the length is less than one replace with value x.


reader = csv.reader(open(sys.argv[1], "rb"))
for row in reader:
    for i, x in enumerate(row):
        if len(x) < 1:
            x = row[i] = 0
    print ','.join(str(x) for x in row)
  • The reason you only print one line is that your print statement is outside of the for loop - indent it once, and you should be fine. Commented May 19, 2010 at 4:25
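  • You can replace "if len(x) < 1:" with "if not x.strip():". An empty string "" evaluates to False and any non-empty string evaluates to True (including one made only of spaces), so strip the value first and the whitespace-only fields are caught as well. Commented May 19, 2010 at 4:52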
  • In your update you removed the print statement within the loop where you really wanted to remove the one outside the loop. You may also wish to strip your strings, as per my answer below. Commented May 19, 2010 at 12:28

2 Answers

4

Change your code:

for row in reader:
    for x in row[:]:
        if len(x) < 1:
            x = 0
        print x

into:

for row in reader:
    for i, x in enumerate(row):
        if len(x) < 1:
            x = row[i] = 0
        print x

Not sure what you think you're accomplishing by the print, but the key issue is that you need to modify row, and for that purpose you need an index into it, which enumerate gives you.

Note also that all other values, except the empty ones which you're changing into the number 0, will remain strings. If you want to turn them into ints you have to do that explicitly.
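As a minimal sketch of both points (in-place replacement through the index, then an explicit conversion), something like the following could work. The helper names fill_blanks and to_ints are made up for illustration and are not part of the original script, and the snippet is written so that it also runs on Python 3:

```python
def fill_blanks(row, default=0):
    """Replace empty fields in place, using the index from enumerate."""
    for i, x in enumerate(row):
        if len(x) < 1:
            row[i] = default
    return row


def to_ints(row):
    """Return a new list where integer-looking fields become ints."""
    converted = []
    for x in row:
        try:
            converted.append(int(x))
        except (TypeError, ValueError):
            # Not a number (e.g. an account id): keep the field unchanged.
            converted.append(x)
    return converted


row = ["xxxxx", "", "545"]
fill_blanks(row)        # row is now ["xxxxx", 0, "545"]
numbers = to_ints(row)  # numbers is ["xxxxx", 0, 545]
```

Whether the conversion step is wanted depends on what consumes the output; if the rows are only written back out as text, it can be skipped.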

1

You are very nearly there!

There are just a couple of small bugs.

  • len(x) < 1 will not work for the second column in the second row of your data because x will contain only whitespace, such as ' ' (so its length is at least 1). You'll need to strip your strings.

  • print row sits outside the loop, so it runs only once, after iteration has finished, and shows just the last row. You can probably just remove this line.
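A rough sketch of the strip-based check, assuming the goal is to write the corrected rows back out as CSV. The function name fill_blank_fields and the in-memory sample are illustrative only; the real script would read sys.argv[1] and write to sys.stdout instead. It uses Python 3 syntax, unlike the Python 2 code in the question:

```python
import csv
import io


def fill_blank_fields(lines, default="0"):
    """Yield each CSV row with empty or whitespace-only fields replaced."""
    for row in csv.reader(lines):
        # x.strip() is "" (falsy) for empty and whitespace-only fields.
        yield [x if x.strip() else default for x in row]


# In-memory stand-ins for the input file and for standard output.
sample = io.StringIO("actnum,col2,col4\nxxxxx ,    ,\nxxxxx , 845   ,\n")
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(fill_blank_fields(sample))
```

Using csv.writer rather than joining fields by hand means fields that contain commas or quotes would still be escaped correctly. Note that this sketch leaves the surrounding spaces of non-empty fields such as " 845   " untouched.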

Also: Are you trying to modify the file or just output the corrections to pipe to some other file or process?
