1

I need to do a find and replace (specific to one column of URLs) in a huge Excel .csv file. Since I'm in the beginning stages of trying to teach myself a scripting language, I figured I'd try to implement the solution in Python.

I'm having trouble with the "replace" part of the solution. I've read the official csv module documentation about how to use the writer, but there isn't really a clear enough example for me (yes, I'm slow). So, now for the question: how does one iterate through the rows of a csv file with a writer object?

p.s. apologies in advance for the clumsy code, I'm still learning :)

import csv

csvfile = open("PALTemplateData.csv")
csvout = open("PALTemplateDataOUT.csv")
dialect = csv.Sniffer().sniff(csvfile.read(1024))
csvfile.seek(0)
reader = csv.reader(csvfile, dialect)
writer = csv.writer(csvout, dialect)

total=0;
needchange=0;
changed = 0;
temp = ''
changeList = []

for row in reader:
    total=total+1
    temp = row[len(row)-1]
    if '/?' in temp:
        needchange=needchange+1;
        changeList.append(row.index)

for row in writer:           #this doesn't compile, hence the question
    if row.index in changeList:
        changed=changed+1
        temp = row[len(row)-1]
        temp.replace('/?', '?')
        row[len(row)-1] = temp
        writer.writerow(row)

print('Total URLs:', total)
print('Total URLs to change:', needchange)
print('Total URLs changed:', changed)
4
  • Is PALTemplateDataOUT.csv empty when you start? Commented Jun 19, 2009 at 17:52
  • no, it's the exact same file as the input file (has all of the same data) I just didn't want to accidentally overwrite anything I needed Commented Jun 19, 2009 at 17:54
  • What does "accidentally overwrite" mean? Usually we read one file and write a different file. Commented Jun 19, 2009 at 18:19
  • I guess you're right - it is pretty silly (not to mention bad "practice") to have that many copies. Commented Jun 19, 2009 at 18:23

3 Answers

6

The reason you're getting an error is that the writer doesn't have data to iterate over. You're supposed to give it the data - presumably, you'd have some sort of list or generator that produces the rows to write out.
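To illustrate the point that a writer only accepts rows and never produces them, here is a minimal in-memory sketch. The helper name `fixed_rows` and the sample data are made up for the example; `io.StringIO` stands in for the real files.

```python
import csv
import io

def fixed_rows(rows):
    """Yield each row with '/?' replaced by '?' in its last column."""
    for row in rows:
        if row:  # guard against empty rows so row[-1] is safe
            row[-1] = row[-1].replace('/?', '?')
        yield row

# Hypothetical sample data standing in for the real CSV file.
source = io.StringIO("id,url\r\n1,http://example.com/?a=1\r\n2,http://example.com/b\r\n")
target = io.StringIO()

reader = csv.reader(source)   # a reader can be iterated
writer = csv.writer(target)   # a writer cannot; it only accepts rows

# Feed the writer from a generator that pulls rows from the reader.
writer.writerows(fixed_rows(reader))

result = target.getvalue()
print(result)
```

The iteration always happens over the reader (or some other source of rows); the writer is just the destination.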

I'd suggest just combining the two loops, like so:

for row in reader:
    row[-1] = row[-1].replace('/?', '?')
    writer.writerow(row)

And with that, you don't even need total, needchange, and changeList. (There are a bunch of optimizations in there that I unfortunately don't have time to explain, but I'll see if I can edit that info in later)
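For completeness, here is a sketch of the combined loop as a whole script, assuming Python 3. The function name `fix_urls` is hypothetical. One detail matters: the output file has to be opened for writing. The question opens it with a plain `open(...)`, which defaults to read mode, and that is most likely what produces the `UnsupportedOperation: BufferedReader.write() not supported` error reported in the comments.

```python
import csv

def fix_urls(src_path, dst_path):
    """Copy src_path to dst_path, replacing '/?' with '?' in the last column.

    Returns the number of rows written.
    """
    # newline='' is what the csv module documentation recommends for
    # files handed to csv.reader and csv.writer.
    with open(src_path, newline='') as src, \
         open(dst_path, 'w', newline='') as dst:  # 'w' makes dst writable
        reader = csv.reader(src)
        writer = csv.writer(dst)
        written = 0
        for row in reader:
            if row:  # leave empty rows untouched
                row[-1] = row[-1].replace('/?', '?')
            writer.writerow(row)
            written += 1
    return written
```

With the file names from the question this would be called as `fix_urls("PALTemplateData.csv", "PALTemplateDataOUT.csv")`. Note that mode `'w'` truncates the output file, so it should not be a copy you still need.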


6 Comments

Keep in mind that doing this will overwrite your output file as you go. This is the typical way of doing this kind of thing though. Starting with two copies of the file, like you have above, isn't the best practice.
+1: For that matter, you generally don't need the sniffer or the seek(0), either.
That certainly is an elegant solution, however, I don't think you can use the same 'row' reference in both the for loop iteration and as the parameter of writerow(). The interpreter says: "io.UnsupportedOperation: BufferedReader.write() not supported"
@ignorantslut: write() certainly is not supported. writerow(), however, should be supported. Check your code to be sure you spelled the method name correctly.
@S. Lott: I copied and pasted directly from David's example. Full traceback:
  File "C:\Documents and Settings\g41092\My Documents\palScript.py", line 21, in <module>
    writer.writerow(row)
  File "C:\Program Files\Python\lib\io.py", line 1495, in write
    self.buffer.write(b)
  File "C:\Program Files\Python\lib\io.py", line 701, in write
    self._unsupported("write")
  File "C:\Program Files\Python\lib\io.py", line 322, in unsupported
    (self.__class_.__name__, name))
io.UnsupportedOperation: BufferedReader.write() not supported
1

You should only have one loop and read and write at the same time - if your replacements only affect one line at a time, you don't need to loop over the data twice.

for row in reader:
  total=total+1
  temp = row[len(row)-1]
  if '/?' in temp:
    # str.replace returns a new string, so the result must be assigned back
    row[len(row)-1] = temp.replace('/?', '?')
  writer.writerow(row)

This is just to illustrate the loop, not sure if the replacement code will work like this.
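If the counts printed at the end of the question are still wanted, they can be collected in the same pass. The following is a sketch only: `copy_with_counts` and the sample URLs are invented for the example, and `io.StringIO` replaces the real files.

```python
import csv
import io

def copy_with_counts(reader, writer):
    """Write every row from reader to writer, fixing '/?' in the last column.

    Returns (total rows seen, rows that were changed).
    """
    total = 0
    changed = 0
    for row in reader:
        total += 1
        if row and '/?' in row[-1]:
            # Strings are immutable: keep the value replace() returns.
            row[-1] = row[-1].replace('/?', '?')
            changed += 1
        writer.writerow(row)  # unchanged rows are written too
    return total, changed

# Hypothetical sample data with one URL per row.
source = io.StringIO("http://a.com/?x=1\r\nhttp://a.com/y\r\nhttp://a.com/?z=2\r\n")
target = io.StringIO()

total, changed = copy_with_counts(csv.reader(source), csv.writer(target))
print('Total URLs:', total)
print('Total URLs changed:', changed)
```

Because a row is counted at the moment it is rewritten, the separate "needs change" and "changed" counters from the question collapse into one.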


0

Once you have your csv in a big list, one easy way to replace a column in a list would be to transpose your matrix, replace the row, and then transpose it back:

mydata = [[1, 'a', 10], [2, 'b', 20], [3, 'c', 30]]

def transpose(matrix):
    # Swap rows and columns; assumes every row has the same length.
    return [[matrix[x][y] for x in range(len(matrix))] for y in range(len(matrix[0]))]

transposedData = transpose(mydata)
print(transposedData)
# [[1, 2, 3], ['a', 'b', 'c'], [10, 20, 30]]

# Wrap the new column in a list so it is appended as a single row.
editedData = transposedData[:2] + [[50, 70, 90]]
print(editedData)
# [[1, 2, 3], ['a', 'b', 'c'], [50, 70, 90]]

mydata = transpose(editedData)
print(mydata)
# [[1, 'a', 50], [2, 'b', 70], [3, 'c', 90]]
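As a possible shortcut, the built-in `zip` can do the transposing, and for a single column the transpose can be skipped altogether. This is a sketch using the same sample data; the names `new_column`, `via_transpose` and `direct` are made up here.

```python
mydata = [[1, 'a', 10], [2, 'b', 20], [3, 'c', 30]]
new_column = [50, 70, 90]

# zip(*rows) groups the i-th items of every row, i.e. it yields the columns.
columns = [list(col) for col in zip(*mydata)]
columns[-1] = new_column
via_transpose = [list(row) for row in zip(*columns)]

# Without transposing: rebuild each row with its last item swapped out.
direct = [row[:-1] + [value] for row, value in zip(mydata, new_column)]

print(via_transpose)
print(direct)
```

Both approaches hold the whole file in memory, so for a very large CSV the row-by-row loop from the other answers is likely the better fit.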
