
I have a CSV file containing 30,000,000 entries, like this:

കൃഷി 3
വ്യാപകമാകുന്നു 2
നെല്‍കൃഷി 2
വെള്ളം 2
നെല്ല് 2
മാത്രമേ 2
ജല 2

When I try to reverse the column order, I get the following error:

Traceback (most recent call last):
  File "/home//grpus/dg.py", line 8, in <module>
    writer.writerow((row[1], row[0]))
IndexError: list index out of range

This is the code:

import csv

with open('s.csv', 'rb') as f:
    reader = csv.reader(f, delimiter='\t')
    with open("revmal.txt", "w") as o:
        writer = csv.writer(o, delimiter='\t')
        for row in reader:
            writer.writerow((row[1], row[0]))

Edit: I changed the writerow line to:

 writer.writerow(row[::-1])

With that change, I get the following error instead. How do I fix it?

Traceback (most recent call last):
  File "/home/grpus/dg.py", line 7, in <module>
    for row in reader:
Error: field larger than field limit (131072)

The file is 1.4 GB in size.

Output of wc -L s.csv:

936

Running awk '{if(length($0)>max){max=length($0);maxline=$0}}END{print maxline}' s.csv produced a line made up almost entirely of � (undecodable) characters, ending in !� 1 (1, 186 characters).

  • This can happen when you run into a blank line, or when one of your lines has only one column. Also, since you are simply writing the row in reverse, why not try writer.writerow(row[::-1])? Commented Mar 5, 2014 at 14:00
  • What is the size of this file? (The file system size). Commented Mar 5, 2014 at 15:08
  • What is the output of wc -L s.csv? Commented Mar 5, 2014 at 15:19
  • Hrmm, what is the output of this: awk '{if(length($0)>max){max=length($0);maxline=$0}}END{print maxline}' s.csv? Is it a very long line? Commented Mar 5, 2014 at 15:31
  • Is it in the same format as the lines you pasted here? Commented Mar 5, 2014 at 16:41
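A small Python 3 sketch of the same check as wc -L and the awk one-liner. It also reports the line number of the longest line and how many lines contain a double quote, since stray quote characters are one possible cause of the field limit error. The function name scan_lines is hypothetical, and lengths are counted in bytes (the file is read in binary mode and never decoded), so they will not match a character count for Malayalam text.

```python
def scan_lines(path):
    """Return (longest_length, line_number, lines_with_quote) for a file.

    Lengths are in bytes because the file is opened in binary mode and
    never decoded; the line terminator is not counted.
    """
    longest_len = 0
    longest_no = 0
    lines_with_quote = 0
    with open(path, "rb") as f:
        for number, line in enumerate(f, start=1):
            line = line.rstrip(b"\r\n")  # ignore the line ending
            if len(line) > longest_len:
                longest_len, longest_no = len(line), number
            if b'"' in line:  # a literal double quote anywhere in the line
                lines_with_quote += 1
    return longest_len, longest_no, lines_with_quote
```

For example, print(scan_lines("s.csv")) would show where the unusually long line is, so it can be inspected directly.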

2 Answers


You have at least one row that doesn't have two tab-separated columns. That could be an empty line, for example, or your file may not actually use tabs as the delimiter.

You have two options:

  1. Skip rows with fewer columns than you need:

    for row in reader:
        if len(row) < 2:
            continue
        writer.writerow((row[1], row[0]))
    
  2. Fix your delimiter to match the actual file content:

    reader = csv.reader(f, delimiter=' ')
    

    You could use the csv.Sniffer() class to try to automate delimiter selection if you have more than one file to process and these files don't all follow the same CSV dialect.
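A Python 3 sketch combining both options: it sniffs the delimiter from the start of the input with csv.Sniffer, restricted to tab or space (an assumption based on the sample lines in the question), falls back to a tab if sniffing fails, and skips rows with fewer than two columns. The function name reverse_two_columns and the 4096-character sample size are hypothetical choices, and the input must be seekable.

```python
import csv


def reverse_two_columns(src, dst, sample_size=4096):
    """Write column 2 then column 1 for every row that has two columns.

    src must be a seekable text file object. Rows with fewer than two
    fields (blank lines, single-column lines) are skipped and counted.
    Returns the number of skipped rows.
    """
    sample = src.read(sample_size)
    src.seek(0)  # rewind so the reader sees the whole input
    try:
        # Only tab or space are considered as delimiters (an assumption).
        delimiter = csv.Sniffer().sniff(sample, delimiters="\t ").delimiter
    except csv.Error:
        delimiter = "\t"  # sniffing failed: fall back to the original tab
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    skipped = 0
    for row in reader:
        if len(row) < 2:  # option 1: skip rows that are too short
            skipped += 1
            continue
        writer.writerow((row[1], row[0]))
    return skipped
```

For the real file you would pass open("s.csv", newline="", encoding="utf-8") and open("revmal.txt", "w", newline="", encoding="utf-8"); the UTF-8 encoding is an assumption about the data.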


Comments

It might be simpler to just use .split().
@BurhanKhalid: perhaps, but that won't solve the problem of possibly empty lines or lines with 1 column only.
I just had a thought: why not just use writer.writerow(row[::-1]), since the idea is to reverse the order of the columns?
@MartijnPieters I think the problem is due to the large number of entries in the CSV file. With small files it works perfectly.
@karu: That's not a problem at all for the csv module, which processes your file line by line.
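A Python 3 sketch of the .split() approach mentioned in these comments. It still processes the file line by line, and it still needs the length check, because a blank line splits to an empty list. It assumes neither column contains internal whitespace, since split() with no arguments breaks on any run of spaces or tabs, and it drops any columns beyond the second; the generator name swap_fields is hypothetical.

```python
def swap_fields(lines):
    """Yield tab-separated output lines with the first two fields swapped."""
    for line in lines:
        fields = line.split()  # any run of spaces/tabs; a blank line gives []
        if len(fields) < 2:    # empty line or a single column: nothing to swap
            continue
        yield f"{fields[1]}\t{fields[0]}\n"
```

With both files opened in text mode, dst.writelines(swap_fields(src)) would write the swapped lines without ever holding the whole file in memory.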

Since all you want to do is write each row with its columns in reverse order, just write the same row back, reversed, like this:

 writer.writerow(row[::-1])

A negative step value (the third part of the slice syntax) walks through the sequence from right to left, so row[::-1] returns a reversed copy of the row.

This stops the IndexError you are seeing now, and any rows that don't have exactly two columns are also written in reverse instead of raising an error.
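A quick Python 3 check of how row[::-1] behaves for rows of different widths, including the empty row that csv.reader produces for a blank line. The sample rows are made up for illustration, and the output goes to an in-memory buffer so the result can be inspected.

```python
import csv
import io

# Made-up rows: two columns, one column, a blank line, and three columns.
rows = [["kr", "3"], ["solo"], [], ["a", "b", "c"]]

# Slicing never indexes a specific position, so short rows cannot fail.
reversed_rows = [row[::-1] for row in rows]

out = io.StringIO()
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
for row in rows:
    writer.writerow(row[::-1])
```

Note that a blank input line comes out as a blank output line, and a one-column row is written unchanged, so combining this with a length check may still be worthwhile.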

Comments

Traceback (most recent call last): File "/home/akallararajappan/corpus/dg.py", line 7, in <module> for row in reader: Error: field larger than field limit (131072)
Do you have any quote characters in your file? Try reading it with quoting=csv.QUOTE_NONE. If that doesn't fix it, add csv.field_size_limit(sys.maxsize) before your with statement (don't forget to import sys first).
@BurhanKhalid: I suspect something else is wrong with the actual file if the OP has to handle 128 KB lines.
Martijn, could this be an encoding issue?
@BurhanKhalid: if a UTF encoding is used, I don't think so.
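A Python 3 sketch of why the quoting=csv.QUOTE_NONE suggestion can help. With the default quoting, a field that starts with an unmatched double quote makes the reader keep consuming lines until it meets another quote character, so a single stray quote can turn the rest of a large file into one field. The sample data is made up, and the field limit is lowered to 10 only temporarily, to reproduce the error on a small input.

```python
import csv
import io

# Made-up sample: the second line begins with a stray double quote.
SAMPLE = 'kr\t3\n"vel\t2\nnel\t2\n'


def read_rows(text, **fmtparams):
    """Parse tab-separated text into a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter="\t", **fmtparams))


# Default quoting: the quote opens a field that never closes, so
# everything after it (newlines included) ends up in one field.
default_rows = read_rows(SAMPLE)

# Lower the limit temporarily to reproduce "field larger than field limit".
limit_error = None
old_limit = csv.field_size_limit(10)  # returns the previous limit
try:
    read_rows(SAMPLE)
except csv.Error as exc:
    limit_error = str(exc)
finally:
    csv.field_size_limit(old_limit)  # restore the original limit

# QUOTE_NONE: quote characters are ordinary data, so each line stays a row.
quote_none_rows = read_rows(SAMPLE, quoting=csv.QUOTE_NONE)
```

If the fields really are that long, raising the limit as suggested above is the alternative. Be aware that passing sys.maxsize to csv.field_size_limit() can raise OverflowError on platforms where a C long is 32 bits, such as Windows.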
