1

I have a very large csv file with roughly 30,000 lines and 25 columns that gets produced daily. I need to filter this file to contain only rows that are of interest for me. It is of the form:

date, time, user, entity, party1, party2
20131001, 00:01, user1, ABC, XXX, XXX
20131002, 00:01, user2, XYZ/ABC, XXX, ABC
20131003, 00:01, user1, DEF, ABC, XXX

For example I need to delete all rows that have entity=ABC. I was thinking of either

  1. read the file in and delete each line that contains ABC, but that would get rid of lines that I actually need. I only want to delete lines that contain ABC in the entity column to be removed.

  2. use the csv module in python and try to achieve the same. I've read the functions available in csv, but it doesn't seem to provide anything that let's me delimit by field per column.

I am not necessarily looking for an answer in code, but any general advice on how to solve this problem would be welcome.

Thanks a lot.

2
  • Dont you think this will fall under primarily opinion-based category? Commented Oct 30, 2013 at 6:11
  • 1
    Sorry, found the answer here: stackoverflow.com/questions/10530301/… Commented Oct 30, 2013 at 6:19

2 Answers 2

1

You can certainly do what you want with Python's csv module, as you suggest and as e.g. @DhruvPathak outlines in his answer (better still here), but I think it's much simpler to do it with a one-line awk script:

$ awk -F ', ' '{ if ($4 != "ABC") print; }' < file.txt
date, time, user, entity, party1, party2
20131002, 00:01, user2, XYZ/ABC, XXX, ABC
20131003, 00:01, user1, DEF, ABC, XXX

where file.txt contains your data.

Sign up to request clarification or add additional context in comments.

Comments

0
for mycsv_line in csv_reader:
    if mycsv_line[4] != "ABC" :
    #append to result

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.