Deleting rows in a CSV via shell script based on a column's value

Question

I'm very inexperienced with shell scripts, and I need to write one that deletes an entire row when a column named Views contains the value 0. The "Views" column may not always be in the same position in the file, so I need some way to find its position first. Is this feasible with sed or awk, or is there something else I can use?

Thanks!

Can you show example input and output? I'd like to see the way the headers are formatted, in particular. – Wintermute, commented Feb 16, 2015 at 18:54
@Wintermute Hey, yeah, it's just a standard CSV. The headers are on the first line of the file: Date,....,Views,...,URL. The expected output is exactly the same CSV file, just with the rows that have 0 views removed. – Gus, commented Feb 16, 2015 at 18:58

Wintermute · Accepted Answer · 2015-02-16 19:04:23Z

4

With awk, this could be done like this:

awk -F, 'NR == 1 { for(i = 1; i <= NF; ++i) { col[$i] = i }; next } $col["Views"] != 0' filename.csv

-F, sets the field separator to a comma, since you mentioned a CSV file (this simple split does not handle quoted fields that themselves contain commas). The annotated code is:

NR == 1 {                    # in the first line
  for(i = 1; i <= NF; ++i) { # go through all fields
    col[$i] = i              # remember their index by name.
                             # ($i is the ith field)
  }
  next                       # and do nothing else
}

$col["Views"] != 0           # after that, select lines in which the field in
                             # the column that was titled "Views" is not zero,
                             # and do the default action on them (i.e., print)
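Because the first rule ends in next, the header line itself is never printed, so the output starts at the first data row. If you want the header kept, as the question asks, a minimal sketch is to add print before next. The sample file and its column order below are made up for illustration, and the sketch assumes plain comma-separated fields with no quoted commas.

```shell
#!/bin/sh
# Hypothetical sample file; the column order is invented for illustration.
csv=$(mktemp)
printf '%s\n' 'Date,Title,Views,URL' \
  '2015-02-01,a,12,http://example.com/a' \
  '2015-02-02,b,0,http://example.com/b' \
  '2015-02-03,c,7,http://example.com/c' > "$csv"

# Same column lookup as the answer, plus print so the header line survives.
result=$(awk -F, 'NR == 1 { for (i = 1; i <= NF; ++i) col[$i] = i; print; next }
  $col["Views"] != 0' "$csv")

printf '%s\n' "$result"
rm -f "$csv"
```

Only the row with 0 views is dropped; the header and the other two rows are printed unchanged.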

Note that this filters out only lines whose Views field is numerically 0; lines where the Views field is empty are kept. If you also want to filter out lines where the Views field is empty, use $col["Views"] instead of $col["Views"] != 0.
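To see the difference between the two conditions, here is a small sketch run against a hypothetical file whose three data rows have a Views value of 5, 0, and empty. It keeps the answer's original header handling, so the header is not printed.

```shell
#!/bin/sh
# Hypothetical sample: Views is 5, 0 and empty in the three data rows.
csv=$(mktemp)
printf '%s\n' 'Date,Views,URL' \
  '2015-02-01,5,http://example.com/a' \
  '2015-02-02,0,http://example.com/b' \
  '2015-02-03,,http://example.com/c' > "$csv"

# Condition "!= 0": an empty field does not look numeric, so it is compared
# to 0 as a string ("" vs "0"), which is "not equal"; that row is kept.
not_zero=$(awk -F, 'NR == 1 { for (i = 1; i <= NF; ++i) col[$i] = i; next }
  $col["Views"] != 0' "$csv")

# Bare field as the condition: false when it is empty or numerically zero.
bare=$(awk -F, 'NR == 1 { for (i = 1; i <= NF; ++i) col[$i] = i; next }
  $col["Views"]' "$csv")

printf 'with != 0:\n%s\n\nbare field:\n%s\n' "$not_zero" "$bare"
rm -f "$csv"
```

With != 0 the rows for 5 and empty survive; with the bare field only the row for 5 does.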

edited Feb 16, 2015 at 19:04

answered Feb 16, 2015 at 18:59

Wintermute


6 Comments

Gus Over a year ago

This looks pretty good, but the only issue is that it's just being output to the console. I need those rows to be physically deleted from the file. Is this possible with awk?

Wintermute Over a year ago

With GNU awk 4.1.0 or later, use awk -i inplace same_as_before_here. Or, because it is nice to have a backup in case the power goes out at the wrong moment, cp foo.csv foo.csv~ && awk same_as_before foo.csv~ > foo.csv.
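Extending the backup idea from this comment, a portable sketch that does not rely on gawk's inplace extension writes awk's output to a temporary file and only then moves it over the original. The directory, the file name data.csv and its rows are hypothetical, and the filter is the header-preserving form of the answer's command.

```shell
#!/bin/sh
# Sketch: rewrite a CSV "in place" via a temporary file, dropping rows whose Views is 0.
# The directory, file name and rows are hypothetical sample data.
set -eu

dir=$(mktemp -d)
file=$dir/data.csv
printf '%s\n' 'Date,Views,URL' \
  '2015-02-01,3,http://example.com/a' \
  '2015-02-02,0,http://example.com/b' > "$file"

cp "$file" "$file~"                # keep a backup, as the comment suggests
tmp=$(mktemp "$file.XXXXXX")       # temporary file in the same directory
awk -F, 'NR == 1 { for (i = 1; i <= NF; ++i) col[$i] = i; print; next }
  $col["Views"] != 0' "$file" > "$tmp"
mv "$tmp" "$file"                  # with set -e, only reached if awk succeeded

result=$(cat "$file")
printf '%s\n' "$result"
rm -rf "$dir"
```

Creating the temporary file next to the original keeps the final mv on the same filesystem, so the original is replaced in one step rather than being half-written if something fails midway.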

Gus Over a year ago

It doesn't look like those rows are being skipped. Are we sure this part is right: col[$i] = i? i is the index of the fields, isn't it? So $col["Views"] would be set to the index rather than the actual value contained in that column, which is what needs to be checked against 0 for every line in the file.

Wintermute Over a year ago

$i is not the value of i, it's the value of the ith field. In the same vein, $col["Views"] is the value of the col["Views"]th field. Can you add some input data to the question so I can see what's different in my guessed test data? It works for me.
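A quick way to see the difference between col["Views"] (an index) and $col["Views"] (the field at that index) is to print both for one hypothetical data row:

```shell
#!/bin/sh
# One hypothetical data row; col["Views"] holds an index, $col["Views"] that field's value.
report=$(printf '%s\n' 'Date,Views,URL' '2015-02-01,42,http://example.com/a' |
  awk -F, '
    NR == 1 { for (i = 1; i <= NF; ++i) col[$i] = i; next }
    {
      print "col[\"Views\"] = " col["Views"]     # the column number (2 here)
      print "$col[\"Views\"] = " $col["Views"]   # the 2nd field of this row (42)
    }')
printf '%s\n' "$report"
```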

Gus Over a year ago

Looks like I copied it wrong. It's working now for me. Thanks!


repzero · Accepted Answer · 2015-02-17 10:39:39Z

0

awk -F ',' 'NR==1{print;for(i=1;i<=NF;++i){if($i=="Views"){y=i}}};NR>1{if($y!=0){print}}' file > new_file

Breakdown of the code:

NR==1{                    # for the first line:
print                     # print it (so the header is kept)
for(i=1;i<=NF;++i){       # loop over all the columns to find
    if($i=="Views"){      # the one named "Views" in the first row,
        y=i               # and save its column number in y
    }
}
}

NR>1{                     # from line 2 onwards,
     if($y!=0){           # check the Views column:
       print              # if it does not contain 0, print the line
     }
}
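One caveat with both answers: if the header has no "Views" column, the index variable stays unset, $y (or $col["Views"]) becomes $0, and the whole line is compared with 0, so in practice almost every line is kept. A minimal sketch of a guard, assuming you would rather fail loudly in that case; filter_views is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: filter stdin (or files given as arguments), dropping rows
# whose "Views" value is 0, but exit with status 2 if the header has no such column.
filter_views() {
  awk -F, '
    NR == 1 {
      for (i = 1; i <= NF; ++i) if ($i == "Views") y = i
      if (!y) { print "no Views column in header" > "/dev/stderr"; exit 2 }
      print      # keep the header line
      next
    }
    $y != 0      # data rows: keep unless Views is 0
  ' "$@"
}

# Header contains Views: the 0 row is dropped.
ok=$(printf '%s\n' 'Date,Views' '2015-02-01,0' '2015-02-02,9' | filter_views)
printf '%s\n' "$ok"

# Header lacks Views: awk stops on the first line with exit status 2.
printf '%s\n' 'Date,Hits' '2015-02-01,0' | filter_views >/dev/null 2>&1
status=$?
echo "exit status without a Views column: $status"
```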

edited Feb 17, 2015 at 10:39

answered Feb 16, 2015 at 23:04

repzero


RavinderSingh13 · Accepted Answer · 2020-10-31 17:04:44Z

0

awk -F, '($1 == "badString") || ($1 ~ /[.]/) { next } 1' inputfile > outputfile

# If the first column equals badString or contains a . (dot), don't include the line in outputfile.

edited Oct 31, 2020 at 17:04

RavinderSingh13


answered Oct 31, 2020 at 15:53

Khaled Kesmia
