1

Starting Question

I have a CSV file which is formed this way (variable.csv)

E,F,G,H,I,J
a1,
,b2,b3
c1,,,c4,c5,c6

As you can see, the first and second columns do not have all the commas needed. Here's what I want:

E,F,G,H,I,J
a1,,,,,
,b2,b3,,,
c1,,,c4,c5,c6

With this, now every row has the right number of columns. In other words, I'm looking for a unix command which smartly appends the correct number of commas to the end of each row to make the row have the number of columns that we expect, based off the header.

Here's what I tried, based off of some searching: awk -F, -v OFS=, 'NF=6' variable.csv. This works in the above case, BUT...

Final Question

...Can we have this command work if the column data contains commas itself, or even new line characters? e.g.

E,F,G,H,I,J
"a1\n",
,b2,"b3,3"
c1,,,c4,c5,c6

to

E,F,G,H,I,J
"a1\n",,,,,
,b2,"b3,3",,,
c1,,,c4,c5,c6

(Apologies if this example's formatting is malformed due to the way the newline is represented.

3
  • 1
    Your command seems to work for your example. NF=6 is an assignment, returning true, and then prints with additional commas. Are there any rows it doesn't work? Commented Jul 29, 2020 at 1:01
  • @thanasisp yes, it seems to have been a machine-specific issue. In addition I realize the core of my question is missing a critical component, so I've updated the main question to add that. Commented Jul 29, 2020 at 3:31
  • 1
    awk -v FPAT='([^,]*)|("[^"]+")' -v OFS=, 'NF=6' variable.csv will work, ignoring commas inside quoted fileds. See: stackoverflow.com/questions/29642102/… Commented Jul 29, 2020 at 3:54

1 Answer 1

2

Short answer:

python3 -c 'import fileinput,sys,csv;b=list(csv.reader(fileinput.input()));w=max(len(i)for i in b);print("\n".join([",".join(i+[""]*(w-len(i)))for i in b]))' variable.csv

The python script may be long, but this is to ensure that all cases are handled. To break it down:

import fileinput,csv
b=list(csv.reader(fileinput.input()))                       # create a reader obj
w=max(len(i)for i in b)                                     # how many fields?
print("\n".join([",".join(i+[""]*(w-len(i)))for i in b]))   # output

BTW, in your starting problem

awk -F, -v OFS=, 'NF<6{$6=""}1' variable.csv

should work. (I think it's implementation or version related. Your code works on GNU awk but not on Mac version.)

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.