Using Awk to create a csv line

Question

I am new to awk and can't quite figure out the best way to do this. I have thousands of XML files from which I have already removed duplicates and, using sed and awk, split the fields into a single column in a single file.

Now I want to assemble the list into a csv file containing multiple fields on one line. After a fixed number of fields I want to start a new line.

Example

Output

1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 10.0
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 11.0

Thanks

cha0site · Accepted Answer · 2012-07-05 17:57:17Z


Is using xargs allowed?

cat input | xargs -L13 -d'\n' | sed -e 's/ /, /g'

I get this output here:

1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 10.0
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 11.0

It's sort of hacky, though. If you started out with XML, you should consider using XSLT.


2 Comments

Dennis Williamson Over a year ago

Use xargs -a input instead of cat.

cha0site Over a year ago

@Dennis: Or xargs ... < input. But the reason I used cat there is because he made it sound as if he wants something to put at the end of an existing pipeline, and I wanted that usage to be clearer.

Elton Carvalho · Accepted Answer · 2012-07-05 18:06:09Z


If every output line should have the same number of fields, say 5, I would do something like:

awk ' { printf("%s",$1); if (NR % 5 == 0) {printf("\n")} else {printf(",")}}' yourfile.txt

NR is the number of lines read by awk and % is the remainder operator. So if the number of lines read is a multiple of 5 (in this case) it will print a line break, otherwise it will print a comma.

This assumes one field per line as in your example and that blank lines in the input will correspond to blank fields in the CSV.
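As a minimal sketch of the same idea with the row width as a parameter: the variable name n, the sample input, and the ", " separator (chosen to match the output in the question) are assumptions for illustration. It prints $0 instead of $1 so that a field containing spaces is kept whole, and it ends a final partial row with a newline.

```shell
# n = number of fields per CSV row (hypothetical parameter, passed with -v).
# The sample input has one field per line; the blank line is an empty field.
out=$(printf '%s\n' 1 2 '' 4 5 6 7 | awk -v n=3 '
    # Print the separator before every field except the first one in a row.
    { printf "%s%s", ((NR - 1) % n == 0 ? "" : ", "), $0 }
    # End the row after every n-th input line.
    NR % n == 0 { printf "\n" }
    # Terminate a trailing partial row, if there is one.
    END { if (NR % n != 0) printf "\n" }
')
printf '%s\n' "$out"
```

For the data in the question, n would be 13, and the awk command could sit at the end of an existing pipeline in the same way.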

edited Jul 5, 2012 at 18:06

answered Jul 5, 2012 at 18:00


Birei · Accepted Answer · 2012-07-05 18:19:33Z

One way using sed:

Content of script.sed:

## Label 'a'
:a

## On the last line, print what is left in the buffer, replacing
## newlines with commas, then quit.
$ {
    s/^\n//
    s/\n/, /g
    p   
    q   
}

## If the buffer holds 12 newlines (13 fields), the output line is
## complete: replace the newlines with commas, print it, and branch
## with 'b' to the end of the script so the next cycle starts with a
## fresh buffer.
/\([^\n]*\n\)\{12\}/ {
    s/^\n//
    s/\n/, /g
    p   
    b   
}

## The buffer has not yet reached the field limit for a line, so
## append one more input line (N) and loop back to label 'a'.
N
ba

Run it like:

sed -nf script.sed infile

With the following output:

1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 10.0
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 11.0

potong · Accepted Answer · 2012-07-05 19:02:50Z


This might work for you:

paste -sd',,,,,,,,,,,,\n' file | sed 's/,/, /g'
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 10.0
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 11.0

or this (GNU sed):

sed ':a;$bb;N;s/\n/&/12;Ta;:b;s/\n/, /g' file
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 10.0
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 11.0
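The delimiter list for paste does not have to be typed by hand. The following is a rough sketch, assuming a row width held in a hypothetical shell variable n and a small made-up input; it builds n-1 commas followed by the two characters \n, which paste reads as a newline and cycles through.

```shell
n=3   # fields per row (assumed for this example; the question's data would need 13)

# Build n-1 commas followed by a literal backslash-n for paste's delimiter list.
delims=$(awk -v n="$n" 'BEGIN { for (i = 1; i < n; i++) printf ","; printf "\\n" }')

# Join the input lines with the cycling delimiters, then add a space after each comma.
out=$(printf '%s\n' a b c d e f | paste -sd "$delims" - | sed 's/,/, /g')
printf '%s\n' "$out"
```

Note that the final sed step also adds a space after any comma that is part of a field, so this only suits data whose fields contain no commas.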
