I am new to awk and can't quite figure out the best way to do this. I have thousands of XML files from which I have already removed duplicates and split the fields into a single column in a single file, using sed and awk.

Now I want to assemble the list into a CSV file containing multiple fields on one line. After a fixed number of fields (13 in the example below) I want to start a new line.

Example

1234
2345

345678
4.23456E3
54321
654321
789

87654.100
9876

10.0
1234
2345

345678
4.23456E3
54321
654321
789

87654.100
9876

11.0

Output

1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 10.0
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 11.0

Thanks

4 Answers

Is using xargs allowed?

cat input | xargs -L13 -d'\n' | sed -e 's/ /, /g'

I get this output here:

1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 10.0
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 11.0

It's sort of hacky, though. If you started out with XML, you should consider using XSLT.

2 Comments

Use xargs -a input instead of cat.
@Dennis: Or xargs ... < input. But the reason I used cat there is because he made it sound as if he wants something to put at the end of an existing pipeline, and I wanted that usage to be clearer.

If every output line has the same number of fields, say 5, I would do something like:

awk '{ printf("%s", $1); if (NR % 5 == 0) { printf("\n") } else { printf(",") } }' yourfile.txt

NR is the number of lines read by awk and % is the remainder operator. So if the number of lines read is a multiple of 5 (in this case) it will print a line break, otherwise it will print a comma.

This assumes one field per line as in your example and that blank lines in the input will correspond to blank fields in the CSV.
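Adapted to the 13 fields per row and the ", " separator shown in the question, the same NR-modulo idea could be sketched as below. The function name join_fields and its argument are made up for illustration, and the handling of a short final row is an assumption about what you would want, not something the question specifies.

```shell
# join_fields N: read one field per line on stdin, print N fields per row.
# (join_fields is a hypothetical helper name, not part of the answer above.)
join_fields() {
    awk -v n="$1" '
        # Emit the separator before every field except the first of a row.
        # $0 (the whole line) is used so a field containing spaces is kept.
        { printf "%s%s", ((NR - 1) % n == 0 ? "" : ", "), $0 }
        # Close the row after every n-th input line.
        NR % n == 0 { printf "\n" }
        # If the input stops mid-row, still terminate the last row.
        END { if (NR % n != 0) printf "\n" }
    '
}
```

For the data in the question it would be called as `join_fields 13 < input`, or placed at the end of an existing pipeline. Blank input lines come out as empty fields, as in the expected output.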

One way using sed:

Content of script.sed:

## Label 'a'
:a

## If last line, print what is left in the buffer, substituting
## newlines with commas.
$ {
    s/^\n//
    s/\n/, /g
    p   
    q   
}

## If the buffer contains 12 newlines, it holds a full output line (13
## fields): replace newlines with commas, print, and branch to the end of
## the script with 'b' so the next cycle starts with a fresh buffer.
/\([^\n]*\n\)\{12\}/ {
    s/^\n//
    s/\n/, /g
    p
    b
}
}

## Here the buffer has not yet reached the number of fields per line, so
## append one more input line (N) and continue the loop at label 'a'
N
ba

Run it like:

sed -nf script.sed infile

With the following output:

1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 10.0
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 11.0

This might work for you:

paste -sd',,,,,,,,,,,,\n' file | sed 's/,/, /g'
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 10.0
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 11.0

or this (GNU sed):

sed ':a;$bb;N;s/\n/&/12;Ta;:b;s/\n/, /g' file
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 10.0
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 11.0
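The paste variant above hard-codes twelve commas for 13 fields. If the field count may change, the delimiter list could be generated instead. This is only a sketch: paste_rows is a hypothetical helper name, and it emits plain commas, so the `sed 's/,/, /g'` step from above would still be needed to get the ", " spacing.

```shell
# paste_rows N: join every N input lines from stdin with commas.
# (paste_rows is a hypothetical helper name, not part of the answer above.)
paste_rows() {
    # Build N-1 commas; paste cycles through the delimiter list, so ending
    # the list with \n starts a new row after every N-th field.
    commas=$(awk -v n="$1" 'BEGIN { for (i = 1; i < n; i++) printf "," }')
    paste -s -d "${commas}\n" -
}
```

For the question's data that would be `paste_rows 13 < file | sed 's/,/, /g'`.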
