join multiple lines based on column1

Question

I have a file like the one below:

abc, 12345
def, text and nos    
ghi, something else   
jkl, words and numbers

abc, 56345   
def, text and nos   
ghi, something else 
jkl, words and numbers

abc, 15475  
def, text and nos 
ghi, something else
jkl, words and numbers

abc, 123345
def, text and nos
ghi, something else  
jkl, words and numbers

I want to convert (join) it into:

abc, 12345, 56345, 15475, 123345
def, text and nos, text and nos, text and nos, text and nos
ghi, something else, something else, something else, something else   
jkl, words and numbers, words and numbers, words and numbers, words and numbers

Do you actually have the extra blank lines in your input file? If not, please edit and remove them; you should show the file exactly as it is. – terdon ♦, Apr 11, 2014 at 14:23

score 11 · Accepted Answer · 2017-06-20 08:37:22Z

If you don't mind the order of the output:

$ awk -F',' 'NF>1{a[$1] = a[$1]","$2};END{for(i in a)print i""a[i]}' file 
jkl, words and numbers, words and numbers, words and numbers, words and numbers
abc, 12345, 56345, 15475, 123345
ghi, something else, something else, something else, something else
def, text and nos, text and nos, text and nos, text and nos

Explanation

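NF>1 means we only process lines that contain at least one comma; blank lines are skipped.
We store the values in the associative array a, keyed by the first field; the value is the second field ($2, i.e. the text between the first and the second comma, not the whole rest of the line). If the key already has a value, the new one is appended after a comma.
In the END block, we loop through the associative array a and print each key followed by its accumulated value. Because for (i in a) visits keys in an unspecified order, the output order can differ from the input order.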
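Because $2 is only the text between the first and second comma, a value that itself contains a comma would be truncated by the one-liner. A minimal sketch, assuming the key is everything before the first comma and the value is everything after it; join_rest_of_line is a hypothetical helper name, and it also remembers the order in which keys first appear so its output is deterministic:

```shell
# join_rest_of_line is a hypothetical helper name; it reads the files given
# as arguments, or stdin when there are none.
join_rest_of_line() {
    awk '
        NF {                                  # skip blank and whitespace-only lines
            key = $0; sub(/,.*/, "", key)     # text before the first comma
            val = $0; sub(/^[^,]*/, "", val)  # the first comma and everything after it
            if (!(key in acc)) order[++n] = key   # remember first-seen order
            acc[key] = acc[key] val           # append this value to the key
        }
        END { for (k = 1; k <= n; k++) print order[k] acc[order[k]] }
    ' "$@"
}
```

Usage: join_rest_of_line file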

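Or, using perl, with the keys sorted alphabetically (this matches the input order here only because the keys in this example happen to be in alphabetical order):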

$ perl -F',' -anle 'next if /^$/; $h{$F[0]} .= ",$F[1]";
    END{print $_, $h{$_}, "\n" for sort keys %h}' file
abc, 12345, 56345, 15475, 123345

def, text and nos, text and nos, text and nos, text and nos

ghi, something else, something else, something else, something else

jkl, words and numbers, words and numbers, words and numbers, words and numbers
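If alphabetical order is what you want, the awk one-liner above can produce it too by piping its output through sort. A sketch, with join_sorted as a hypothetical wrapper name; unlike the perl version it prints no blank lines between groups:

```shell
# join_sorted is a hypothetical wrapper name around the awk one-liner.
join_sorted() {
    # NF > 1 skips blank lines; for (i in a) visits keys in no fixed order,
    # so sort(1) is what makes the final order alphabetical.
    awk -F',' 'NF > 1 { a[$1] = a[$1] "," $2 }
               END    { for (i in a) print i a[i] }' "$@" | sort
}
```

Usage: join_sorted file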

answered Apr 11, 2014 at 4:01 by cuonglm · edited Jun 20, 2017 at 8:37 by αғsнιη

Your perl solution from my question unix.stackexchange.com/questions/124181/… should also work, right? – Ramesh, Apr 11, 2014 at 4:08

No. The OP wants to concatenate strings based on column 1, whether or not they are duplicated. Your question doesn't want duplicates. – cuonglm, Apr 11, 2014 at 4:16

Oh, OK. At first glance, it seemed almost identical to my question. :) – Ramesh, Apr 11, 2014 at 4:19

Neat, +1! That doesn't keep the order, though; it only recreates it in this particular example, where the fields are in alphabetical order. – terdon ♦, Apr 11, 2014 at 14:29

Just for laughs, I'd written almost exactly the same approach before reading your answer: perl -F, -lane 'next unless /./;push @{$k{$F[0]}}, ",@F[1..$#F]"; END{print "$_@{$k{$_}}" foreach keys(%k)}' file :) Great minds think alike! – terdon ♦, Apr 11, 2014 at 14:43


score 2 · Accepted Answer · 2014-04-12 08:25:15Z

Oh, that's an easy one. Here's a simple version that keeps the order of the keys as they appear in the file:

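$ awk -F, '
    /.+/ {                                    # skip empty lines
        if (!($1 in Val)) { Key[++i] = $1; }  # remember the order keys first appear in
        Val[$1] = Val[$1] "," $2;             # append ",<second field>" to that key
    }
    END {
        for (j = 1; j <= i; j++) {
            # print key + joined values; add a blank line after every group but the last
            printf("%s%s\n%s", Key[j], Val[Key[j]], (j == i) ? "" : "\n");
        }
    }' file.txt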

Output should look like this:

abc, 12345, 56345, 15475, 123345

def, text and nos, text and nos, text and nos, text and nos

ghi, something else, something else, something else, something else

jkl, words and numbers, words and numbers, words and numbers, words and numbers

If you don't mind having an extra blank line at the end, just replace the printf line with printf("%s%s\n\n", Key[j], Val[Key[j]]);
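The sample input appears to carry trailing blanks on several lines, and the question's desired output has no blank lines between groups. A sketch of a variant under those assumptions, with join_keep_order as a hypothetical helper name: it strips trailing spaces and tabs before splitting, skips lines that are then empty, and prints one joined line per key in first-seen order:

```shell
# join_keep_order is a hypothetical helper name; it reads the files given
# as arguments, or stdin when there are none.
join_keep_order() {
    awk -F, '
        { sub(/[ \t]+$/, "") }                 # drop trailing blanks; $0 is re-split
        /./ {                                  # skip lines that are now empty
            if (!($1 in Val)) Key[++i] = $1    # remember first-seen order
            Val[$1] = Val[$1] "," $2           # append ",<second field>" to that key
        }
        END { for (j = 1; j <= i; j++) print Key[j] Val[Key[j]] }
    ' "$@"
}
```

Usage: join_keep_order file.txt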
