awk | split the column and map with substring

Question

File1 (Big file with millions of records)

1,5,404920012,type1,info1
4,2,10000234,type2,info2
2,3,40492001223,type3,info3
3,1,50492001223,type4,info4

File2 (small file with only 10 lines)

40492=product1
10000=product2

Expected output

1,5,404920012,type1,product1,info1
4,2,10000234,type2,product2,info2
2,3,40492001223,type3,product1,info3
3,1,50492001223,type4,NOMATCH,info4

Notes:
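I want to compare the first 5 characters of the 3rd column of File1 against the keys in File2.
After a match, the mapped product (e.g. product1) should appear in the 5th column of the output; if there is no match, NOMATCH should appear there instead.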

What I have tried:
I am new to advanced awk programming, but from what I have learnt so far, I can merge two files as follows:

BEGIN { FS = OFS = "," }
FNR == NR { x = $1; $1 = ""; a[x] = $0; next }
{ if ($3 in a) print $0 "," a[$3]; else print $0 ",," }

The issue is that I don't know how to set a different FS (= for the second file) for the second file, or how to combine the substr function, as in echo 404920012 | awk '{print substr($0,1,5)}', with the if condition.
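To see concretely where the attempt above falls short, here is a minimal sketch that runs it unchanged on the sample data. It assumes File2 is passed before File1 (the question does not show the invocation) and writes the sample files to a temporary directory. Every line comes out with the empty ",," suffix: with FS set to "," for both files, the whole File2 line becomes the key, and an exact match on the full $3 can never succeed. That is why both a per-file FS (or a split on "=") and substr() are needed.

```shell
#!/bin/sh
# Sketch: run the attempt from the question unchanged on the sample data.
# Assumption: invoked as awk '...' File2 File1 (File2 first, so NR==FNR selects it).
dir=$(mktemp -d)
printf '%s\n' '1,5,404920012,type1,info1' '4,2,10000234,type2,info2' \
    '2,3,40492001223,type3,info3' '3,1,50492001223,type4,info4' > "$dir/File1"
printf '%s\n' '40492=product1' '10000=product2' > "$dir/File2"

# With FS="," for both files, a File2 line has no comma, so $1 is the whole
# "40492=product1" line; the full number in $3 of File1 never equals it.
out=$(awk '
BEGIN { FS = OFS = "," }
FNR == NR { x = $1; $1 = ""; a[x] = $0; next }
{ if ($3 in a) print $0 "," a[$3]; else print $0 ",," }
' "$dir/File2" "$dir/File1")
printf '%s\n' "$out"
rm -r "$dir"
```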

I have added what I have tried, and where I am stuck, below my question.
– Vipin Choudhary, commented Nov 29, 2013 at 9:28

Ed Morton · Answer · 2013-11-29 15:20:03Z

3

Since the field separator differs between the two input files, this is a case where you want to set FS to the appropriate value between the file arguments, rather than up front in the awk command arguments using -F or -v FS=:

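awk '
NR==FNR { map[$1] = $2; next }    # File2 (FS is "="): map prefix -> product
{
    key = substr($3,1,5)          # first 5 characters of field 3
    # Prefix field 5 with the product (or NOMATCH), making it the new 5th column
    $5 = (key in map ? map[key] : "NOMATCH") OFS $5
    print
}
' FS='=' File2 FS=',' OFS=',' File1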
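The mechanism this relies on is that a var=value operand is applied only when awk reaches it in the argument list, that is, just before the following file is opened. A minimal sketch (the temporary directory and one-line sample files are illustrative) that prints the first field of each file under its own FS:

```shell
#!/bin/sh
dir=$(mktemp -d)
printf '%s\n' '40492=product1' > "$dir/File2"
printf '%s\n' '1,5,404920012,type1,info1' > "$dir/File1"

# FS='=' takes effect before File2 is read; FS=',' takes effect before File1 is read.
out=$(cd "$dir" && awk '{ print FILENAME ": $1=" $1 }' FS='=' File2 FS=',' File1)
printf '%s\n' "$out"
rm -r "$dir"
```

This prints File2: $1=40492 followed by File1: $1=1, showing each file split on its own separator.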

Just to be clear: this is only briefer than the answer you selected. Functionally it's equivalent, so please don't switch your accepted answer to this one.

edited Nov 29, 2013 at 15:20

answered Nov 29, 2013 at 15:13

Ed Morton


Håkon Hægland · Accepted Answer · 2013-11-29 09:55:27Z

1

Try

awk -f p.awk file2 file1

where p.awk is

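BEGIN{
    FS=","
    OFS=","
}

# First file (file2): its lines contain no comma, so $1 is the whole "40492=product1" line
NR==FNR {
    split($1,a,"=")     # a[1] = 5-character prefix, a[2] = product name
    keys[a[1]]=a[2]     # map prefix -> product
    next
}
# Second file (file1): look up the first 5 characters of field 3
{
    k=substr($3,1,5)
    if (k in keys)
        p=keys[k]
    else
        p="NOMATCH"
    $4=$4 OFS p         # append the product to field 4, so it becomes the 5th output column
    print               # $0 is rebuilt with OFS, keeping all other fields
}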

with output:

1,5,404920012,type1,product1,info1
4,2,10000234,type2,product2,info2
2,3,40492001223,type3,product1,info3
3,1,50492001223,type4,NOMATCH,info4
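To check Ed Morton's remark that the two answers are functionally equivalent, here is a minimal sketch that runs both programs on the sample data and compares their output. The temporary directory and variable names are illustrative; the comparison only covers this sample input, not every possible file.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf '%s\n' '1,5,404920012,type1,info1' '4,2,10000234,type2,info2' \
    '2,3,40492001223,type3,info3' '3,1,50492001223,type4,info4' > "$dir/File1"
printf '%s\n' '40492=product1' '10000=product2' > "$dir/File2"

# Ed Morton's version: FS is switched between the two file operands.
out_ed=$(awk '
NR==FNR { map[$1] = $2; next }
{
    key = substr($3,1,5)
    $5 = (key in map ? map[key] : "NOMATCH") OFS $5
    print
}
' FS='=' "$dir/File2" FS=',' OFS=',' "$dir/File1")

# Håkon Hægland's version: a single FS, with split() on "=" for the mapping file.
out_hakon=$(awk '
BEGIN { FS = ","; OFS = "," }
NR==FNR { split($1, a, "="); keys[a[1]] = a[2]; next }
{
    k = substr($3,1,5)
    if (k in keys) p = keys[k]; else p = "NOMATCH"
    $4 = $4 OFS p
    print
}
' "$dir/File2" "$dir/File1")
rm -r "$dir"

if [ "$out_ed" = "$out_hakon" ]; then echo "identical"; else echo "different"; fi
printf '%s\n' "$out_ed"
```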

edited Nov 29, 2013 at 9:55

answered Nov 29, 2013 at 9:26

Håkon Hægland


5 Comments

Vipin Choudhary Over a year ago

Thanks Håkon, that really helped. But when specifying print $1,$2,$3,$4,p,$5, is it possible to specify a range as well? For example, if my file has 100 fields, instead of writing each column individually, could we write something like $1-$4,p,$5-$100?

Vipin Choudhary Over a year ago

Also, could you explain this: keys[a[1]]=a[2]?

Vipin Choudhary Over a year ago

Thanks. I'm not sure how expensive the loop would be if the file is really big, since it loops over every column of every line. But anyway, I got a really meaningful answer, thanks.

Håkon Hægland Over a year ago

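@VipinChoudhary keys[a[1]]=a[2] builds an associative array, keys, from the result of the split function: split($1,a,"=") puts the prefix in a[1] and the product name in a[2], so each prefix becomes a key that maps to its product.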

Håkon Hægland Over a year ago

@VipinChoudhary I found a better solution; see the update. Now you do not have to use a for loop at all :)
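On the range question: awk's print has no field-range syntax (an expression like $1-$4 is simply arithmetic subtraction), but none is needed, because assigning to one field and then printing $0 makes awk rebuild the whole record with OFS while keeping every other field. A minimal sketch on a hypothetical 100-field record (the c1..c100 values are made up for illustration):

```shell
#!/bin/sh
dir=$(mktemp -d)
printf '%s\n' '40492=product1' > "$dir/File2"
# Hypothetical 100-field record: field 3 carries the key, the others are c1..c100.
awk 'BEGIN { s = ""; for (i = 1; i <= 100; i++) s = s (i > 1 ? "," : "") (i == 3 ? "404920012" : "c" i); print s }' > "$dir/File1"

out=$(awk '
BEGIN { FS = OFS = "," }
NR==FNR { split($1, a, "="); keys[a[1]] = a[2]; next }
{
    k = substr($3, 1, 5)
    $4 = $4 OFS ((k in keys) ? keys[k] : "NOMATCH")   # insert after field 4
    print                                            # $0 is rebuilt with every field
}
' "$dir/File2" "$dir/File1")
rm -r "$dir"

# Prints: 101 c4 product1 c5 c100
printf '%s\n' "$out" | awk -F, '{ print NF, $4, $5, $6, $NF }'
```

The record is modified in place, so there is no per-column loop: each line costs one substr(), one array lookup, and one record rebuild.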
