3

File1 (Big file with millions of records)

1,5,404920012,type1,info1
4,2,10000234,type2,info2
2,3,40492001223,type3,info3
3,1,50492001223,type4,info4

File2 (small file with only 10 lines)

40492=product1
10000=product2

Output

1,5,404920012,type1,product1,info1
4,2,10000234,type2,product2,info2
2,3,40492001223,type3,product1,info3
3,1,50492001223,type4,NOMATCH,info4

Notes:
I want to compare first 5 character of 3rd column from file1 to file 2
after matching fields, product1 should be in 5th column of output

What I have Tried:
Though I am new in awk advance programming, but what I have a learnt so far.. I can merge 2 files as follows:

BEGIN { FS = OFS = "," } FNR == NR { x=$1; $1 = ""; a[x] = $0; next }
{ if($3 in a)print $0","a[$3]; else print $0",,"; }

But issue with this is.. I dont know how to pass different FS (= in case of 2nd file)for 2nd file and how to combine substr function echo 404920012 | awk '{print substr($0,1,5)}' in if condition..

1
  • I have updated what I have tried and where I am stuck below my query.. Commented Nov 29, 2013 at 9:28

2 Answers 2

3

This is a case when, since it varies between input files, you want to set FS to the appropriate value between files rather than up front in the awk command arguments using -F or -v FS=:

awk '
NR==FNR { map[$1] = $2; next }
{
    key = substr($3,1,5)
    $5 = (key in map ? map[key] : "NOMATCH") OFS $5
    print
}
' FS='=' File2 FS=',' OFS=',' File1

Just in case: This is just briefer than the answer you selected, functionally it's equivalent so please don't change to select this one.

Sign up to request clarification or add additional context in comments.

Comments

1

Try

awk -f p.awk file2 file1

where p.awk is

BEGIN{
    FS=","
    OFS=","
}

NR==FNR {
    split($1,a,"=")
    keys[a[1]]=a[2]
    next
}
{
    k=substr($3,1,5)
    if (k in keys) 
        p=keys[k]
    else
        p="NOMATCH"
    $4=$4 OFS p
    print
}

with output:

1,5,404920012,type1,product1,info1
4,2,10000234,type2,product2,info2
2,3,40492001223,type3,product1,info3
3,1,50492001223,type4,NOMATCH,info4

5 Comments

Thanks Håkon .. that really helped.. but is it possible that while specifying print $1,$2,$3,$4,p,$5.. I can specify range as well? for example if my file has 100 fields.. so instead of writing individual column name, we write $1-$4,p,$5-$100..?
also could you explain this - keys[a[1]]=a[2]
Thanks.. not sure how expensive loop it would be in if file is really big.. in this case it will loop every line for every column.. But anyways.. I got the really meaningful answer.. thanks
@VipinChoudhary The keys[a[1]]=a[2] builds an associative array keys from the result of the split function..
@VipinChoudhary I found a better solution, see update.. Now you do not have to use a for loop at all :)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.