Bash, Awk, Sed? Find string within file and append digits within string?

Question

I want to search a large text file line by line, find each entry containing "N:;;", and change it to "N:07401000000;;". The next occurrence of "N:;;" would then become "N:07401000001;;", and so on through the complete file of entries. Here is an example of before and after.

Before:

BEGIN:VCARD
VERSION:2.1
N:;;
TEL;TYPE=CELLVOICE:07401000000
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:;;
TEL;TYPE=CELLVOICE:07401000001
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:;;
TEL;TYPE=CELLVOICE:07401000002
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:;;
TEL;TYPE=CELLVOICE:07401000003
END:VCARD

After result would look like this:

BEGIN:VCARD
VERSION:2.1
N:07401000000;;
TEL;TYPE=CELLVOICE:07401000000
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:07401000001;;
TEL;TYPE=CELLVOICE:07401000001
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:07401000002;;
TEL;TYPE=CELLVOICE:07401000002
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:07401000003;;
TEL;TYPE=CELLVOICE:07401000003
END:VCARD

Any help or ideas would be awesome.

Do you want the N values to start at a hard-coded value and increment or just copy the value from the subsequent CELLVOICE?

Actually, that is a good idea. How about copying the value given in CELLVOICE?
– Darren Didge, Commented Oct 9, 2016 at 16:24

Ed Morton · Accepted Answer · 2016-10-09 16:41:15Z

2

Here's the most robust and easily extensible way to do what you want:

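$ cat tst.awk
# Each record is one vCard terminated by END:VCARD; each field is one line.
BEGIN { RS="END:VCARD\\s*"; FS="\n"; OFS=":" }
{
    # Put the END:VCARD terminator (held in RT) back on the record.
    $0 = $0 gensub(/\s+$/,"",1,RT)

    # Map each field name (before the first ":") to its value (after it).
    for (i=1; i<=NF; i++) {
        name = gensub(/:.*/,"",1,$i)
        value = gensub(/^[^:]*:/,"",1,$i)
        n2v[name] = value
        names[i] = name
    }

    # Prefix the (empty) N value with the phone number.
    n2v["N"] = n2v["TEL;TYPE=CELLVOICE"] n2v["N"]

    # Print the fields in their original order.
    for (i=1; i<=NF; i++) {
        name  = names[i]
        value = n2v[name]
        print name, value
    }
}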

$ awk -f tst.awk file
BEGIN:VCARD
VERSION:2.1
N:07401000000;;
TEL;TYPE=CELLVOICE:07401000000
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:07401000001;;
TEL;TYPE=CELLVOICE:07401000001
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:07401000002;;
TEL;TYPE=CELLVOICE:07401000002
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:07401000003;;
TEL;TYPE=CELLVOICE:07401000003
END:VCARD

The above uses GNU awk for gensub(), a multi-character RS, and RT. The basic (and idiomatic) idea is to split the input into records that end with END:VCARD and, for each record, first build an array (n2v[]) that maps field names (the part before the first : on each line) to their values (the part after the first :). You can then manipulate every field by its name, so you can trivially change values, rearrange the order, fill in defaults, and so on.
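Where GNU awk is not available, the same per-card idea can be sketched with POSIX awk features only. This is a minimal sketch, not the answer's script: the function name fill_names is hypothetical, and it assumes each card ends with an END:VCARD line and that the empty name line is exactly N:;; as in the sample.

```shell
# Hypothetical helper: reads vCards on stdin, writes them to stdout
# with each empty N field filled from the CELLVOICE number of the same card.
fill_names() {
    awk '
    # Buffer every line of the current card and remember the phone number.
    {
        lines[++n] = $0
        if ($0 ~ /^TEL;TYPE=CELLVOICE:/) {
            tel = $0
            sub(/^[^:]*:/, "", tel)     # keep the part after the first ":"
        }
    }
    # At the end of a card, print it with the empty N field filled in.
    # Lines after the last END:VCARD are not printed.
    /^END:VCARD/ {
        for (i = 1; i <= n; i++) {
            if (lines[i] == "N:;;") lines[i] = "N:" tel ";;"
            print lines[i]
        }
        n = 0; tel = ""
    }
    '
}
```

It would be used as fill_names < file > file.new. Because the whole card is buffered before printing, the TEL line may come before or after the N line.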

edited Oct 9, 2016 at 16:41

answered Oct 9, 2016 at 16:32

Ed Morton

2 Comments

Darren Didge Over a year ago

[Didge@vosser 07401]$ awk tst master07401.vcf
[Didge@vosser 07401]$ tst master07401.vcf
bash: tst: command not found
[Didge@vosser 07401]$ bash tst master07401.vcf
tst: line 1: BEGIN: command not found
tst: line 1: }: command not found
tst: line 3: syntax error near unexpected token `('
tst: line 3: `$0 = $0 gensub(/\s+$/,"",1,RT)'
[Didge@vosser 07401]$
Am I not running the script correctly?

Darren Didge Over a year ago

Solved it, thanks very much, Ed Morton. I was not reading it correctly, my bad. RTFM comes to mind, and you are more than welcome to slap me with a wet trout :D

chthonicdaemon · Answer · 2016-10-09 17:32:48Z

0

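Ed Morton's answer is comprehensive, but if you're looking for a quick fix that numbers the entries from a hard-coded starting value, this does the job as well:

$ awk 'BEGIN {n=7401000000} /^N:;;/ {printf "N:%011.0f;;\n", n++; next} 1' myfile.vcf

The counter is stored without its leading zero and printf pads it back to 11 digits. Written as the literal 07401000000, the number would lose its leading zero when printed, and GNU awk in its default mode reads a leading-zero constant in the program text as octal.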
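awk writes its result to standard output and leaves the input file untouched, so changing the file itself needs a temporary file. A minimal sketch, assuming an 11-digit number as in the sample; the function name renumber and its two arguments (file, starting number) are hypothetical:

```shell
# Hypothetical helper: renumber FILE START
# Rewrites FILE, replacing each "N:;;" line with N:<counter>;;,
# counting up from START and zero-padding to 11 digits.
renumber() {
    tmp=$(mktemp) || return 1
    if awk -v n="$2" '/^N:;;/ { printf "N:%011.0f;;\n", n++; next } 1' "$1" > "$tmp"
    then
        cat "$tmp" > "$1"   # overwrite in place only if awk succeeded
    fi
    rm -f "$tmp"
}
```

For example, renumber myfile.vcf 7401000000. It is worth keeping a backup copy of the file before overwriting it.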

edited Oct 9, 2016 at 17:32

answered Oct 9, 2016 at 16:47

chthonicdaemon

2 Comments

Darren Didge Over a year ago

How do I go about running this script against the file I wish to edit and have changed? I have tried script.sh myfile.vcf

chthonicdaemon Over a year ago

You do awk -f script.awk myfile.vcf

Sundeep · Answer · 2016-10-10 02:39:15Z

Assuming the lines with N:;; and CELLVOICE occur next to each other, as shown in the sample, here is a solution with sed:

$ sed -E '/N:;;/{N;s/.*\n(.*CELLVOICE:([0-9]+))/N:\2;;\n\1/}' ip.txt 
BEGIN:VCARD
VERSION:2.1
N:07401000000;;
TEL;TYPE=CELLVOICE:07401000000
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:07401000001;;
TEL;TYPE=CELLVOICE:07401000001
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:07401000002;;
TEL;TYPE=CELLVOICE:07401000002
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:07401000003;;
TEL;TYPE=CELLVOICE:07401000003
END:VCARD

If the pattern /N:;;/ matches, the N command appends the next line to the pattern space.
The substitution then rewrites the pair in the required format; \n separates the two lines.
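Since the sed command depends on the N:;; line being followed directly by the CELLVOICE line, it may be worth checking that assumption on a large file first. A minimal sketch; the function name check_adjacent is hypothetical:

```shell
# Hypothetical helper: check_adjacent FILE
# Prints the line number of every "N:;;" line that is NOT immediately
# followed by a CELLVOICE line. No output means the assumption holds.
check_adjacent() {
    awk '
    prev && $0 !~ /CELLVOICE:[0-9]+/ { print NR - 1 }   # previous line was N:;;
    { prev = ($0 ~ /^N:;;/) }                           # remember for next line
    END { if (prev) print NR }                          # N:;; on the last line
    ' "$1"
}
```

If it prints nothing for your file, the adjacency assumption holds and the sed one-liner applies to every card.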

A solution with perl: the entire input file is slurped as one big string, and the substitution is then performed on it.

perl -0777 -pe 's/N:;;\n(.*CELLVOICE:(\d+))/N:$2;;\n$1/g' ip.txt
