I want to search a large text file line by line, find each entry containing "N:;;", and change the first one to "N:07401000000;;", the next occurrence to "N:07401000001;;", and so on throughout the whole file. Here is an example of before and after.

Before:

BEGIN:VCARD
VERSION:2.1
N:;;
TEL;TYPE=CELLVOICE:07401000000
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:;;
TEL;TYPE=CELLVOICE:07401000001
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:;;
TEL;TYPE=CELLVOICE:07401000002
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:;;
TEL;TYPE=CELLVOICE:07401000003
END:VCARD

After result would look like this:

BEGIN:VCARD
VERSION:2.1
N:07401000000;;
TEL;TYPE=CELLVOICE:07401000000
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:07401000001;;
TEL;TYPE=CELLVOICE:07401000001
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:07401000002;;
TEL;TYPE=CELLVOICE:07401000002
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:07401000003;;
TEL;TYPE=CELLVOICE:07401000003
END:VCARD

Any help or ideas would be awesome.

Do you want the N values to start at a hard-coded value and increment or just copy the value from the subsequent CELLVOICE?

Actually that is a good idea. How about the value mentioned within CELLVOICE.


3 Answers

Here's the most robust and easily extensible way to do what you want:

$ cat tst.awk
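BEGIN { RS="END:VCARD\\s*"; FS="\n"; OFS=":" }
{
    # Put the END:VCARD terminator (minus trailing whitespace) back on the record.
    $0 = $0 gensub(/\s+$/,"",1,RT)

    # Map each field name (text before the first ":") to its value (text after it).
    for (i=1; i<=NF; i++) {
        name = gensub(/:.*/,"",1,$i)
        value = gensub(/^[^:]*:/,"",1,$i)
        n2v[name] = value
        names[i] = name
    }

    # Prefix the (empty) N value with the CELLVOICE number.
    n2v["N"] = n2v["TEL;TYPE=CELLVOICE"] n2v["N"]

    # Print the fields in their original order.
    for (i=1; i<=NF; i++) {
        name  = names[i]
        value = n2v[name]
        print name, value
    }
}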

.

$ awk -f tst.awk file
BEGIN:VCARD
VERSION:2.1
N:07401000000;;
TEL;TYPE=CELLVOICE:07401000000
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:07401000001;;
TEL;TYPE=CELLVOICE:07401000001
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:07401000002;;
TEL;TYPE=CELLVOICE:07401000002
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:07401000003;;
TEL;TYPE=CELLVOICE:07401000003
END:VCARD

The above uses GNU awk for gensub(), multi-char RS, and RT. The basic (and idiomatic) idea is to split the input into records that end with END:VCARD and, for each record, first create an array (n2v[]) that maps field names (the part before the first : on each line) to their values (the part after the first :). You can then manipulate every field by its name, so you can trivially change values, rearrange the order, fill in defaults, etc.
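
For readers without GNU awk, the same per-card idea can be sketched with POSIX awk features only. This is a minimal sketch, not the answer's script: the function name fill_names is hypothetical, and it assumes Unix line endings and one CELLVOICE line per card.

```shell
# fill_names: hypothetical helper; reads vCard text on stdin, writes the result to stdout.
fill_names() {
    awk '
        # Buffer every line of the current card.
        { line[++n] = $0 }

        # Remember the number on the CELLVOICE line (text after the first colon).
        /^TEL;TYPE=CELLVOICE:/ { tel = substr($0, index($0, ":") + 1) }

        # At the end of a card, print it, filling in an empty N field.
        /^END:VCARD/ {
            for (i = 1; i <= n; i++) {
                if (line[i] == "N:;;" && tel != "")
                    line[i] = "N:" tel ";;"
                print line[i]
            }
            n = 0; tel = ""
        }

        # Print any trailing lines that are not part of a complete card.
        END { for (i = 1; i <= n; i++) print line[i] }
    '
}
```

It would be used as fill_names < file > file.new. Because each card is buffered until END:VCARD, the CELLVOICE line does not have to come directly after the N line.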

2 Comments

[Didge@vosser 07401]$ awk tst master07401.vcf
[Didge@vosser 07401]$ tst master07401.vcf
bash: tst: command not found
[Didge@vosser 07401]$ bash tst master07401.vcf
tst: line 1: BEGIN: command not found
tst: line 1: }: command not found
tst: line 3: syntax error near unexpected token `('
tst: line 3: `$0 = $0 gensub(/\s+$/,"",1,RT)'
[Didge@vosser 07401]$
Am I not running the script correctly?
Solved it. Thanks very much, Ed Morton; I was not reading it correctly. My bad, and sorry: RTFM comes to mind, and you are more than welcome to slap me with a wet trout :D

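Ed Morton's answer is comprehensive, but if you're looking for a quick fix that simply counts up from 07401000000, this does the job as well. The number is built with printf from a fixed "07401" prefix and a six-digit counter, because a numeric constant written as 07401000000 would lose its leading zero when printed:

$ awk '/^N:;;/ {printf "N:07401%06d;;\n", n++; next} 1' myfile.vcf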

2 Comments

How do I go about running this script against the file I wish to edit? I have tried script.sh myfile.vcf
Save the awk program in a file such as script.awk and run awk -f script.awk myfile.vcf
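
The steps in the comment above can be sketched end to end as follows. This is only an illustration under stated assumptions: the scratch directory, the one-card sample file, and the name myfile.new.vcf are hypothetical, and the program saved in script.awk is a small stand-in that numbers the empty N lines.

```shell
# Work in a scratch directory so nothing real is overwritten (demo only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A one-card sample standing in for myfile.vcf.
printf 'BEGIN:VCARD\nVERSION:2.1\nN:;;\nTEL;TYPE=CELLVOICE:07401000000\nEND:VCARD\n' > myfile.vcf

# Save the awk program (the part between the single quotes) in its own file.
cat > script.awk <<'EOF'
/^N:;;/ { printf "N:07401%06d;;\n", n++; next }
1
EOF

# awk writes to standard output, so send the result to a new file first...
awk -f script.awk myfile.vcf > myfile.new.vcf

# ...and replace the original only after checking that the new file looks right.
mv myfile.new.vcf myfile.vcf
```

Redirecting straight back onto the input file (awk ... myfile.vcf > myfile.vcf) would truncate it before awk reads it, which is why the sketch goes through a second file.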

Assuming the lines with N:;; and CELLVOICE occur next to each other, as shown in the sample, here is a solution with sed:

$ sed -E '/N:;;/{N;s/.*\n(.*CELLVOICE:([0-9]+))/N:\2;;\n\1/}' ip.txt 
BEGIN:VCARD
VERSION:2.1
N:07401000000;;
TEL;TYPE=CELLVOICE:07401000000
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:07401000001;;
TEL;TYPE=CELLVOICE:07401000001
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:07401000002;;
TEL;TYPE=CELLVOICE:07401000002
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:07401000003;;
TEL;TYPE=CELLVOICE:07401000003
END:VCARD
  • If the pattern /N:;;/ matches, get next line with N
  • Then perform substitution as per required format, \n separates the two lines


Here is a solution with perl. The entire input file is slurped as one big string and the substitution is then performed on it:

perl -0777 -pe 's/N:;;\n(.*CELLVOICE:(\d+))/N:$2;;\n$1/g' ip.txt
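
Whichever command is used, the result can be sanity-checked before the original file is replaced. The following is a minimal sketch, assuming Unix line endings and the field layout shown in the sample; the function name check_names is hypothetical.

```shell
# check_names: hypothetical helper; reads vCard text on stdin and prints the
# number of cards whose N field is not the CELLVOICE number followed by ";;".
check_names() {
    awk -F: '
        # With ":" as the separator, $1 is the field name and $2 its value.
        $1 == "N"                  { name = $2 }
        $1 == "TEL;TYPE=CELLVOICE" { tel = $2 }

        # At the end of each card, compare the two and reset for the next card.
        /^END:VCARD/ {
            if (name != (tel ";;")) bad++
            name = ""; tel = ""
        }

        END { print bad + 0 }
    '
}
```

Running check_names < ip.txt on the untouched sample input would report every card as a mismatch, while the converted output should report none.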
