1

I have an input string like below:

   VAL:1|b:2|c:3|VAL:<har:[email protected]>; tag=vy6r5BpcvQ|VAl:1234|name:mnp|VAL:91987654321

The file has more than 1000 rows like this.

I want to fetch the values of the first and the fourth field (the a and d fields in my original sample), but from the fourth field I want only har:[email protected].

I tried like this:

cat $filename | grep -v Orig |sed -e 's/['a:','d:']//g' |awk -F'|' -v OFS=',' '{print $1 "," $4}' >> $NGW_DATA_FILE

The output I got is below:

1,<[email protected]>; tag=vy6r5BpcvQ

I want it like this,

1,har:[email protected]

Where did I make the mistake and how do I solve it?

2
  • 1
    If you are using Awk anyway, do all of this processing in Awk. Commented Dec 11, 2020 at 7:41
  • Sometimes I may receive the values under the same field name, like val:, instead of a and d. How can I fetch the values then? Commented Dec 11, 2020 at 7:44

3 Answers

5

EDIT: Following the OP's change to the input and the OP's comments, here is a version that picks the first and fourth fields by position, whatever their keys are.

awk '
BEGIN{ FS="|"; OFS="," }
{
  sub(/[^:]*:/,"",$1)
  gsub(/^[^<]*|; .*/,"",$4)
  gsub(/^<|>$/,"",$4)
  print $1,$4
}'  Input_file
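
The original pipeline also ran cat and grep -v Orig before the field handling. As suggested in the comments on the question, all of that can live in one awk call. The following is only a sketch under assumptions: the function name extract_first_and_fourth is made up for illustration, and the address user@example.com is a stand-in because the real one is obscured in the sample.

```shell
# Hypothetical wrapper; reads the files given as arguments, or stdin if none.
extract_first_and_fourth() {
  awk '
  BEGIN { FS = "|"; OFS = "," }
  /Orig/ { next }                  # skip lines containing Orig, like grep -v Orig
  {
    sub(/[^:]*:/, "", $1)          # drop the key of field 1
    gsub(/^[^<]*|; .*/, "", $4)    # drop text before "<" and from "; " onward
    gsub(/^<|>$/, "", $4)          # drop the surrounding angle brackets
    print $1, $4
  }' "$@"
}

# Illustrative input only: one line to skip, one line to process.
printf '%s\n' \
  'Orig header|x|y|z' \
  'VAL:1|b:2|c:3|VAL:<har:user@example.com>; tag=vy6r5BpcvQ|VAl:1234' |
  extract_first_and_fourth
```

With the variables from the question, this would be called as extract_first_and_fourth "$filename" >> "$NGW_DATA_FILE".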


For the originally shown samples (keys a and d), try the following, which selects fields by key; written and tested with those samples in GNU awk.

awk '
BEGIN{
  FS="|"
  OFS=","
}
{
  val=""
  for(i=1;i<=NF;i++){
    split($i,arr,":")
    if(arr[1]=="a" || arr[1]=="d"){
      gsub(/^[^:]*:|; .*/,"",$i)
      gsub(/^<|>$/,"",$i)
      val=(val?val OFS:"")$i
    }
  }
  print val
}
' Input_file

Explanation: the same program with a comment on each line.

awk '                                ## Start the awk program.
BEGIN{                               ## The BEGIN section runs once, before any input is read.
  FS="|"                             ## Set the input field separator to a pipe.
  OFS=","                            ## Set the output field separator to a comma.
}
{
  val=""                             ## Reset val for each line so values do not carry over.
  for(i=1;i<=NF;i++){                ## Loop over all fields of the current line.
    split($i,arr,":")                ## Split the current field on ":" into arr.
    if(arr[1]=="a" || arr[1]=="d"){  ## Continue only if the key (arr[1]) is a or d.
      gsub(/^[^:]*:|; .*/,"",$i)     ## Remove everything up to the first colon, and everything from "; " to the end.
      gsub(/^<|>$/,"",$i)            ## Remove the leading "<" and the trailing ">".
      val=(val?val OFS:"")$i         ## Append the cleaned field to val, separated by OFS.
    }
  }
  print val                          ## Print the collected values.
}
' Input_file                         ## Name of the input file.

9 Comments

Hi Sir, thanks for the answer, it works for this case. I have one doubt: what if, instead of a and d, both values arrive under the same field name, like D:1|D:<har:[email protected]>; tag=vy6r5BpcvQ? What do I have to alter, given that the file may contain a huge amount of data?
@mark, sorry but this is not clear, please do give more clear example on this one.
ok, if the case is like this input : -- a:1|b:2|c:3|a:<har:[email protected]>; tag=vy6r5BpcvQ| if the key is 'a' for both the first and 4 th entry
@mark, ok so what will be the condition then to print these values then? Kindly confirm once.
Yes it worked, I have made small change gsub(/^[^<]*|; .*/,"",$1) sub(/[^:]*:/,"",$4) it worked with this one also, Thank you sir
4

You may also try this AWK script:

cat file

VAL:1|b:2|c:3|VAL:<har:[email protected]>; tag=vy6r5BpcvQ|VAl:1234|name:mnp|VAL:91987654321

awk -F '[|;]' '{
   s=""
   for (i=1; i<=NF; ++i)
      if ($i ~ /^VAL:/) {
         gsub(/^[^:]+:|[<>]*/, "", $i)
         s = (s == "" ? "" : s "," ) $i
      }
   print s
}' file

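1,har:[email protected],91987654321

Every field whose key is exactly VAL is printed, so the trailing VAL:91987654321 appears too; VAl:1234 is skipped because the match is case-sensitive.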
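
If only the first two VAL values are wanted, as in the desired output from the question, a counter can stop the loop early. This is a minimal sketch, not part of the script above: the function name first_two_vals and the limit of two are assumptions for illustration, and user@example.com stands in for the obscured address.

```shell
# Hypothetical helper; reads the files given as arguments, or stdin if none.
first_two_vals() {
  awk -F '[|;]' '{
    s = ""; n = 0
    for (i = 1; i <= NF && n < 2; ++i)
      if ($i ~ /^VAL:/) {
        v = $i
        sub(/^[^:]+:/, "", v)      # strip the key up to the first colon
        gsub(/[<>]/, "", v)        # strip angle brackets
        if (n++) s = s ","         # comma before every value except the first
        s = s v
      }
    print s
  }' "$@"
}

# Illustrative input only.
printf '%s\n' \
  'VAL:1|b:2|c:3|VAL:<har:user@example.com>; tag=vy6r5BpcvQ|VAl:1234|name:mnp|VAL:91987654321' |
  first_two_vals
```

Working on a copy (v) instead of $i leaves the record untouched, which keeps the loop independent of how awk rebuilds $0 after a field assignment.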

2 Comments

Hi sir, thanks for the solution. It accesses the a and d fields and filters them. I have one doubt: what if, instead of two different fields a and d, the input changes to a:1|b:2|c:3|a:<har:[email protected]>; tag=vy6r5BpcvQ? How do I filter it then, as there may be multiple fields with the same name?
@mark wrt I have one doubt ... - what you have is a "question", not a "doubt". A doubt means you don't believe something you've been told, a question just means you'd like information about something. It's a common mistake in the English spoken in India, see can-doubt-sometimes-mean-question. No big deal of course, just thought you'd like to know.
3

You can do the same thing with sed rather easily using Extended Regex, two capture groups and two back-references, e.g.

sed -E 's/^[^:]*:(\w+)[^<]*[<]([^>]+).*$/\1,\2/'

Explanation

  • 's/find/replace/' standard substitution, where the find is;
  • ^[^:]*: from the beginning skip through the first ':', then
  • (\w+) capture one or more word characters ([a-zA-Z0-9_]; note that \w is a GNU extension), then
  • [^<]*[<] consume zero or more characters not a '<', then the '<', then
  • ([^>]+) capture everything not a '>', and
  • .*$ discard all remaining chars in line, then the replace is
  • \1,\2 reinsert the captured groups separated by a comma.

Example Use/Output

$ echo 'a:1|b:2|c:3|d:<har:[email protected]>; tag=vy6r5BpcvQ|' | 
sed -E 's/^[^:]*:(\w+)[^<]*[<]([^>]+).*$/\1,\2/'
1,har:[email protected]
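
As for where the original attempt went wrong: in 's/['a:','d:']//g' the shell joins the quoted and unquoted pieces, so sed receives s/[a:,d:]//g. A bracket expression is a set of single characters, not a list of strings, so this deletes every a, colon, comma and d on the line. That is why the h-a-r-colon prefix was damaged. The sketch below reproduces the effect on a shortened, made-up line; the function name strip_like_original is only for illustration.

```shell
# Same quoting as in the question; after quote removal sed sees s/[a:,d:]//g.
strip_like_original() {
  sed -e 's/['a:','d:']//g'
}

# Illustrative input only.
printf '%s\n' 'a:1|b:2|c:3|d:<har:x>; tag=abc' | strip_like_original
# prints: 1|b2|c3|<hrx>; tg=bc
```

Removing the keys by position or by anchored pattern, as in the answers above, avoids this.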
