
I need to match a pattern, e.g. "Commodity Name", and get the string on the next line between "<dd>" and "</dd>".

Sample Input file:

C:\Users\rpm\Desktop\sample.txt:133:    <dt>Commodity Name</dt>
C:\Users\rpm\Desktop\sample.txt:134:    <dd>Grocery</dd>
C:\Users\rpm\Desktop\sample.txt:136:    <dt>IP address</dt>
C:\Users\rpm\Desktop\sample.txt:137:    <dd>XXX.XXX.XXX.XXX port 8000</dd>
C:\Users\rpm\Desktop\sample.txt:144:    <dt>Commodity Serial #</dt>
C:\Users\rpm\Desktop\sample.txt:145:    <dd>0055500000</dd>
C:\Users\rpm\Desktop\sample.txt:147:    <dt>Client IP</dt>
C:\Users\rpm\Desktop\sample.txt:148:    <dd>xxx.xxx.xxx.xxx</dd>
C:\Users\rpm\Desktop\sample.txt:150:    <dt>Client Logged In As</dt>
C:\Users\rpm\Desktop\sample.txt:151:    <dd>rpm123</dd>
C:\Users\rpm\Desktop\sample.txt:153:    <dt>User is member of</dt>
C:\Users\rpm\Desktop\sample.txt:154:    <dd>BP-RPM\COMD_CSO_ITM-AVAI_Def,BP-RPM\user</dd>

Need to match patterns such as

  • Commodity Name
  • IP address
  • Commodity Serial #
  • Client IP
  • Client Logged In As
  • User is member of

and get the value on the line following each match, between the <dd> and </dd> tags.

Desired output:

Grocery | XXX.XXX.XXX.XXX port 8000 | 0055500000 | xxx.xxx.xxx.xxx | rpm123 | BP-RPM\COMD_CSO_ITM-AVAI_Def,BP-RPM\user

1 Answer

I would start by creating an array that defines your keywords:

$keywords = @(
    '<dt>Commodity Name</dt>'
    '<dt>IP address</dt>'
    '<dt>Commodity Serial #</dt>'
    '<dt>Client IP</dt>'
    '<dt>Client Logged In As</dt>'
    '<dt>User is member of</dt>'
)

Now you can join the keywords with | (regex alternation) and pass the result as a single pattern to the Select-String cmdlet:

$file = 'C:\Users\rpm\Desktop\sample.txt'
$content = Get-Content $file
$content | Select-String -Pattern ($keywords -join '|')

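This gives you a match object for each keyword line, including its line number. LineNumber is 1-based while the $content array is 0-based, so $content[$_.LineNumber] is the line right after the match. Now you can iterate over the result, access that next line by index, and strip the <dd> prefix and the </dd> postfix: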

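$content | Select-String -Pattern ($keywords -join '|') |
    ForEach-Object {
        # LineNumber is 1-based, so used as a 0-based index it addresses the next line
        [regex]::Match($content[$_.LineNumber], '<dd>(.+)</dd>').Groups[1].Value
    }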

Regex: <dd>(.+)</dd> (the text between the tags is captured in group 1)

Output:

Grocery
XXX.XXX.XXX.XXX port 8000
0055500000
xxx.xxx.xxx.xxx
rpm123
BP-RPM\COMD_CSO_ITM-AVAI_Def,BP-RPM\user
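To make the indexing step easier to follow, here is a minimal sketch that runs without the original file. The in-memory $content array and the two keywords are made-up stand-ins for the Get-Content output, and $pairs is just an illustrative name:

```powershell
# Hypothetical in-memory sample standing in for the Get-Content output.
$content = @(
    '<dt>Commodity Name</dt>'
    '<dd>Grocery</dd>'
    '<dt>Client IP</dt>'
    '<dd>xxx.xxx.xxx.xxx</dd>'
)

$pairs = $content | Select-String -Pattern '<dt>Commodity Name</dt>|<dt>Client IP</dt>' |
    ForEach-Object {
        # LineNumber counts from 1, array indexes count from 0,
        # so $content[$_.LineNumber] is the line after the matched one.
        [pscustomobject]@{
            LineNumber = $_.LineNumber
            Matched    = $_.Line
            NextLine   = $content[$_.LineNumber]
        }
    }

$pairs | Format-Table -AutoSize
```

Each row should pair a <dt> line with the <dd> line that follows it. Note that if a keyword matched on the very last line, NextLine would be $null, so a real script may want to guard against that.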

Finally, join the results with ' | ' to get the desired output. Here is the whole script:

$keywords = @(
    '<dt>Commodity Name</dt>'
    '<dt>IP address</dt>'
    '<dt>Commodity Serial #</dt>'
    '<dt>Client IP</dt>'
    '<dt>Client Logged In As</dt>'
    '<dt>User is member of</dt>'
)

$file = 'C:\Users\rpm\Desktop\sample.txt'
$content = Get-Content $file

($content | Select-String -Pattern ($keywords -join '|') | 
    ForEach-Object {
        [regex]::Match($content[$_.LineNumber], '<dd>(.+)</dd>').Groups[1].Value
    }) -join ' | '

Output:

Grocery | XXX.XXX.XXX.XXX port 8000 | 0055500000 | xxx.xxx.xxx.xxx | rpm123 | BP-RPM\COMD_CSO_ITM-AVAI_Def,BP-RPM\user
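
As a possible variation (not part of the script above), Select-String can return the following line itself through its -Context parameter, which avoids indexing into $content. The sketch below also passes the keywords as an array with -SimpleMatch so they are treated as literal text rather than as a regex. The sample lines, the temporary file, and the shortened keyword list are assumptions made only to keep the example self-contained:

```powershell
# Hypothetical sample data written to a temporary file so the sketch runs on its own.
$sample = @(
    '    <dt>Commodity Name</dt>'
    '    <dd>Grocery</dd>'
    '    <dt>Something else</dt>'
    '    <dd>ignored</dd>'
    '    <dt>Client Logged In As</dt>'
    '    <dd>rpm123</dd>'
)
$tempFile = [System.IO.Path]::GetTempFileName()
Set-Content -Path $tempFile -Value $sample

$keywords = @(
    '<dt>Commodity Name</dt>'
    '<dt>Client Logged In As</dt>'
)

# -SimpleMatch matches each keyword literally;
# -Context 0,1 keeps zero lines before and one line after every match.
$result = (Select-String -Path $tempFile -Pattern $keywords -SimpleMatch -Context 0,1 |
    ForEach-Object {
        # PostContext[0] is the line that follows the matched <dt> line.
        [regex]::Match($_.Context.PostContext[0], '<dd>(.+)</dd>').Groups[1].Value
    }) -join ' | '

Remove-Item $tempFile
$result
```

With this sample, $result should contain the two values joined by ' | '. The trade-off is that PostContext is empty when a keyword sits on the last line of the file, so the same kind of guard as with the index approach would be needed there.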

5 Comments

By the way, is there a reason for matching with the tags as well, i.e. <dt>Commodity Name</dt> instead of only Commodity Name?
Just to ensure you don't match some other <dt> tag that merely contains the string.
Can the same be achieved in VBScript? On some machines in our environment running PowerShell scripts is disabled, and VBScript is used instead.
I'm sure it can, but I'm not familiar with VBScript, so you would probably have to do it yourself.
Ok...Thanks again :)
