I started learning PowerShell recently and have been stuck on this issue for a few days. I have a script in which one variable contains an HTML block with arbitrary text, and I need to extract certain strings from that text and put them into other variables.

Here is the example:

$text = "<div>\n Field1: Field1Value1\n </div> \n <div>\n <div>\n <div>\n Field2: Field2Value1\n </div> \n Field1: Field1Value2\n </div> \n <div>\n  Field1: Field1Value3\n</div>\n"

And this is what I need to pass on:

$Field1 = "Field1Value1, Field1Value2, Field1Value3" # or as a list if possible
$Field2 = "Field2Value1"

There are more fields to extract, but the idea is the same. I was able to get this working using the function below, but it only works when a field occurs once.

function GetStringBetweenTwoStrings($firstString, $secondString, $Text){

    # Regex pattern that captures the text between the two strings
    $pattern = "$firstString(.*?)$secondString"

    # Perform the operation (Match returns only the first match)
    $result = [regex]::Match($Text,$pattern).Groups[1].Value

    # Return result
    return $result

}

EDIT: Here is my latest attempt. For some reason the \n was not working in some cases, so I am now replacing every separator with @@@. With the code below, still only the first value of each field is printed.

$originalString = "<div>Field1: Field1Value1\n</div><div>Field2: Field2Value1</div>\n<div>Field1: Field1Value2\n Field3: Field3Value1<br /></div>"
$formattedString = $originalString
$hash = @{}
$hash.'\n' = ' @@@'
$hash.'\t' = ' @@@'
$hash.'<br />' = ' @@@'
$hash.'</div>' = ' @@@'
foreach ($key in $hash.Keys) {
   $formattedString = $formattedString.Replace($key, $hash.$key)
   }

function GetStringBetweenTwoStrings($firstString, $secondString, $string){

    # Regex pattern that captures the text between the two strings
    $pattern = "$firstString(.*?)$secondString"

    # Perform the operation (Match returns only the first match)
    $result = [regex]::Match($string,$pattern).Groups[1].Value

    # Return result
    return $result

}
$field1 = GetStringBetweenTwoStrings -firstString "Field1: " -secondString " @@@" $formattedString
$field2 = GetStringBetweenTwoStrings -firstString "Field2: " -secondString " @@@" $formattedString
$field3 = GetStringBetweenTwoStrings -firstString "Field3: " -secondString " @@@" $formattedString
  • Is that the literal text in the variable or did you add the \n in there while posting? Please show us your script so we can see how you obtain the variable. Commented Oct 8, 2020 at 9:24
  • Yes, \n is the last string, sometimes I saw <br /> or \t but I am unifying this with this $tempText = $text $hash = @{} $hash.'\t' = '\n' $hash.'<br />' = '\n' foreach ($key in $hash.Keys) { $tempText = $text.Replace($key, $hash.$key) } Commented Oct 8, 2020 at 9:29

1 Answer

I agree with Paolo that you should normally use an HTML parser for this, but since it is unclear how you got the HTML into the variable $text, I suggest you try

$text = "<div>\n Field1: Field1Value1\n </div> \n <div>\n <div>\n <div>\n Field2: Field2Value1\n </div> \n Field1: Field1Value2\n </div> \n <div>\n  Field1: Field1Value3\n</div>\n"

[regex]::Matches($text,"Field\d+:\s[^\\<]+").Value | Group-Object {($_ -split ':')[0].Trim()} | ForEach-Object {
    $value = foreach ($val in $_.Group) { ($val -split ':', 2)[-1].Trim() }
    Remove-Variable $_.Name -ErrorAction SilentlyContinue
    New-Variable -Name $_.Name -Value $value
}

$Field1 now holds an array with values

Field1Value1
Field1Value2
Field1Value3

$Field2 contains a single string with value

Field2Value1
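
For comparison, here is a minimal sketch of how the function from the question could return every occurrence instead of only the first. The name Get-AllStringsBetween and its parameters are hypothetical, not part of the code above; the sketch assumes, as in the example, that \n is literal text in the string rather than a real newline.

```powershell
# Hypothetical helper: returns ALL captures, not just the first one.
function Get-AllStringsBetween {
    param(
        [string]$FirstString,
        [string]$SecondString,
        [string]$Text
    )
    # Escape both delimiters so characters such as '\' or '.' are matched literally
    $pattern = [regex]::Escape($FirstString) + '(.*?)' + [regex]::Escape($SecondString)
    # Matches() returns every match; Match() stops after the first one
    foreach ($m in [regex]::Matches($Text, $pattern)) {
        $m.Groups[1].Value.Trim()
    }
}

$text = "<div>\n Field1: Field1Value1\n </div> \n <div>\n <div>\n <div>\n Field2: Field2Value1\n </div> \n Field1: Field1Value2\n </div> \n <div>\n  Field1: Field1Value3\n</div>\n"

# @() keeps the result an array even when only one value is found
$Field1 = @(Get-AllStringsBetween -FirstString 'Field1: ' -SecondString '\n' -Text $text)
$Field2 = @(Get-AllStringsBetween -FirstString 'Field2: ' -SecondString '\n' -Text $text)
$Field1 -join ', '
```

The only real change from the original function is swapping [regex]::Match for [regex]::Matches; the escaping is added so the delimiters are not interpreted as regex syntax.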

Edit

Looking at your last comment, where you say Field1 is actually something like First Name, the code needs to be quite different. (Why didn't you show us that in the first place?)

$text = "<div>\n First Name: Field1Value1\n </div> \n <div>\n <div>\n <div>\n Last Name: Field2Value1\n </div> \n First Name: Field1Value2\n </div> \n <div>\n  First Name: Field1Value3\n</div>\n"

$hash = @{}
# replace all tags and literal \n, \r, \t, \f, \v in the string with two spaces, then split on runs of two or more whitespace characters
($text -replace '</?[a-z][a-z0-9]*[^<>]*>|\\[nrtfv]', '  ').Trim() -split '\s{2,}' | ForEach-Object {
    # split the name and the value 
    $name, $value = ($_ -split ':', 2).Trim()
    $name = $name -replace '\s'  # take out spaces because they do not belong in a variable name
    # if the hash already has an element with this name, combine the value as array
    if ($hash.ContainsKey($name)) { $hash[$name] = @($hash[$name]) + $value }
    else { $hash[$name] = $value }
}
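
To see why the split works, it may help to look at the intermediate tokens on their own. This is only an illustrative sketch that reuses the same -replace and -split expressions as above; the variable $tokens is introduced here and is not part of the original code.

```powershell
$text = "<div>\n First Name: Field1Value1\n </div> \n <div>\n <div>\n <div>\n Last Name: Field2Value1\n </div> \n First Name: Field1Value2\n </div> \n <div>\n  First Name: Field1Value3\n</div>\n"

# Same expressions as in the answer, with the result captured instead of piped on
$tokens = ($text -replace '</?[a-z][a-z0-9]*[^<>]*>|\\[nrtfv]', '  ').Trim() -split '\s{2,}'

# Each token is one "Name: Value" pair; the single space inside "First Name"
# survives because only runs of two or more whitespace characters are split on
$tokens
# First Name: Field1Value1
# Last Name: Field2Value1
# First Name: Field1Value2
# First Name: Field1Value3
```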

If I were you, I would leave the values in the hash and use them as $hash.FirstName etc.

# show what is inside the hash:
$hash
Name                           Value                                                                                                                  
----                           -----                                                                                                                  
FirstName                      {Field1Value1, Field1Value2, Field1Value3}
LastName                       Field2Value1
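
If a comma-separated string is needed, as in the question, the values can be joined straight from the hash. A small sketch, using a stand-in hash with the same contents as shown above; the variable names $firstNames and $lastNames are made up for the example.

```powershell
# Stand-in for the $hash built above (values copied from the output shown)
$hash = @{
    FirstName = @('Field1Value1', 'Field1Value2', 'Field1Value3')
    LastName  = 'Field2Value1'
}

# @() wraps a lone string in an array, so single and multiple values are handled the same way
$firstNames = @($hash.FirstName) -join ', '
$lastNames  = @($hash['LastName']) -join ', '

$firstNames   # Field1Value1, Field1Value2, Field1Value3
$lastNames    # Field2Value1
```

Wrapping in @() also makes .Count behave consistently, which is useful because a field that occurs once is stored as a plain string rather than an array.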

But if you must create separate variables from it, you can do

$hash.GetEnumerator() | ForEach-Object {
    Remove-Variable $_.Name -ErrorAction SilentlyContinue
    New-Variable -Name $_.Name -Value $_.Value        
}

$FirstName now holds an array with values

Field1Value1
Field1Value2
Field1Value3

$LastName holds a single string with value

Field2Value1
2 Comments

Nice, this does the same as my code but in a more succinct and elegant way
Apologies for confusing you guys, I never thought people would read the example the wrong way. Anyway, thank you both for your help and @Theo - your code works great in my case!