I am trying to extract the text inside specific HTML tags from a text file. Here's what the file looks like:
```html
<html>
<cool class="what"> idk1 </cool>
<lame id="hm">idc1 </lame>
<lame id="hm"> idc2 </lame>
<lame id="hm"> idc3</lame>
<lame id="hm"> idc4 </lame>
<confused id="allTheTime"> abc1 </confused>
<cool class="what"> idk2 </cool>
<lame id="hm"> </lame>
<lame id="hm"> idc2 </lame>
<lame id="hm"> </lame>
<lame id="hm"> idc4 </lame>
<confused id="allTheTime"> abc2 </confused>
<cool class="what"> idk3 </cool>
<confused id="allTheTime"> abc3 </confused>
</html>
```
Below is my code:
```powershell
# Read the whole file as a single string
$html = Get-Content -Path 'C:\Users\bob\Desktop\tester.txt' -Raw

# One query per tag; each returns the text between that tag's opening and closing tags
$wantedData1 = $html | Select-String '(?<=<cool class="what">\s+)(.*?)(?=\s+</cool>)' -AllMatches | ForEach-Object { $_.Matches } | ForEach-Object { $_.Value }
$wantedData2 = $html | Select-String '(?<=<lame id="hm">\s+)(.*?)(?=\s+</lame>)' -AllMatches | ForEach-Object { $_.Matches } | ForEach-Object { $_.Value }
$wantedData3 = $html | Select-String '(?<=<confused id="allTheTime">\s+)(.*?)(?=\s+</confused>)' -AllMatches | ForEach-Object { $_.Matches } | ForEach-Object { $_.Value }

Write-Host $wantedData1
Write-Host $wantedData2
Write-Host $wantedData3
```
The output looks like this:
```
idk1 idk2 idk3
idc2 idc4 idc2 idc4
abc1 abc2 abc3
```
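The missing `idc1` and `idc3` in that output come from the `\s+` in each lookbehind and lookahead. `\s+` means "one or more whitespace characters", so `<lame id="hm">idc1 </lame>` (no space after the opening tag) and `<lame id="hm"> idc3</lame>` (no space before the closing tag) can never match. A minimal sketch of one possible fix, keeping the one-query-per-tag approach and assuming the one-tag-per-line layout shown above, is to make that whitespace optional with `\s*` and start the match at a non-whitespace character (`\S`) so whitespace-only tags still produce nothing. The sample text is inlined as a here-string instead of read with `Get-Content`, so the sketch runs on its own:

```powershell
# Sample data from the question, inlined so the sketch runs without the file
$html = @'
<html>
<cool class="what"> idk1 </cool>
<lame id="hm">idc1 </lame>
<lame id="hm"> idc2 </lame>
<lame id="hm"> idc3</lame>
<lame id="hm"> idc4 </lame>
<confused id="allTheTime"> abc1 </confused>
<cool class="what"> idk2 </cool>
<lame id="hm"> </lame>
<lame id="hm"> idc2 </lame>
<lame id="hm"> </lame>
<lame id="hm"> idc4 </lame>
<confused id="allTheTime"> abc2 </confused>
<cool class="what"> idk3 </cool>
<confused id="allTheTime"> abc3 </confused>
</html>
'@

# \s* (zero or more) replaces \s+ (one or more) on both sides of the value.
# \S forces the match to begin at a non-space character, so a tag that holds
# only whitespace yields no match. .NET regex allows \s* inside a lookbehind.
$lamePattern = '(?<=<lame id="hm">\s*)\S.*?(?=\s*</lame>)'

$wantedData2 = $html | Select-String $lamePattern -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Value }

$wantedData2
```

The same `\s+` to `\s*` change applies to the `<cool>` and `<confused>` patterns, although three separate queries still cannot put the values back into file order.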
I am trying to write something that gives me output like this instead, one value per line in the order the tags appear in the file, with empty tags skipped:
```
idk1
idc1
idc2
idc3
idc4
abc1
idk2
idc2
idc4
abc2
idk3
abc3
```
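Because the three queries run separately, their results cannot be interleaved back into file order. One way around that, sketched here under the same assumptions (flat markup, each value on one line, tag names exactly `cool`, `lame` and `confused`), is a single regex that matches any of the three tags. The backreference `\1` makes the closing tag match the opening one, the value is read from a capture group instead of lookarounds, and empty values are filtered out afterwards:

```powershell
# Sample data from the question, inlined so the sketch runs without the file
$html = @'
<html>
<cool class="what"> idk1 </cool>
<lame id="hm">idc1 </lame>
<lame id="hm"> idc2 </lame>
<lame id="hm"> idc3</lame>
<lame id="hm"> idc4 </lame>
<confused id="allTheTime"> abc1 </confused>
<cool class="what"> idk2 </cool>
<lame id="hm"> </lame>
<lame id="hm"> idc2 </lame>
<lame id="hm"> </lame>
<lame id="hm"> idc4 </lame>
<confused id="allTheTime"> abc2 </confused>
<cool class="what"> idk3 </cool>
<confused id="allTheTime"> abc3 </confused>
</html>
'@

# Group 1 is the tag name, group 2 the value; the \s* on each side keeps
# surrounding whitespace out of group 2. \1 requires a matching closing tag.
$pattern = '<(cool|lame|confused)\b[^>]*>\s*(.*?)\s*</\1>'

# [regex]::Matches returns the matches in the order they occur in the string
$values = [regex]::Matches($html, $pattern) |
    ForEach-Object { $_.Groups[2].Value } |
    Where-Object { $_ -ne '' }   # drop empty tags such as <lame id="hm"> </lame>

$values
```

Sending `$values` to the pipeline (rather than `Write-Host $values`) prints one value per line; `Write-Host` given an array joins the elements with a space, which is why each query in the original output appears on a single line.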
The <cool> and <confused> tags occur exactly once in each group, but the <lame> values may be missing, and a group may contain anywhere from 1 to 5 <lame> tags. I mention this because one of my other queries would break if a tag was null. Any help would be greatly appreciated. Thanks.
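If other queries need to know which <lame> values belong to which <cool>/<confused> pair, one option is to match a whole group at a time and collect its <lame> values into an array that is simply empty when there are none, so nothing downstream receives `$null`. This is a rough sketch assuming each group starts with exactly one <cool> tag and ends with exactly one <confused> tag, as described above; the property names `Cool`, `Lame` and `Confused` are made up for illustration. Regex only copes with flat, predictable markup like this; for general HTML an HTML parser is the more robust tool.

```powershell
# Sample data from the question, inlined so the sketch runs without the file
$html = @'
<html>
<cool class="what"> idk1 </cool>
<lame id="hm">idc1 </lame>
<lame id="hm"> idc2 </lame>
<lame id="hm"> idc3</lame>
<lame id="hm"> idc4 </lame>
<confused id="allTheTime"> abc1 </confused>
<cool class="what"> idk2 </cool>
<lame id="hm"> </lame>
<lame id="hm"> idc2 </lame>
<lame id="hm"> </lame>
<lame id="hm"> idc4 </lame>
<confused id="allTheTime"> abc2 </confused>
<cool class="what"> idk3 </cool>
<confused id="allTheTime"> abc3 </confused>
</html>
'@

# (?s) lets . span newlines, so group 2 captures everything between </cool>
# and the next <confused> tag, i.e. that group's <lame> lines (possibly none).
$recordPattern = '(?s)<cool[^>]*>\s*(.*?)\s*</cool>(.*?)<confused[^>]*>\s*(.*?)\s*</confused>'

$records = foreach ($m in [regex]::Matches($html, $recordPattern)) {
    # @() guarantees an array: empty when the group has no non-empty <lame> tags
    $lame = @(
        [regex]::Matches($m.Groups[2].Value, '<lame[^>]*>\s*(.*?)\s*</lame>') |
            ForEach-Object { $_.Groups[1].Value } |
            Where-Object { $_ -ne '' }
    )
    # Hypothetical record shape, one object per <cool> ... <confused> group
    [pscustomobject]@{
        Cool     = $m.Groups[1].Value
        Lame     = $lame
        Confused = $m.Groups[3].Value
    }
}

$records
```

Code that consumes `$records` can check `$_.Lame.Count` instead of testing for `$null`.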