I am new to PowerShell and have not found a Stack Overflow question or documentation reference that gets me all the way to a working solution. If one already exists and I overlooked it, I would be grateful for a pointer.
In a text file is a string like this:
<span><span><span><span><span></span></span></span></span></span>
The number of <span> and </span> varies from file to file. For example, in some files it is like this:
<span></span>
Yet in others it is like this:
<span><span></span></span>
And so on. There are likely never going to be more than 24 of each in a string.
I want to eliminate all strings like this in the text file, yet preserve the </span> in strings like this:
<span style="font-weight:bold;">text</span>
There may be many variations on that kind of string in the text file, for example <span style="font-size: 10px; font-weight: 400;">text</span>, and I don't know beforehand which variations will be included in the text file.
This partially works...
$original_file = 'in.txt'
$destination_file = 'out.txt'
(Get-Content $original_file) | ForEach-Object {
    $_ -replace '<span>', '' `
       -replace '</span>', ''
} | Set-Content $destination_file
...but obviously results in something like <span style="font-weight:bold;">text.
In the PowerShell script above I can use
$_ -replace '<span></span>', '' `
But of course it only catches the <span></span> in the middle of the string because, as it is written now, it does not loop.
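To illustrate the single-pass behavior, here is a minimal sketch using a hard-coded sample string (the variable names are only for illustration):

```powershell
# One pass of -replace removes only the innermost empty pair.
$sample = '<span><span></span></span>'
$once   = $sample -replace '<span></span>', ''
$once   # → <span></span>
```

The outer pair only becomes an adjacent `<span></span>` after the first replacement has finished, so that same pass cannot match it.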
I know it is silly to do something like this:
$original_file = 'in.txt'
$destination_file = 'out.txt'
(Get-Content $original_file) | ForEach-Object {
    $_ -replace '<span></span>', '' `
       -replace '<span></span>', '' `
       -replace '<span></span>', '' `
       -replace '<span></span>', '' `
       -replace '<span></span>', ''
} | Set-Content $destination_file
Because the nested <span> string collapses inward each time the replacement runs, producing a new innermost <span></span> that can then be removed, the best solution I can think of is to loop over the file contents until no instances of <span></span> remain.
I feel like adding logic along these lines is necessary:
foreach ($i in 1..24) {
    Write-Host $i
}
But I have not been able to successfully incorporate it into the script.
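For reference, the looping idea might be sketched roughly as follows. This is only a sketch under assumptions: the function name `Remove-EmptySpan` is hypothetical, the file is assumed small enough to read as a single string with `Get-Content -Raw`, and `Set-Content -NoNewline` assumes PowerShell 5 or later. Instead of a fixed count of 24, it repeats until a pass changes nothing.

```powershell
# Hypothetical helper: strips empty <span></span> pairs repeatedly until none remain.
function Remove-EmptySpan {
    param([string]$Text)

    do {
        $previous = $Text
        # -replace takes a regex; none of these characters need escaping.
        $Text = $Text -replace '<span></span>', ''
    } while ($Text -ne $previous)   # stop once a pass changes nothing

    return $Text
}

$original_file    = 'in.txt'
$destination_file = 'out.txt'

# Guarded so the sketch does nothing when in.txt is absent.
if (Test-Path $original_file) {
    $content = Get-Content $original_file -Raw
    Remove-EmptySpan $content | Set-Content $destination_file -NoNewline
}
```

Because only the exact text `<span></span>` is ever removed, a string such as `<span style="font-weight:bold;">text</span>` is left untouched, closing tag included.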
If this is the wrong approach entirely I would be grateful to know.
The reason for PowerShell is that my team prefers it for scripts included in an Azure DevOps release pipeline.
Thanks for any ideas or help.