
I have this string (hundreds of them actually) containing URLs and I would like to update them.

Here's the old URL format
http://oldDomain/a/b/document.aspx?p1=v1&p2=NEEDED_VALUE&morePsHere=moreVsHere

and here's what I need them to look like after the update
http://newDomain/c/d/NEEDED_VALUE

Pretty much all I needed to do was to extract the value of p2 in the old URL and append it to http://newDomain/c/d/ to create the new URL.

I assumed the string I was going to get would look like this:

$s = "http://oldDomain/a/b/document.aspx?p1=v1&p2=001&morePsHere=moreVsHere,
      http://oldDomain/a/b/document.aspx?p1=v1&p2=002&morePsHere=moreVsHere,
      http://oldDomain/a/b/document.aspx?p1=v1&p2=003&morePsHere=moreVsHere"

and I was able to update it using the following:

$newURLStart = "http://newDomain/c/d/"
$newStr = $null
$s.Split(",") | ForEach {
  if ($_.IndexOf("p2=") -ne -1)
  {
    $neededValue = $_.Substring($_.IndexOf("p2=")+3)
    if ($neededValue.IndexOf("&") -ne -1)
    {
      $neededValue = $neededValue.Substring(0,$neededValue.IndexOf("&"))
    }
    $newStr = $newStr + ", " + $newURLStart + $neededValue
  }
}
$newStr = $newStr.TrimStart(", ")
$s = $newStr

BUT, it turns out that the string I'm going to get isn't plaintext and would actually look something like:

$s = '<div class="someClass"><p>SomeText</p><ul>
      <li><a href="http://oldDomain/a/b/document.aspx?p1=v1&amp;p2=001&amp;morePsHere=moreVsHere">LINK ONE</a></li>
      <li><a href="http://oldDomain/a/b/document.aspx?p1=v1&amp;p2=002&amp;morePsHere=moreVsHere">LINK TWO</a></li>
      <li><a href="http://oldDomain/a/b/document.aspx?p1=v1&amp;p2=003&amp;morePsHere=moreVsHere">LINK THREE</a></li>
      </ul></div>'

This is a bit more complex than my comma-delimited expectations! I need help updating my script to accommodate the fact. I'm thinking regex might come into play here to grab the URLs inside the href but I'm pretty noob when it comes to that.

3 Answers


If you threw all the strings in a file you could do something like so:

Get-Content "testregex.html" | % {$_ -replace 'href=".+?;.+?=(.+?)&amp;(.+?)"', 'href="http://newdomain/c/$1"'} | Set-Content "newtestregex.html"

Takes as input this file:

<div class="someClass"><p>SomeText</p><ul>
      <li><a href="http://oldDomain/a/b/document.aspx?p1=v1&amp;p2=001&amp;morePsHere=moreVsHere">LINK ONE</a></li>
      <li><a href="http://oldDomain/a/b/document.aspx?p1=v1&amp;p2=002&amp;morePsHere=moreVsHere">LINK TWO</a></li>
      <li><a href="http://oldDomain/a/b/document.aspx?p1=v1&amp;p2=003&amp;morePsHere=moreVsHere">LINK THREE</a></li>
      </ul></div>

Yields:

<div class="someClass"><p>SomeText</p><ul>
      <li><a href="http://newdomain/c/001">LINK ONE</a></li>
      <li><a href="http://newdomain/c/002">LINK TWO</a></li>
      <li><a href="http://newdomain/c/003">LINK THREE</a></li>
      </ul></div>

2 Comments

Great solution as this doesn't depend on the structure of the input itself! I'm taking this one. Thanks!
I've modified the first parameter of -replace to 'href=".+?p2=(.+?)&amp;(.+?)"' in the event that the parameters aren't in order.
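A minimal sketch of that variation, run over the whole string rather than line by line. The sample input is adapted from the question (the parameter order is shuffled on purpose), and $newUrlStart and $pattern are illustrative names. The pattern is an assumption about what the links look like: it only touches href values that start with http://oldDomain/, and it also covers the case where p2 is the last parameter.

```powershell
# Sample input adapted from the question; p2 appears in different positions.
$s = '<div class="someClass"><p>SomeText</p><ul>
      <li><a href="http://oldDomain/a/b/document.aspx?p1=v1&amp;p2=001&amp;morePsHere=moreVsHere">LINK ONE</a></li>
      <li><a href="http://oldDomain/a/b/document.aspx?p2=002&amp;p1=v1">LINK TWO</a></li>
      <li><a href="http://oldDomain/a/b/document.aspx?p1=v1&amp;p2=003">LINK THREE</a></li>
      </ul></div>'
$newUrlStart = 'http://newDomain/c/d/'

# [^"]*        keeps every match inside a single href value
# (?:\?|&amp;) requires p2 to start a parameter (so "xp2=" is not picked up)
# ([^&"]+)     captures the value up to the next & or the closing quote
$pattern = 'href="http://oldDomain/[^"]*?(?:\?|&amp;)p2=([^&"]+)[^"]*"'

# Single quotes around '$1"' stop PowerShell from expanding $1 itself.
$updated = $s -replace $pattern, ('href="' + $newUrlStart + '$1"')
$updated
```

Because [^"] never crosses a quote, a greedy match cannot run from one link into the next, which is the main risk with .+ patterns on input that arrives as one long line.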

You can make this a bit easier by using PowerShell's excellent XML capabilities. First, convert your string into XML: $xmlData = [xml] $s. Now we can simply navigate it using properties: $xmlData.div.ul.li.a.href walks into the HTML you have and automatically expands into collections as needed:

PS C:\Users\carlpett> $xmlData.div.ul.li.a.href
http://oldDomain/a/b/document.aspx?p1=v1&p2=001&morePsHere=moreVsHere
http://oldDomain/a/b/document.aspx?p1=v1&p2=002&morePsHere=moreVsHere
http://oldDomain/a/b/document.aspx?p1=v1&p2=003&morePsHere=moreVsHere

Now, it's just a simple regex to do the actual replacement: $xmlData.div.ul.li.a.href -replace 'http:\/\/oldDomain\/.+p2=([^&]+).+','http://newDomain/c/d/$1'

So, wrapping it up:

$xmlData = [xml] $s
$xmlData.div.ul.li.a.href -replace 'http:\/\/oldDomain\/.+p2=([^&]+).+','http://newDomain/c/d/$1'

2 Comments

I like this idea, but the HTML string input is variable, and there could be nested uls in it. Nonetheless, I'll try making use of this [xml] trick more often. Thanks!
@kei: It is a nice trick indeed. You could actually just remove all that niceness and run the -replace over $s instead, if you want something more general.
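If the [xml] route is still attractive despite nested lists, one possible sketch is to select the links with XPath instead of a fixed property path. This assumes the input is well-formed XML (the same assumption the [xml] cast already makes); the nested sample and the example.com link are made up for illustration.

```powershell
# Hypothetical input with a nested <ul> and one unrelated link.
$s = '<div class="someClass"><p>SomeText</p><ul>
      <li><a href="http://oldDomain/a/b/document.aspx?p1=v1&amp;p2=001&amp;morePsHere=moreVsHere">LINK ONE</a>
        <ul><li><a href="http://oldDomain/a/b/document.aspx?p1=v1&amp;p2=002&amp;morePsHere=moreVsHere">LINK TWO</a></li></ul>
      </li>
      <li><a href="http://example.com/untouched">OTHER</a></li>
      </ul></div>'

$xmlData = [xml] $s

# //a[@href] selects every <a> that has an href, at any depth.
# @(...) copies the node list into an array before anything is modified.
$links = @($xmlData.SelectNodes('//a[@href]'))

foreach ($a in $links) {
    # The attribute value is entity-decoded here: it contains & rather than &amp;
    $href = $a.GetAttribute('href')
    if ($href -match '^http://oldDomain/.*[?&]p2=([^&]+)') {
        $a.SetAttribute('href', 'http://newDomain/c/d/' + $Matches[1])
    }
}

# Serialize back to a string; insignificant whitespace is not preserved by default.
$updated = $xmlData.OuterXml
$updated
```

The trade-off is that the output is re-serialized, so indentation and line breaks from the original string are lost, whereas a plain -replace over $s leaves everything else byte for byte as it was.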

I simplified your input somewhat, but here it is. (BTW please please store this regex in a post-it next to your desk - it helps me again and again! :) )

I make the following assumptions:

  • that the input URL is present only within <li> tags
  • that the URI always contains the arguments (p1 and p2)

Code:

# Here's the input.
# I assume you can figure out how to extract the <li> tags from your input

$ip = '<li><a href="http://oldDomain/a/b/document.aspx?p1=v1&amp;p2=001&amp;morePsHere=moreVsHere">LINK ONE</a></li>
      <li><a href="http://oldDomain/a/b/document.aspx?p1=v1&amp;p2=002&amp;morePsHere=moreVsHere">LINK TWO</a></li>
      <li><a href="http://oldDomain/a/b/document.aspx?p1=v1&amp;p2=003&amp;morePsHere=moreVsHere">LINK THREE</a></li>
'

# loop through each line.
$ip -split "`n" | foreach {

        $_ -match "(?<=p2=).*(?=&amp;)"
        $matches
        # now insert the logic to put the regex match into your destination URL
} 

More info on the regex used (and a web result):

  • The -match operator puts the regex match in an automatic variable called $matches.
  • In the above code, $matches is updated for each line of the string that matches.
  • The (?<=p2=) and (?=&amp;) are a lookbehind and a lookahead. They tell PowerShell to look for text that is bounded by p2= and &amp; without including those boundaries in the match itself. In this case that text is the value you need.

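Here's the output for $matches (the final 003 is repeated because the last, empty line does not match, so $matches keeps its previous value):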

Name                           Value
----                           -----
0                              001
0                              002
0                              003
0                              003
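As a rough sketch of the step left open above, the same lookarounds can be handed to [regex]::Matches, which returns every match in the string at once, so no line splitting is needed and a non-matching line cannot leave a stale value behind. The names $values and $newUrls are illustrative, and the lazy .+? is a small change from the answer's .* so that the match stops at the first &amp; after p2=.

```powershell
$ip = '<li><a href="http://oldDomain/a/b/document.aspx?p1=v1&amp;p2=001&amp;morePsHere=moreVsHere">LINK ONE</a></li>
      <li><a href="http://oldDomain/a/b/document.aspx?p1=v1&amp;p2=002&amp;morePsHere=moreVsHere">LINK TWO</a></li>
      <li><a href="http://oldDomain/a/b/document.aspx?p1=v1&amp;p2=003&amp;morePsHere=moreVsHere">LINK THREE</a></li>
'

# (?<=p2=) and (?=&amp;) only assert what surrounds the match;
# the match value itself is just the text between them.
$values = [regex]::Matches($ip, '(?<=p2=).+?(?=&amp;)') | ForEach-Object { $_.Value }

# Build the destination URLs from the extracted values.
$newUrls = $values | ForEach-Object { 'http://newDomain/c/d/' + $_ }
$newUrls
# http://newDomain/c/d/001
# http://newDomain/c/d/002
# http://newDomain/c/d/003
```

This only extracts the values; putting them back into the surrounding HTML still takes a -replace or similar, and the pattern as written needs an &amp; after the p2 value.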

1 Comment

This idea works too, but again with the input string being variable, it will take more effort in looking for the lis and putting it all back together. Sticky-ing the regex though :D
