
PowerShell:

Help: I want to truncate this URL:

https://thoudamchitaranjan.blogspot.in/2017/12/merry-christmas-and-still-merry-shes-still-so-beautiful.html

into this:

 merry-christmas-and-still-merry-shes-still-so-beautiful

and store it into a variable.

I tried wildcards, but they didn't work. I also tried replacing "/" with a newline ("`n") and reading the last line. The code below is what I eventually got working:

$rightPart="https://thoudamchitaranjan.blogspot.in/2017/12/merry-christmas-and-still-merry-shes-still-so-beautiful.html"
$rightPart=$rightPart.Replace(".html","")
while ($rightPart -imatch "/") {
    $pos = $rightPart.IndexOf("/")
    $rightPart = $rightPart.Substring($pos + 1)
}
Write-Output "String is: $rightPart"

But I want a better way. Thanks in advance for your help.


5 Answers

3

Instead of doing string-parsing acrobatics, you can parse the URL with the System.Uri class and cast its last path segment to FileInfo to get the base name of the document.

$url = "https://thoudamchitaranjan.blogspot.in/2017/12/merry-christmas-and-still-merry-shes-still-so-beautiful.html"

([IO.FileInfo]([System.Uri]$url).Segments[-1]).BaseName

What is nice about this is that it gets the last file name in the URL whether it ends with .htm, .html, .asp, .aspx, etc., and whether the URL has one slash or 20.

Another way is to use Split-Path to grab the leaf and then take the BaseName of the resulting FileInfo object.

([IO.FileInfo](Split-Path $url -Leaf)).BaseName
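
One case where the Uri route is handy is a URL that carries a query string or fragment: Uri.Segments is built from the path portion only, so those parts never reach the file-name step. A minimal sketch, assuming a hypothetical helper name (Get-UrlBaseName) and a made-up example.org URL; it swaps the FileInfo cast for [System.IO.Path]::GetFileNameWithoutExtension, which drops the extension in the same way for this input:

```powershell
# Hypothetical helper (not from the original answer): returns the last path
# segment of a URL without its extension.
function Get-UrlBaseName {
    param([Parameter(Mandatory)][string]$Url)

    # Segments covers only the path, so "?m=1" and "#top" are not included.
    $lastSegment = ([System.Uri]$Url).Segments[-1]

    # Strip the extension, whatever it is (.html, .aspx, ...).
    [System.IO.Path]::GetFileNameWithoutExtension($lastSegment)
}

Get-UrlBaseName 'https://example.org/2017/12/some-post.html?m=1#top'
```

For the sample URL this yields some-post. A URL whose path ends in a slash has no file name in its last segment, so the sketch would return an empty string there.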

1 Comment

That's certainly beautiful. That came out of nowhere, yet so elegant. I never thought it could be done. Now I know it could be done. Thank you for this beautiful tip.
1

Do you mean something like this?

$url = 'https://thoudamchitaranjan.blogspot.in/2017/12/merry-christmas-and-still-merry-shes-still-so-beautiful.html'
$arr = $url -split '/'
$truncatedVar = ($arr[$arr.Length-1]).Substring(0, $arr[$arr.Length-1].IndexOf('.'))
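
Two edge cases are worth knowing with the split approach: IndexOf('.') stops at the first dot, so a name such as v1.2-notes.html is cut short, and Substring throws when the last segment has no dot at all. A hedged variant, assuming a hypothetical helper name (Get-LeafBase) and made-up example.org URLs, that uses LastIndexOf and falls back to the whole segment:

```powershell
# Hypothetical helper (not from the original answer).
function Get-LeafBase {
    param([string]$Url)

    # Last element after splitting on "/" is the file-name part of the URL.
    $leaf = ($Url -split '/')[-1]

    # Cut at the LAST dot so dots inside the name survive.
    $dot = $leaf.LastIndexOf('.')
    if ($dot -gt 0) {
        $leaf.Substring(0, $dot)
    } else {
        # No extension found: return the segment unchanged.
        $leaf
    }
}

Get-LeafBase 'https://example.org/docs/v1.2-notes.html'
Get-LeafBase 'https://example.org/docs/readme'
```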

2 Comments

Yup, that was it. That was definitely shorter code. Thank you very much.
Alternatively, you can do this: $arr=($url -split '/')[-1] ; $arr=$arr.Substring(0, $arr.IndexOf('.'))
1

You can also use regular expressions with the -replace operator.

$url = 'https://thoudamchitaranjan.blogspot.in/2017/12/merry-christmas-and-still-merry-shes-still-so-beautiful.html'
$truncatedVar = $url -replace ".*/(.*)\.html",'$1'

.*/ matches zero or more of any character up to and including the last slash (the match is greedy).

(.*)\.html matches zero or more of any character up to the .html string. The parentheses cause the text they match to be captured in a group.

'$1' is the second argument to the -replace operator, telling it what to replace the matched text with. In this case $1 refers to the text that was captured by (.*).

Edited to fix the double quotes; also escaped the '.' in .html.
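
The pattern above is tied to the .html extension. As a sketch only, the following variation (not the answerer's code, and using a made-up example.org URL) strips any trailing extension, and also shows the quoting behavior around $1:

```powershell
$url = 'https://example.org/2017/12/some-post.aspx'   # made-up example URL

# Remove everything up to the last slash, and a trailing ".ext" if present.
# With no replacement operand, -replace substitutes an empty string.
$extAgnostic = $url -replace '^.*/|\.[^./]*$'

# Double quotes let PowerShell expand $1 as an (undefined) variable first,
# so the whole match is replaced with an empty string.
$emptyResult = $url -replace '.*/(.*)\.aspx', "$1"

# Escaping the dollar sign with a backtick keeps the group reference intact.
$escaped = $url -replace '.*/(.*)\.aspx', "`$1"

$extAgnostic
$escaped
```

Both $extAgnostic and $escaped end up as some-post, while $emptyResult is an empty string (assuming no variable named 1 is defined and strict mode is off).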

5 Comments

I just ran the code. It looks like the whole $url is getting replaced with a null value. $1 seems to contain nothing.
Sorry, I accidentally replaced the single quotes with double quotes on the '$1' parameter. Inside double quotes, PowerShell tries to expand $1 as a variable, which does, indeed, result in nothing.
OK, that solved it. I was still looking for where it went wrong. This form is still much shorter than all the others. Thank you.
Regular expressions are worth learning. Many languages have native support for them.
I agree. I like the wildcard formatting for the string. That's very special. Again, thank you. :)
1

Just do it:

$URL="https://thoudamchitaranjan.blogspot.in/2017/12/merry-christmas-and-still-merry-shes-still-so-beautiful.html"
[System.IO.Path]::GetFileNameWithoutExtension($URL)

2 Comments

Thank you. This is definitely useful for my other problem, where I want to get the name of a file. For the current problem, however, it returns "merry-christmas-and-still-merry-shes-still-so-beautiful.html", so I'll have to truncate the ".html" in a separate statement.
It works. I really didn't think that IO.Path would let me work with URLs, but it does, which is a surprise. Maybe it's because Linux uses forward slashes for paths. It's a neat cheat. Thanks again.
1

To complement Ricc Babbitt's helpful answer:

PowerShell Core (version 6 and later) - but, unfortunately, not Windows PowerShell - supports Split-Path -LeafBase, which extracts the base file name (the file name without its extension) in a single operation:

# PowerShell *Core* only
PS> Split-Path -LeafBase "https://example.org/shes-still-so-beautiful.html"
shes-still-so-beautiful
