
Below is my PowerShell code to fetch the links in a webpage. Intermittently, I get a "Cannot index into a null array" exception. Is there anything wrong with this code? Help required.

$Download = $wc.DownloadString($Link) 
$List = $Download -split "<a\s+" | %{ [void]($_ -match "^href=[`'`"]([^`'`">\s]*)"); $matches[1] }
  • Slight restructure: $List = $Download -split "<a\s+" | ?{$_ -match "^href=['`"]([^'`">\s]*)"}|%{$matches[1] } Commented Mar 22, 2018 at 0:55
  • Just a reminder... Commented Mar 22, 2018 at 0:59
  • @TheMadTechnician - Thanks. It worked perfectly! Commented Mar 22, 2018 at 3:11
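Why the original fails, as far as can be told from the code: `-match` only populates `$matches` when it succeeds, and a failed match leaves the previous value in place. The first element produced by `-split "<a\s+"` is the markup before the first `<a` tag, which never matches `^href=`, so if nothing earlier in the session or calling scope has set `$matches`, `$matches[1]` indexes into `$null`. Tags where `href` is not the first attribute also fail the match and silently re-emit the previous link. A minimal sketch of the restructured pipeline from the comment above, run against a hypothetical inline HTML string instead of a downloaded page:

```powershell
# Hypothetical sample markup standing in for $wc.DownloadString($Link)
$Download = @'
<html><body>
<a href="https://example.com/one">One</a>
<a class="nav" href="https://example.com/skipped">Skipped</a>
<a href='https://example.com/two'>Two</a>
</body></html>
'@

# Filter first, so $matches is only read after a successful -match.
# Each object flows through Where-Object into ForEach-Object one at a time,
# so the $matches set by the filter still belongs to the object being emitted.
$List = $Download -split "<a\s+" |
    Where-Object { $_ -match "^href=['`"]([^'`">\s]*)" } |
    ForEach-Object { $matches[1] }

$List
```

With this sample, `$List` holds the two `example.com/one` and `example.com/two` URLs. The `class="nav"` tag is skipped because the original regex only accepts `href` as the first attribute; the restructure removes the exception but not that limitation.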

1 Answer


You don't need to parse anything yourself (and, as was pointed out in the comments, you can't parse HTML with a regex in the first place). Use Invoke-WebRequest to fetch the page; the object it returns has a Links property, a collection of all the links on the page, already parsed out for you.

Example:

$Link = "https://stackoverflow.com/questions/49418802/getting-links-from-webpage-in-powershell-using-regular-expression"
Invoke-WebRequest -Uri $Link | Select-Object -ExpandProperty Links

Or, if you need just the URLs, you can do it a bit more concisely:

$Link = "https://stackoverflow.com/questions/49418802/getting-links-from-webpage-in-powershell-using-regular-expression"
(Invoke-WebRequest -Uri $Link).Links.href

1 Comment

Thanks. For some reason, Invoke-WebRequest does not work with my application, as it consumes a lot of memory.
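For the memory concern in this comment, one option is to keep the original `WebClient` download and scan the string with `[regex]::Matches`, which returns every match at once and never reads or writes the automatic `$matches` variable. (In Windows PowerShell 5.1, `Invoke-WebRequest -UseBasicParsing` also skips the Internet Explorer based DOM parsing and may be lighter; PowerShell 6 and later always parse that way.) The sketch below assumes a hypothetical helper named `Get-HrefList`; it is still a best-effort regex scan, not an HTML parser, and it ignores unquoted `href` values.

```powershell
function Get-HrefList {
    # Hypothetical helper: returns the quoted href values of <a> tags in an HTML string.
    # Regex-based, so it is a best-effort scan, not a real HTML parser.
    param([Parameter(Mandatory)][string]$Html)

    # Allow other attributes before href (e.g. class="..."), then capture the quoted value.
    $pattern = '<a\s[^>]*?\bhref\s*=\s*["'']([^"''>\s]*)'

    # [regex]::Matches returns all matches and does not touch the automatic $matches variable.
    $options = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase
    foreach ($m in [regex]::Matches($Html, $pattern, $options)) {
        $m.Groups[1].Value
    }
}
```

Usage with the question's variables would be `$List = @(Get-HrefList -Html $wc.DownloadString($Link))`; the `@()` keeps the result an array even when the page has a single link.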
