Web scraping via Invoke-WebRequest / Invoke-RestMethod works only with static content in the target page (i.e. with the raw HTML source code).[1]
To support extracting content that gets loaded dynamically, via JavaScript, you need a full web browser that you can control programmatically.
As you've discovered yourself, Chromium-based browsers do offer a CLI method of outputting the dynamically generated / augmented HTML, as it would render interactively in a browser, using the --headless and --dump-dom options.
You can capture this HTML in a variable and then process it with an HTML parser, such as the one provided by the AngleSharp .NET library, which is available via the PSParseHTML module; the module also wraps the HTML Agility Pack, which it uses by default.
The following is self-contained sample code:
It assumes that you're running Windows 11 with the modern, Chromium-based version of Microsoft Edge, located at:
"C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe"
Alternatively, you can install a different Chromium-based browser, such as Brave or Google Chrome; Chrome's executable, for instance, can typically be found at:
"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe"
It downloads HTML from sample URL http://www.nptcstudents.co.uk/andrewg/jsweb/dynamicpages.html, which dynamically fills in various elements using client-side JavaScript, including one with the current timestamp.
It then ensures that the PSParseHTML module is installed and uses it to parse the rendered HTML, and extracts the element that was dynamically populated with the current timestamp to verify that client-side rendering was indeed performed.
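If neither of the paths listed above matches your installation, you can probe a few likely locations before hardcoding one. The following is only a sketch: the candidate list and the alias name `browser` are assumptions made for illustration, not an exhaustive or authoritative set of install locations.

```powershell
# Candidate install locations. This list is an assumption for illustration;
# adjust it to the browsers and install locations on your machine.
$candidates = @(
  "${env:ProgramFiles(x86)}\Microsoft\Edge\Application\msedge.exe"
  "${env:ProgramFiles}\Microsoft\Edge\Application\msedge.exe"
  "${env:ProgramFiles}\Google\Chrome\Application\chrome.exe"
  "${env:ProgramFiles(x86)}\Google\Chrome\Application\chrome.exe"
)

# Pick the first candidate that exists as a file, if any.
$browserPath = $candidates |
  Where-Object { Test-Path -LiteralPath $_ -PathType Leaf } |
  Select-Object -First 1

if ($browserPath) {
  # 'browser' is a hypothetical alias name chosen for this sketch.
  Set-Alias browser $browserPath
  Write-Verbose -Verbose "Using browser at $browserPath"
} else {
  Write-Warning 'No Chromium-based browser found at the candidate paths.'
}
```

If a browser is found, `browser` can stand in for the `msedge` alias that the sample code defines, given that the --headless and --dump-dom options are shared by Chromium-based browsers.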
# Create an 'msedge' alias for Microsoft Edge.
Set-Alias msedge 'C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe'
# Sample URL that includes dynamic content.
$url = 'http://www.nptcstudents.co.uk/andrewg/jsweb/dynamicpages.html'
# Use Microsoft Edge in headless mode to download from the URL
# and run its client-side scripts.
# Note:
#  * --disable-gpu prevents any GPU-related errors from appearing in the output.
#  * ... | Out-String captures all output as a *single, multiline string*
#    and additionally ensures *synchronous* execution on Windows,
#    which in turn enables capturing the output.
#  * Since a full web browser must be launched, as a child process,
#    followed by downloading and rendering a web page, this takes
#    a while, especially if the browser isn't already running.
Write-Verbose -Verbose "Downloading and rendering $url..."
$dynamicHtml =
  msedge --headless --dump-dom --disable-gpu $url | Out-String
# Now you can use the PSParseHTML module to parse the captured HTML.
# Install the module on demand.
if (-not (Get-Module -ErrorAction Ignore -ListAvailable PSParseHTML)) {
  # -Verbose is needed for the message to show with the default preference settings.
  Write-Verbose -Verbose "Installing PSParseHTML module for the current user..."
  Install-Module -ErrorAction Stop -Scope CurrentUser PSParseHTML
}
# Parse the HTML.
Write-Verbose -Verbose "Parsing the rendered HTML..."
$parsedHtml = ConvertFrom-Html -Engine AngleSharp -Content $dynamicHtml
# Now extract the dynamically populated element to verify that it contains the current timestamp.
Write-Verbose -Verbose "Extracting a dynamically populated element..."
$parsedHtml.QuerySelectorAll('div.exampleblock')[1].InnerHtml
The above should print something like (note the timestamp):
<script type="text/javascript">
document.write("The date is " + Date());
</script>The date is Tue Jan 16 2024 23:48:55 GMT-0500 (Eastern Standard Time)
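Note that the extracted inner HTML still contains the source of the `<script>` element, because the text written by `document.write()` is added after it. If you only want the rendered text, one option is to strip the script block from the string. The following sketch hardcodes the sample output shown above as its input; using a regex on HTML is an assumption that holds for a simple, known fragment like this one, not a general-purpose technique.

```powershell
# Sample input: the InnerHtml value shown above, hardcoded for illustration.
$innerHtml = @'
<script type="text/javascript">
document.write("The date is " + Date());
</script>The date is Tue Jan 16 2024 23:48:55 GMT-0500 (Eastern Standard Time)
'@

# Remove any <script>...</script> block:
#  * (?s) lets '.' match newlines too.
#  * The lazy '.*?' stops at the first closing tag.
#  * With no replacement operand, -replace removes the matched text.
$renderedText = ($innerHtml -replace '(?s)<script\b.*?</script>').Trim()

$renderedText
```

With the sample input, this outputs only the line that starts with "The date is".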
[1] In the legacy, Windows-only, ships-with-Windows Windows PowerShell edition, Invoke-WebRequest by default (unless -UseBasicParsing is passed) does return dynamically generated HTML, by using the obsolete Internet Explorer engine behind the scenes - see this answer for an example.
In PowerShell (Core) 7+, the modern, cross-platform, install-on-demand edition, -UseBasicParsing is invariably implied, meaning that the raw HTML source code is only ever downloaded.
However, as of Windows 11, you can still emulate the Windows PowerShell behavior via the InternetExplorer.Application COM object; here's a minimal example:
$ie = New-Object -ComObject InternetExplorer.Application; $ie.Navigate2('https://example.org'); while ($ie.Busy) { Start-Sleep -Milliseconds 200 }; $ie.Document.getElementsByTagName('p') | ForEach-Object outerText
Either way, the obsolete status of Internet Explorer makes such solutions increasingly unusable, and PowerShell-external solutions for loading dynamic HTML and parsing it are needed.
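As an aside, the `while ($ie.Busy)` loop in the COM example never ends if the page never finishes loading. A bounded polling loop avoids that. The following is a minimal sketch; the function name `Wait-Until` and its parameters are hypothetical and not part of PowerShell or any module.

```powershell
# Hypothetical helper: polls a condition until it returns $true or a timeout elapses.
# Returns $true if the condition was met in time, $false otherwise.
function Wait-Until {
  param(
    [scriptblock] $Condition,   # script block that returns $true when done
    [int] $TimeoutMs = 10000,   # maximum total wait time, in milliseconds
    [int] $PollMs = 200         # pause between checks, in milliseconds
  )
  $stopwatch = [System.Diagnostics.Stopwatch]::StartNew()
  while (-not (& $Condition)) {
    if ($stopwatch.ElapsedMilliseconds -ge $TimeoutMs) { return $false }
    Start-Sleep -Milliseconds $PollMs
  }
  return $true
}
```

In the COM example, the wait loop could then be written as `if (-not (Wait-Until { -not $ie.Busy })) { throw 'Page load timed out.' }`.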