I am trying to access the document of an Internet Explorer COM object on Windows 2012. The code works fine on Windows 2008, but as soon as I run it on Windows 2012 (fresh install, tried on more than one server), the same code stops working: $ie.document.documentHtml comes back null.

Below is the code:

$ie = new-object -com "InternetExplorer.Application"
$ie.navigate2("http://www.example.com/") 
while($ie.busy) {start-sleep 1}
$ie.document.documentHtml.innerhtml

Has the Internet Explorer COM object changed in Windows 2012? If so, how do I retrieve the document contents in Windows 2012?

Thanks in advance

edit: Added a bounty to sweeten things up. Invoke-WebRequest is nice, but it works only on Windows 2012, and I need to use Internet Explorer and have it work on both Windows 2008 and Windows 2012. I have read somewhere that installing Microsoft Office solves the issue, but that is not an option either.

edit2: As I need to invoke the script remotely on multiple Windows servers (both 2008 and 2012), I would prefer not to copy files manually.

  • What do you mean it "stops working?" Do you get an error message? What result are you expecting, and what result are you receiving? Which line is it failing on? Commented Jan 17, 2014 at 22:36
  • I meant the code does not work. In other words, $ie.document.innerhtml is empty. Interestingly, I can make the browser visible with $ie.visible=$true, and it shows that the browser has navigated to the right page, but I cannot access the actual page contents. Commented Jan 17, 2014 at 22:48
  • From a search of the web, it seems I am not the only one having this issue, and in some cases installing Office 2010 solves it. That is not an option for me. Commented Jan 17, 2014 at 22:49

4 Answers


It's a known bug: http://connect.microsoft.com/PowerShell/feedback/details/764756/powershell-v3-internetexplorer-application-issue

An extract from the workaround:

So, here's a workaround:

  1. Copy Microsoft.mshtml.dll from a location where it is installed (e.g. C:\Program Files (x86)\Microsoft.NET\Primary Interop Assemblies) to your script's location (can be a network drive).
  2. Use the Load-Assembly.ps1 script (code provided below and at: http://sdrv.ms/U6j7Wn) to load the assembly types in memory, e.g.: .\Load-Assembly.ps1 -Path .\microsoft.mshtml.dll

Then proceed as usual to create the IE object etc. Warning: when dealing with the write() and writeln() methods use the backward compatible methods: IHTMLDocument2_write() and IHTMLDocument2_writeln().

2 Comments

I did vote it up but as I am using powershell remotely on multiple machines, it is not practical. Thanks
Though I would have preferred a solution that did not involve copying the dll on multiple files, this answer came the closest and bounty was expiring - Thanks
    $ie.document.documentHtml.innerhtml

The bigger question is how this could ever have worked. The Document property returns a reference to the IHTMLDocument interface, which does not have a "documentHtml" property. It is never that clear what you might get back when you use late binding, as this code does. There is an old documentHtml property supported by the DHTML Editing control, which has firmly been put out to pasture. Admittedly, that is rather a wild guess.

Anyhoo, correct syntax is to use, say, the body property:

  $ie = new-object -com "InternetExplorer.Application"
  $ie.navigate2("http://www.example.com/") 
  while($ie.busy) {start-sleep 1}
  $txt = $ie.document.body.innerhtml
  Write-Output $txt
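
A loop on Busy alone may exit before the page has finished loading. As a minimal sketch, a more defensive wait could also check ReadyState and give up after a timeout. The function name Wait-IEReady and its parameters are hypothetical (not part of the original answer), and the sketch assumes the object exposes the Busy and ReadyState properties of InternetExplorer.Application, where ReadyState 4 is READYSTATE_COMPLETE:

```powershell
# Hypothetical helper, not part of the original answer.
# Polls Busy and ReadyState until the page is loaded or the timeout expires.
function Wait-IEReady {
    param(
        [Parameter(Mandatory = $true)] $Browser,
        [int] $TimeoutSeconds = 30,
        [int] $PollMilliseconds = 200
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    # 4 = READYSTATE_COMPLETE
    while ($Browser.Busy -or $Browser.ReadyState -ne 4) {
        if ((Get-Date) -ge $deadline) { return $false }   # gave up waiting
        Start-Sleep -Milliseconds $PollMilliseconds
    }
    return $true   # page reported as fully loaded
}
```

It would replace the while loop above, for example:

  $ie.navigate2("http://www.example.com/")
  if (Wait-IEReady -Browser $ie) { $ie.document.body.innerhtml }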

If you still have problems, try running this C# code on the machine. PowerShell swallows null references without much of a diagnostic, so this ought to give you a better message:

using System;

class Program {
    static void Main(string[] args) {
        try {
            var comType = Type.GetTypeFromProgID("InternetExplorer.Application");
            dynamic browser = Activator.CreateInstance(comType);
            browser.Navigate2("http://example.com");
            while (browser.Busy) System.Threading.Thread.Sleep(1);
            dynamic doc = browser.Document;
            Console.WriteLine(doc.Body.InnerHtml);
        }
        catch (Exception ex) {
            Console.WriteLine(ex.ToString());
        }
        Console.ReadLine();
    }
}

5 Comments

on my windows 2012, neither $ie.document.body nor $ie.document.body.innerhtml are available . Thanks though
That's extraordinarily bizarre, the IE object model has been around a very long time and isn't different on 2012. Fire up Regedit.exe on that machine and navigate to HKCR\InternetExplorer.Application. Quote the CLSID key value you see there. And quote the IE version from its Help + About dialog.
I appreciate the help. The IE Version 10.0.9200.16384 and the clsid is {0002DF01-0000-0000-C000-000000000046}
The interesting part is that $ie.document is not null, when I type $ie.document , it says it is System.__ComObject
That's all perfectly normal. PowerShell's habit of swallowing null references without a diagnostic makes debugging problems very difficult. About to throw in the towel on this one; updated the post with code that could give you a better diagnostic.

As far as I can tell, on Windows Server 2012 you can get the full HTML of a page with:

$ie.document.documentElement.outerhtml

There is also an innerhtml property on the documentElement, which strips off the root <html> element.

Of course, if all you want to do is get the raw markup, consider using Invoke-WebRequest:

$doc = Invoke-WebRequest 'http://www.example.com'
$doc.Content

1 Comment

I meant to write $ie.document.documentElement.innerhtml - it is empty in 2012. I will edit my post. $ie.document is System.__ComObject but typing $ie.document.documentElement does not return anything. The info about Invoke-WebRequest is interesting so I will vote up but unfortunately in my case, I need to use internet explorer.

Get any PC with Office installed and copy Microsoft.mshtml.dll to your script location. On that PC the file is at c:\program files (x86)\Microsoft.net\primary interop assemblies\Microsoft.mshtml.dll. Then load it at the top of the script:

add-Type -Path Microsoft.mshtml.dll

Script works.
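
A rough sketch of how the load could be guarded, so the script fails with a clear message when the DLL has not been copied. The helper name Import-MshtmlInterop is hypothetical, and the sketch assumes, as the answers here report, that loading the interop assembly is what makes the document properties available:

```powershell
# Hypothetical helper: loads the mshtml interop assembly only if the file exists.
function Import-MshtmlInterop {
    param(
        [Parameter(Mandatory = $true)] [string] $Path
    )
    if (-not (Test-Path -LiteralPath $Path)) {
        Write-Warning "Interop assembly not found at $Path"
        return $false
    }
    # Load the assembly's types into the current session
    Add-Type -Path $Path
    return $true
}
```

The rest of the script then stays as in the question, for example:

  if (Import-MshtmlInterop -Path .\Microsoft.mshtml.dll) {
      $ie = new-object -com "InternetExplorer.Application"
      $ie.navigate2("http://www.example.com/")
      while($ie.busy) {start-sleep 1}
      $ie.document.body.innerhtml
  }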
