I have an HTML table (link to the HTML).

I want to parse it and convert it to XML, CSV, or a PSObject. I tried to do this with HtmlAgilityPack.dll, but without success. Can anybody point me in the right direction?


I want to convert the table to a PSObject and export it to CSV. So far I only have the beginning of the code: I can reach the rows, but I can't get at the values inside them.

Add-Type -Path C:\Windows\system32\HtmlAgilityPack.dll
$HTML = New-Object HtmlAgilityPack.HtmlDocument
# Load() returns nothing, so there is no result to assign
$HTML.Load("C:\Test\Test.html")
$table = $HTML.DocumentNode.SelectNodes("//table/tr/td/nobr")

When I access $table[0..47].InnerHtml I get only the first column of the file; I can't get to the second column or any after it.

Thanks, Ohad

  • What exactly did you try? E.g. we'd like to see code, error messages or anything actionable. HTML Agility Pack doesn't yield XML objects but rather its own structure that mimics the XML DOM tree. Keep in mind that HTML is often not XML. Why do you desperately need XML here? Commented Jan 24, 2013 at 8:36
  • P.S.: I need to convert it to XML or CSV; even plain text would be helpful for me. Commented Jan 24, 2013 at 9:15
  • If you get your HTML via Invoke-WebRequest, the result has a ParsedHtml property that you can use to traverse the DOM and convert it to whatever format you need. Commented Jan 14, 2018 at 15:16

1 Answer

You can try this to get all the HTML inside the <nobr> tags. I'll let you work out the logic to output it the way you want.

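$ie = New-Object -ComObject "InternetExplorer.Application"
$ie.Navigate("http://urltoyourfile.html")
# Navigate() returns immediately, so wait until the page has finished loading
while ($ie.Busy) { Start-Sleep -Milliseconds 100 }
$doc = $ie.Document
$doc.getElementsByTagName("nobr") | ForEach-Object { $_.innerHTML }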

Output:

Lead User&nbsp;&nbsp;
Accesses&nbsp;&nbsp;
Last Accessed&nbsp;&nbsp;
Average&nbsp;&nbsp;
Max&nbsp;&nbsp;
Min&nbsp;&nbsp;
Total&nbsp;&nbsp;
amirt</NO br>
2
01/20/2013 09:40:47
04:18:17
06:19:26
02:17:09
08:36:35
andream
1
01/20/2013 10:33:01
02:34:37
02:34:37
02:34:37
02:34:37
avnerm
1
01/17/2013 11:34:16
00:30:44
00:30:44
00:30:44
00:30:44
brouria

One way to parse it:

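$cpt = 0
$doc.getElementsByTagName("nobr") | ForEach-Object {
    # Print each cell followed by a semicolon, staying on the same line
    Write-Host -NoNewline "$($_.innerHTML);"
    $cpt++
    # The table has 7 columns, so break the line after every 7th cell
    if ($cpt % 7 -eq 0) { Write-Host "" }
}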
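
Since the question asks for PSObjects and a CSV file rather than printed text, here is a minimal sketch of how the flat list of cells could be grouped into rows. It assumes the table has 7 columns and that the first 7 cells are the headers, as in the output above. ConvertTo-RowObject is a hypothetical helper name, not part of any library, and [ordered] and [pscustomobject] need PowerShell 3.0 or later.

```powershell
# Hypothetical helper: groups a flat list of cell strings into one object per table row.
function ConvertTo-RowObject {
    param(
        [string[]]$Cells,
        [int]$ColumnCount = 7   # assumption: seven columns, as in the sample output
    )

    # Remove non-breaking spaces (entity or literal character) and trim whitespace.
    $clean = @($Cells | ForEach-Object { ($_ -replace '&nbsp;|\u00A0', '').Trim() })

    # The first $ColumnCount cells are treated as the column headers.
    $headers = $clean[0..($ColumnCount - 1)]

    # Walk the remaining cells one full row at a time; a trailing partial row is skipped.
    for ($i = $ColumnCount; $i + $ColumnCount -le $clean.Count; $i += $ColumnCount) {
        $row = [ordered]@{}
        for ($c = 0; $c -lt $ColumnCount; $c++) {
            $row[$headers[$c]] = $clean[$i + $c]
        }
        # Emit the row as an object with one property per header.
        [pscustomobject]$row
    }
}
```

With the $doc from above, something like `$cells = @($doc.getElementsByTagName("nobr") | ForEach-Object { $_.innerHTML })` followed by `ConvertTo-RowObject -Cells $cells | Export-Csv -Path C:\Test\Test.csv -NoTypeInformation` should produce the CSV; the output path is only an example.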
4 Comments

Unfortunately, it opens IE but gives no output in PowerShell.
Maybe because I have my file locally? Win7? IE9?
Nope, I've tried with a UNC path and that works too: $ie.navigate("\\server\test\test.html")
Great, it was UAC... I just needed to run PowerShell as admin :) Thanks, I'll try to parse it now.
