
There is a webpage I am trying to extract data from. By looking at the HTML in the page source, I can find the data I am interested in inside script tags. It looks like the following:

<html>
<script type="text/javascript">

window.gon = {};
gon.default_profile_mode = false; 
gon.user = null;  
gon.product = "shoes";
gon.books_jsonarray = [
{
    "title": "Little Sun",
    "authors": [
        "John Smith"
    ],
    edition: 2,
    year: 2009
},
{
    "title": "Little Prairie",
    "authors": [
        "John Smith"
    ],
    edition: 3,
    year: 2009
},
{
    "title": "Little World",
    "authors": [
        "John Smith",
        "Mary Neil",
        "Carla Brummer"
    ],
    edition: 3,
    year: 2014
}
];

</script>
</html>

What I would like to achieve is to load the webpage by its URL, retrieve the 'gon' variable from JavaScript, and store it in a C# variable. In other words, I would like a C# data structure (a dictionary, for instance) that holds the value of 'gon'.

I have researched how to get a variable defined in JavaScript via the C# WebBrowser control, and this is what I found:

using System;
using System.Collections.Generic;
using System.Windows.Forms;
using System.Net;
using System.Reflection; // needed for BindingFlags
using System.Runtime.InteropServices;
using System.Text.RegularExpressions;
using mshtml;

namespace Mynamespace
{

  public partial class Form1 : Form
  {
    public WebBrowser WebBrowser1 = new WebBrowser();

    private void Form1_Load(object sender, EventArgs e)
    {
        string myurl = "http://somewebsite.com"; //Using WebBrowser control to load web page   
        this.WebBrowser1.Navigate(myurl);
    }    


    private void btnGetValueFromJs_Click(object sender, EventArgs e)
    {
        var mydoc = this.WebBrowser1.Document;
        IHTMLDocument2 vDocument = mydoc.DomDocument as IHTMLDocument2;
        IHTMLWindow2 vWindow = (IHTMLWindow2)vDocument.parentWindow;
        Type vWindowType = vWindow.GetType();
        object strfromJS = vWindowType.InvokeMember("mystr",
                            BindingFlags.GetProperty, null, vWindow, new object[] { }); 
//Here, I am able to see the string "Hello Sir"

        object gonfromJS = vWindowType.InvokeMember("gon",
                            BindingFlags.GetProperty, null, vWindow, new object[] { }); 
//Here, I am able to see the object gonfromJS as a '{System.__ComObject}'

        object gonbooksfromJS = vWindowType.InvokeMember("gon.books_jsonarray",
                            BindingFlags.GetProperty, null, vWindow, new object[] { }); 
//This error is thrown: 'An unhandled exception of type 'System.Runtime.InteropServices.COMException' occurred in mscorlib.dll; (Exception from HRESULT: 0x80020006 (DISP_E_UNKNOWNNAME))'

    }

  }
}

I am able to retrieve values of string or number variables such as:

var mystr = "Hello Sir";
var mynbr = 8;

However, even though I can see that the 'gon' variable is returned as a '{System.__ComObject}', I don't know how to parse it to see the values of its members. Parsing it directly would be nice; if that is not possible, I would instead like a C# data structure of keys/values that contains all of the members of 'gon', and in particular lets me view 'gon.books_jsonarray'.

Any help on how to achieve this would be very much appreciated. Note that I cannot change the source HTML/JavaScript in any way, so what I need is C# code that achieves this goal.

  • Is gon going to have a deterministic value? Are you sure it won't populate its members from other variables, user input or AJAX requests? Commented Jan 24, 2018 at 20:48
  • I'm not very knowledgeable about the C# WebBrowser, but if you can call JavaScript's JSON.stringify(gon), it might help; then parse the JSON string. Commented Jan 24, 2018 at 20:56
  • @charleifl : I am assuming you meant adding the line var Myjson = JSON.stringify(gon) to the javascript? Unfortunately, I cannot edit the source html/javascript at all. Commented Jan 24, 2018 at 21:07
  • @AI.G. : In this case, given a specific url, the value of 'gon' should not change Commented Jan 24, 2018 at 21:11
  • Did you try the technique given in the answers to this question? Commented Jan 24, 2018 at 21:19

2 Answers

You can cast the result of InvokeMember() to dynamic and use the property names directly in your C# code. Array indexing is trickier, but it can be done by passing an indexing expression to InvokeScript("eval", ...). See my example:

private void btnGetValueFromJs_Click(object sender, EventArgs e)
{
    var mydoc = this.WebBrowser1.Document;
    IHTMLDocument2 vDocument = mydoc.DomDocument as IHTMLDocument2;
    IHTMLWindow2 vWindow = (IHTMLWindow2)vDocument.parentWindow;
    Type vWindowType = vWindow.GetType();

    var gonfromJS = (dynamic)vWindowType.InvokeMember("gon",
                        BindingFlags.GetProperty, null, vWindow, new object[] { });

    var length = gonfromJS.books_jsonarray.length;

    for (var i = 0; i < length; ++i)
    {
        var book = (dynamic) mydoc.InvokeScript("eval", new object[] { "gon.books_jsonarray[" + i + "]" });
        Console.WriteLine(book.title);
        // Prints one title per iteration:
        //   Little Sun
        //   Little Prairie
        //   Little World
    }
}

  1. Use JSON.stringify to convert your gon.books_jsonarray variable to a JSON string. Because the call is evaluated inside the already loaded page through InvokeScript, the page's HTML/JavaScript does not need to change.

  2. Then retrieve the JSON with the following C# code:

    var gonFromJS = mydoc.InvokeScript("eval", new object[] { "JSON.stringify(gon.books_jsonarray)" }).ToString();

  3. Finally, deserialize the JSON into C# objects using Newtonsoft.Json.
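Step 1 does not have to stop at gon.books_jsonarray: evaluating JSON.stringify(gon) (as suggested in the question's comments) returns the whole object, which can then be turned into the key/value structure the question asks for. Below is a minimal sketch of that conversion. It assumes System.Text.Json's JsonDocument (built into .NET Core 3.0+, a NuGet package on .NET Framework) in place of Newtonsoft.Json; the JSON string is a hypothetical, hand-written stand-in for what InvokeScript would return, and GonConverter, ToPlain and ParseGon are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

public static class GonConverter
{
    // Recursively converts a JsonElement into plain .NET values:
    // objects -> Dictionary<string, object>, arrays -> List<object>,
    // strings -> string, numbers -> double, true/false -> bool, null -> null.
    public static object ToPlain(JsonElement element)
    {
        switch (element.ValueKind)
        {
            case JsonValueKind.Object:
                var dict = new Dictionary<string, object>();
                foreach (JsonProperty prop in element.EnumerateObject())
                    dict[prop.Name] = ToPlain(prop.Value);
                return dict;
            case JsonValueKind.Array:
                return element.EnumerateArray().Select(ToPlain).ToList();
            case JsonValueKind.String:
                return element.GetString();
            case JsonValueKind.Number:
                return element.GetDouble();
            case JsonValueKind.True:
                return true;
            case JsonValueKind.False:
                return false;
            default: // JsonValueKind.Null
                return null;
        }
    }

    // Parses the string produced by JSON.stringify(gon); assumes the root is a JSON object.
    public static Dictionary<string, object> ParseGon(string json)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            // ToPlain copies every value out, so nothing refers to doc after disposal.
            return (Dictionary<string, object>)ToPlain(doc.RootElement);
        }
    }

    public static void Main()
    {
        // Hypothetical stand-in for the string returned by
        // mydoc.InvokeScript("eval", new object[] { "JSON.stringify(gon)" })
        string json = "{\"default_profile_mode\":false,\"user\":null,\"product\":\"shoes\"," +
                      "\"books_jsonarray\":[{\"title\":\"Little Sun\",\"authors\":[\"John Smith\"],\"edition\":2,\"year\":2009}]}";

        Dictionary<string, object> gon = ParseGon(json);
        var books = (List<object>)gon["books_jsonarray"];
        var first = (Dictionary<string, object>)books[0];
        Console.WriteLine(gon["product"]);
        Console.WriteLine(first["title"]);
    }
}
```

Note that all JSON numbers come back as double in this sketch, so edition and year need a cast such as (int)(double)first["year"] if integers are wanted.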

Here is my full code:

using Newtonsoft.Json;
using System;
using System.Collections.Generic;
using System.Windows.Forms;

namespace WindowsFormsApp1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            var webBrowser = new WebBrowser();

            webBrowser.DocumentCompleted += (s, ea) =>
            {
                var mydoc = webBrowser.Document;
                var gonFromJS = mydoc.InvokeScript("eval", new object[] { "JSON.stringify(gon.books_jsonarray)" }) as string;
                if (gonFromJS == null) return; // JSON.stringify can return null (see the EDIT below)
                var gonObject = JsonConvert.DeserializeObject<List<Books>>(gonFromJS);
            };

            var myurl = "http://localhost/test.html";
            webBrowser.Navigate(myurl);
        }

        private class Books
        {
            public string Title { get; set; }
            public List<string> Authors { get; set; }
            public int Edition { get; set; }
            public int Year { get; set; }
        }
    }
}
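If you would rather avoid the Newtonsoft.Json dependency, the same deserialization can be sketched with System.Text.Json instead (a swapped-in library, not what the code above uses). One difference matters here: JSON.stringify emits lowercase keys such as "title", and Json.NET matches them to PascalCase properties case-insensitively by default, whereas System.Text.Json is case-sensitive unless PropertyNameCaseInsensitive is set. The hard-coded JSON string is a hypothetical stand-in for the InvokeScript result, and Book and BookParser are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public class Book
{
    public string Title { get; set; }
    public List<string> Authors { get; set; }
    public int Edition { get; set; }
    public int Year { get; set; }
}

public static class BookParser
{
    // JSON.stringify emits lowercase keys ("title", "year"), while the C# properties
    // are PascalCase. System.Text.Json only matches them with this option enabled.
    private static readonly JsonSerializerOptions Options =
        new JsonSerializerOptions { PropertyNameCaseInsensitive = true };

    public static List<Book> ParseBooks(string json) =>
        JsonSerializer.Deserialize<List<Book>>(json, Options);

    public static void Main()
    {
        // Hypothetical stand-in for the result of JSON.stringify(gon.books_jsonarray)
        string json = "[{\"title\":\"Little Sun\",\"authors\":[\"John Smith\"],\"edition\":2,\"year\":2009}," +
                      "{\"title\":\"Little World\",\"authors\":[\"John Smith\",\"Mary Neil\",\"Carla Brummer\"],\"edition\":3,\"year\":2014}]";

        foreach (Book book in ParseBooks(json))
            Console.WriteLine($"{book.Title} ({book.Year}), {book.Authors.Count} author(s)");
    }
}
```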

You can also see the output in the screenshot.

EDIT:

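You may also run into trouble with the JSON.stringify method: it can return null.

In that case, you can review these SO topics: here and here.

If JSON.stringify returns null, try adding the following code to your HTML page. It switches the WebBrowser control out of its default IE7 document mode, in which the JSON object is not available: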

<head>
<meta http-equiv='X-UA-Compatible' content='IE=edge' >
</head>
