The JSON I want to use is embedded in an HTML page. Inside a <script> tag on the page there is a statement:

<script>
jsonRAW = {... heaps of JSON... }

Is there a parser that can extract this from the HTML? I have looked at Json.NET, but it expects its input to be reasonably well-formed JSON rather than JSON buried inside a page.

  • Are you scraping a web page? Commented Sep 24, 2018 at 2:55
  • Html Agility Pack is an HTML parser written in C# to read/write DOM and supports plain XPATH or XSLT. Commented Sep 24, 2018 at 2:58
  • Yes I am scraping a web page. Well spotted. Commented Sep 24, 2018 at 2:59
  • The Html Agility Pack is an excellent tool and will be part of the solution. It will get the content of the <script> tag. Now to parse the javascript variable... Commented Sep 24, 2018 at 23:50

1 Answer

You can use the Html Agility Pack, which is available as a NuGet package. Once it is installed, its tutorial has more information, but in code it works like this:

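using System.Linq;
using HtmlAgilityPack;

var urlLink = "http://www.google.com/jsonPage"; // 1. URL of the page that contains the JSON.

var web = new HtmlWeb(); // 2. Create the Html Agility Pack web client.

var doc = web.Load(urlLink); // 3. Download the page and parse it into a DOM.

if (doc.ParseErrors != null && doc.ParseErrors.Any()) { // 4. Check for markup parse errors and deal with them.
}

// 5. Query the DOM with XPath. "//script" (the first <script> element) is an assumed example; adjust it to the target tag.
var scriptNode = doc.DocumentNode.SelectSingleNode("//script");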

There is more to do beyond this, but it should get you started.
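Once the <script> text is in hand, the JavaScript assignment still has to be reduced to the object literal itself. The following is a minimal sketch of one way to do that, not a definitive implementation: it scans forward from the variable name and returns the first brace-balanced {...} block, ignoring braces that appear inside quoted strings. The class and method names (ScriptJsonExtractor, ExtractObjectLiteral) are hypothetical, and the sketch assumes the literal contains no JavaScript comments or regular-expression literals.

```csharp
using System;

public static class ScriptJsonExtractor
{
    // Hypothetical helper (not part of any library): returns the first
    // brace-balanced {...} block that follows variableName, or null if
    // the name or a balanced block cannot be found.
    public static string ExtractObjectLiteral(string script, string variableName)
    {
        int nameAt = script.IndexOf(variableName, StringComparison.Ordinal);
        if (nameAt < 0) return null;

        int start = script.IndexOf('{', nameAt);
        if (start < 0) return null;

        int depth = 0;          // current brace nesting level
        bool inString = false;  // true while inside a quoted string
        char quote = '\0';      // the quote character that opened the string

        for (int i = start; i < script.Length; i++)
        {
            char c = script[i];
            if (inString)
            {
                if (c == '\\') i++;                     // skip the escaped character
                else if (c == quote) inString = false;  // closing quote
            }
            else if (c == '"' || c == '\'')
            {
                inString = true;
                quote = c;
            }
            else if (c == '{')
            {
                depth++;
            }
            else if (c == '}' && --depth == 0)
            {
                // Matching close for the opening brace at 'start'.
                return script.Substring(start, i - start + 1);
            }
        }
        return null; // braces never balanced
    }
}
```

Under those assumptions, passing scriptNode.InnerText and "jsonRAW" to ExtractObjectLiteral should give a string that can be handed to Json.NET (for example JObject.Parse). If the literal is not valid JSON, parsing may still fail and need extra clean-up.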
