
So I have a web scraping project where one of the pages has all the necessary content in JSON format inside a set of <script> tags.

Here's an example of the <script> block in question:

<script>
  window.postData = {}
  window.postData["content"] = [json content]
</script>

I've used HtmlAgilityPack to get to the particular <script> tags, but I'm not sure how to grab just the JSON content from them. I can parse the JSON with Json.NET or another library, so I'm not worried about that part; I'm just stuck on extracting the JSON itself. Is there a JavaScript parsing library I can use for this, or is there another way to accomplish it?

Any help would be greatly appreciated!

1 Answer


Check out Jint, a JavaScript interpreter for .NET:

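using Jint;

var engine = new Engine();
engine.Execute("var window = {};"); // there is no browser here, so define window first
engine.Execute("window.postData = {}; window.postData['content'] = [json content]");
var postDataJSON = engine.Execute("var json = JSON.stringify(window.postData);").GetValue("json").AsString();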

Comments

My guess is that this would crash, since window will be undefined. Prepending window = {}; to the string should solve that, though. Jint is a cool project, but it's only a JS interpreter; it won't get far with scripts written for a browser that rely on the window object or the DOM. Personally, I would probably have solved this with a regex instead.
@kavun Is this expecting me to know what [json content] is supposed to be? What if I don't know what the content is, and only know that there is content there? I guess I could just dump the text of the <script> node with string nodeText = node.InnerText; and use it as .Execute(nodeText).GetValue("window.postData");?
@Karl-JohanSjögren I'm not opposed to using regex, but I'm not that good with it.
@kavun Thank you very much. I'm going to give that a try and see how it goes. Thanks!
@kavun If I had a script containing both window.postData["content"] = [json content] and window.postData["content-2"] = [json content], could I look at only one of the items, or would I be stuck with both combined? Or should I post another question for that?
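Following up on the regex suggestion and the question about reading one key at a time: here is a minimal sketch that avoids running any JavaScript (so the missing window object is not an issue). It assumes the value assigned to each window.postData["..."] entry is a JSON object or array literal; PostDataExtractor and ExtractEntry are hypothetical names, not part of any library. A regex only locates the assignment for the requested key, and a small bracket counter then finds where the value ends. Capturing the whole value with the regex itself gets fragile once the JSON spans several lines or has brackets inside strings. Values built up by code, or JS-only syntax such as unquoted keys, are not handled.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PostDataExtractor
{
    // Hypothetical helper: returns the JSON literal assigned to window.postData["key"]
    // (or window.postData['key']) in a script's text, or null if it can't be found.
    // Assumption: the value is an object or array literal, i.e. starts with { or [.
    public static string? ExtractEntry(string script, string key)
    {
        // Locate `window.postData["key"] =`, allowing either quote style and whitespace.
        // \1 requires the closing quote to match the opening one.
        var pattern = @"window\.postData\[\s*([""'])" + Regex.Escape(key) + @"\1\s*\]\s*=\s*";
        var match = Regex.Match(script, pattern);
        if (!match.Success) return null;

        int start = match.Index + match.Length;
        if (start >= script.Length || (script[start] != '{' && script[start] != '['))
            return null;

        // Walk forward tracking bracket depth, skipping over string literals so that
        // brackets inside strings (e.g. "a ] b") don't end the value early.
        int depth = 0;
        bool inString = false;
        char quote = '\0';
        for (int i = start; i < script.Length; i++)
        {
            char c = script[i];
            if (inString)
            {
                if (c == '\\') i++;              // skip the escaped character
                else if (c == quote) inString = false;
            }
            else if (c == '"' || c == '\'')
            {
                inString = true;
                quote = c;
            }
            else if (c == '{' || c == '[')
            {
                depth++;
            }
            else if (c == '}' || c == ']')
            {
                depth--;
                if (depth == 0) return script.Substring(start, i - start + 1);
            }
        }
        return null; // brackets never balanced, so the value was cut off
    }
}
```

With HtmlAgilityPack you would pass the script node's text, e.g. ExtractEntry(node.InnerText, "content-2"), and hand the resulting string to Json.NET (for instance JToken.Parse). Each key is looked up independently, so "content" and "content-2" come back as separate JSON strings rather than combined.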
