1

I am learning to use request and cheerio to parse a simple HTML file. However, the page contains many script tags, and the actual data resides inside them. For example:

<script> var data = {"name":"John","age":33} </script>

So naturally, the interesting part is the "data" variable. Is there a more natural way than using a regex to get that data?

2 Answers

2

With newer versions of jsdom (v16.4.0, Node.js 12.6.0), jsdom.jsdom no longer exists. Use the JSDOM constructor instead, like below:

const jsdom = require("jsdom");
const { JSDOM } = jsdom;
// runScripts: "dangerously" makes jsdom execute inline scripts; use it only with HTML you trust
const dom = new JSDOM(`<script> var foo = "bar" </script>`, { runScripts: "dangerously" });
console.log(dom.window.foo);  // output is:  bar
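
If a full DOM is more than you need, one possible alternative is to evaluate only the script text with Node's built-in vm module and read the resulting global. This is a minimal sketch, not something from the original answers: evalScriptGlobals is a hypothetical helper name, and it assumes you have already pulled the script text out of the page (for example with cheerio's $("script").html()). Note that vm is not a security boundary, so this is only suitable for content you trust.

```javascript
const vm = require("vm");

// Hypothetical helper: runs the script text in a fresh context and
// returns the object holding whatever globals the script defined.
function evalScriptGlobals(scriptText) {
  const sandbox = {};
  vm.createContext(sandbox); // turn the plain object into a global scope
  vm.runInContext(scriptText, sandbox, { timeout: 1000 }); // top-level "var" lands on sandbox
  return sandbox;
}

// Assumption: in practice this string would come from the parsed page,
// e.g. $("script").first().html() with cheerio.
const scriptText = 'var data = {"name":"John","age":33}';

const globals = evalScriptGlobals(scriptText);
console.log(globals.data.name); // John
console.log(globals.data.age);  // 33
```

The script never sees require, process, or the real global object, but it can still loop forever or consume memory, hence the timeout and the caveat about trusted input.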

0

I don't believe cheerio executes inline scripts, so it cannot give you the variable's value directly. However, you can use jsdom for your use case:

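// Note: this uses the legacy jsdom API (jsdom.jsdom, createWindow),
// which newer jsdom versions no longer provide.
var jsdom = require('jsdom')
var html = '<script>var data = {"name":"John","age":33} </script>'

// Enable script processing so the inline script is executed
jsdom.defaultDocumentFeatures = {
  FetchExternalResources: ['script'],
  ProcessExternalResources: ['script'],
  MutationEvents: '2.0',
  QuerySelector: false
}

var document = jsdom.jsdom(html)
var window = document.createWindow()
// The script's top-level "var data" is now a property of the window
console.dir(window.data)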
