I have a project using the python screen-scraping framework scrapy. I created a spider that loads all <script> tags and processes the second one. This is because within the test data I gathered, the data I need, was in the second <script> tag.
But now I have a problem, whereas some pages contain the data I want in some other script tags (#3 or #4). Further obstacle is that mostly the second line of the second javascript tag has the JSON I want. But depending on the page, this could also be the 3rd or the 4th line.
Consider this simple HTML file:
<html>
<head>
<title> Test </title>
</head>
<body>
<p>
This is a text
</p>
<script type="text/javascript">
var myJSON = {
a: "a",
b: 42
}
</script>
</body>
</html>
I can access myJSON.b and get 42 if I open this page in my browser (firefox) and go to the developer tools and console.log(myJSON.b)
So my Question is: How can I extract JavaScript variable or JSON from a scrapy-fetched-page?