How do I parse JavaScript code embedded in HTML source with Python? For example, I want to extract the productList object.

Here is my source:

<html>
<body>
<div id="content-wrapper" class="row-fluid clearfix" role="contentinfo">
<!-- html content -->
</div>


   <script>
    var productList = { "daaa" : "ddddd"};
   </script>

</body>
</html>

2 Answers

1

I suggest you take a look at BeautifulSoup. It can help you extract the JavaScript code from an HTML file (but it will not parse or run it):

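from bs4 import BeautifulSoup

source = """<html>...</html>"""

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(source, "html.parser")
# Text of the first <script> element in the document
js_code = soup.find_all("script")[0].text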

Then you can use a JavaScript interpreter to run the code and read the variables. There are several out there, like this one or this one; a quick search will turn up more.
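
If the object literal happens to be valid JSON, as it is in the question, an interpreter may not be needed at all. The following is a minimal sketch under that assumption; `extract_js_object` is a hypothetical helper name, and `js_code` is hard-coded here to the text the snippet above would return for the question's source.

```python
import json

# Text of the <script> element from the question's HTML.
js_code = '\n    var productList = { "daaa" : "ddddd"};\n   '


def extract_js_object(js, var_name):
    """Return the object literal assigned to `var_name`, parsed as JSON.

    Assumes the literal is valid JSON (double-quoted keys and strings,
    no comments, no trailing commas).
    """
    start = js.index("var " + var_name)   # raises ValueError if not declared
    brace = js.index("{", start)          # first "{" after the declaration
    # raw_decode parses exactly one JSON value starting at `brace` and
    # ignores whatever follows it (here, the trailing ";").
    obj, _end = json.JSONDecoder().raw_decode(js, brace)
    return obj


product_list = extract_js_object(js_code, "productList")
print(product_list)  # {'daaa': 'ddddd'}
```

This breaks as soon as the literal uses JavaScript-only syntax such as unquoted keys or single-quoted strings; in that case a real JavaScript parser or interpreter is the safer route.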


2 Comments

What do you think of using a regexp instead to parse the extracted JavaScript?
@Parker, I am not sure that's a good idea; I have never tried to parse a programming language with regex myself, though. I guess you could try. By the way, you could also try pyparsing: it lets you build your own parsers for different languages.
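
For the regexp idea raised in these comments, a rough sketch might look like the following. It only works in the simple case where the assignment is a single JSON-compatible object literal terminated by `};`, and the pattern is an illustration rather than a general JavaScript parser.

```python
import json
import re

js_code = 'var productList = { "daaa" : "ddddd"};'

# Capture everything from the first "{" up to the first "}" that is
# followed by ";". A string value containing "};" would end the match
# too early and make json.loads fail.
pattern = re.compile(r"var\s+productList\s*=\s*(\{.*?\})\s*;", re.DOTALL)

match = pattern.search(js_code)
product_list = json.loads(match.group(1)) if match else None
print(product_list)  # {'daaa': 'ddddd'}
```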
-1

I think you need to add the function so the computer can tell whether it is JavaScript or Python. Use this:

<script type="text/javascript">  <!-- or python --></script>

1 Comment

Hello Ben Riley, Welcome to Stack Overflow! This is not a complete answer; please go back and edit to fully answer the question.
