I don't have much experience with HTML, so I hope I'm using the right terminology to explain myself.

I have the following HTML snippet:

   <script type="text/javascript">
             var graph_raw_data = [{"parent_id": 844, "process_id": 236, "process_name":    "C0nw0nk Steam Patcher.exe","first_seen":
       "2355-02-21 00:00:00,183", "calls": [{"category": "system",
       "timestamp": "2355-02-21 00:00:00,193", "api": "LdrGetDllHandle"},
       {"category": "process", "timestamp": "2015-02-21 18:59:49,584",
       "api": "ExitProcess"}]}];
  </script>

This node is nested within a few nodes with the following pattern:

<div class="tab-content">

How can I load graph_raw_data into a Python variable, something similar to a dictionary?

Basically, I need to iterate through all the nodes and find the desired one. How can I do that in Python?

I fetch the HTML with this Python code:

import urllib2
from bs4 import BeautifulSoup  # assuming BeautifulSoup 4 (bs4)

f = urllib2.urlopen(url)
page_data = f.read()
soup = BeautifulSoup(page_data)
  • HTML is just markup, it does not have any variables :-( Commented Feb 22, 2015 at 12:49
  • graph_raw_data is a JavaScript object, nicely decorated as JSON. Use Python to parse the JSON. How that works, I don't know, but JSON is there to allow us to send data between languages. Commented Feb 22, 2015 at 12:53

1 Answer


Use a regex to extract the text assigned to the variable, then use json.loads to convert it into a Python object.

import json
import re

html="""<script type="text/javascript">
             var graph_raw_data = [{"parent_id": 844, "process_id": 236, "process_name":    "C0nw0nk Steam Patcher.exe","first_seen":
       "2355-02-21 00:00:00,183", "calls": [{"category": "system",
       "timestamp": "2355-02-21 00:00:00,193", "api": "LdrGetDllHandle"},
       {"category": "process", "timestamp": "2015-02-21 18:59:49,584",
       "api": "ExitProcess"}]}];
  </script>"""

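# Strip newlines so "." can span the whole assignment, then capture everything
# between "var graph_raw_data = " and the first ";" that follows it.
graph_raw_data = re.search(r'var graph_raw_data = (.*?);', html.replace('\n', '')).group(1)
data = json.loads(graph_raw_data)
print(data)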
>>>[{'parent_id': 844, 'calls': [{'timestamp': '2355-02-21 00:00:00,193', 'category': 'system', 'api': 'LdrGetDllHandle'}, {'timestamp': '2015-02-21 18:59:49,584', 'category': 'process', 'api': 'ExitProcess'}], 'process_name': 'C0nw0nk Steam Patcher.exe', 'first_seen': '2355-02-21 00:00:00,183', 'process_id': 236}]
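
One caveat with the pattern above: the non-greedy (.*?); stops at the first semicolon, so it would cut the data short if any string value happened to contain a ";". A possible variant, shown here only as a sketch, finds the start of the assignment with a regex and then lets json.JSONDecoder.raw_decode read exactly one JSON value from that position. The helper name extract_js_json and the sample data (including the "a;b.exe" name) are made up for illustration, and the approach assumes the JavaScript literal is valid JSON.

```python
import json
import re


def extract_js_json(html, var_name):
    """Return the JSON value assigned to `var <var_name> = ...` in html.

    Hypothetical helper; assumes the assigned literal is valid JSON.
    """
    # Locate the assignment; \s* tolerates arbitrary whitespace and newlines.
    match = re.search(r'var\s+' + re.escape(var_name) + r'\s*=\s*', html)
    if match is None:
        raise ValueError('variable %r not found' % var_name)
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (here the trailing ";" and "</script>").
    value, _end = json.JSONDecoder().raw_decode(html, match.end())
    return value


# Made-up sample with a semicolon inside a string value.
sample = '''<script type="text/javascript">
    var graph_raw_data = [{"process_id": 236,
        "process_name": "a;b.exe", "calls": []}];
</script>'''

data = extract_js_json(sample, 'graph_raw_data')
print(data[0]['process_name'])
```

Because the JSON decoder decides where the value ends, there is no need to strip newlines first, and a ";" inside a string no longer matters.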