I am fairly new to html & JSON and am struggling a little with extracting the data I am after in a usable format within Python on a Raspberry Pi project.

I am using a device which outputs some live data over a WiFi link in the form of an HTML page. Although the data shown on the page can be changed, I am only really concerned with getting data from a single page for now. When viewed in Notepad++ the page looks like:

<!DOCTYPE html>
<html><head><meta http-equiv="Content-Type" content="text/html; charset=windows-1252"><style>.b{position:absolute;top:0;bottom:0;left:0;right:0;height:100%;background-color:#000;height:auto !important;}.f{border-radius: 10px;font-weight:bold;position:absolute;top:50%;left:0;right:0;margin:auto;background:#024d27;padding:50px;box-sizing:border-box;color:#FF0;margin:30px;box-shadow:0px 2px 18px -4px #0F0;transform:translateY(-50%);}#V{font-size:96px;}#U{font-size: 56px;}#N{font-size: 36px;}</style></head><body><div class="b"><div class="f"><span id="N">Voltage</span><br><span id="V">12.53</span>&nbsp;<span id="U">V</span><br></div></div><script>reqData();setInterval(reqData, 200);function reqData() {var xhr = new XMLHttpRequest();xhr.onload = function() {if (this.status == 200) {var data = JSON.parse(xhr.responseText);document.getElementById('N').innerHTML = data.n;document.getElementById('V').innerHTML = data.v;document.getElementById('U').innerHTML = data.u;} else {document.getElementById('N').innerHTML = "?";document.getElementById('V').innerHTML =  "?";document.getElementById('U').innerHTML = "?";}};xhr.open('GET', 'readVal', true);xhr.send();}</script></body></html>

As you can see, it is a fairly simple page which just provides the information I am trying to extract, presented in a Green box with Yellow text on a black background.

After studying the source a little, the values I am trying to extract are those in the spans with id 'N' (name), 'V' (value, here the voltage) and 'U' (units).

The data is displayed live on the webpage (i.e. it updates every 200 ms (I think) without refreshing the page) and I would like to extract the values as frequently as possible.

I have tried a few different blocks of code/methods and this seems to be the only one which I am currently able to gain any success with:

import urllib.request, json, html

data = urllib.request.urlopen("http://192.168.4.1").read()

print(data)

This correctly returns the HTML source code for the page (albeit with a delay of about 5 seconds, which may just be down to the low spec of the Pi Zero I am running it on).

However, I don't seem to be able to extract the JSON data from within this. I have tried:

data_json = json.loads(data)

but this gives me a JSONDecodeError: Expecting value: line 1 column 1 (char 0), which I assume is because 'data' is still a mix of HTML code and JSON. I have also noticed that the values I am trying to retrieve (Voltage, 12.53 and V in the example source at the top) are just shown as '?' placeholders when I open the page using urllib, rather than the actual values shown on the page.

Is anyone able to offer me any pointers at all please?

Thanks in advance, Steve

1 Answer

As you've noticed from the error message and the raw HTML code, the result you're getting from your device isn't JSON data, it's HTML with JavaScript. It looks like the HTML you posted makes an AJAX request (a JavaScript GET request) to some local endpoint (/readVal perhaps?).

Try opening http://192.168.4.1 in your browser, open dev tools, and observe what network requests the page makes under the hood. Specifically, look for XHR requests. Look at the request URL and response: I bet you'll find some local endpoint that returns the raw JSON data you want.

Or just try http://192.168.4.1/readVal and see if that's it.
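If that endpoint does return JSON, reading it from Python could look roughly like the sketch below. This is untested against the actual device: the URL is a guess based on the page's xhr.open('GET', 'readVal', true) call, the keys n, v and u are the ones the page's script reads from the response, and parse_reading / fetch_reading are just illustrative names.

```python
import json
import urllib.request

# Assumed endpoint: the page's script requests the relative URL 'readVal',
# which should resolve to this address. Adjust it if dev tools show otherwise.
READ_VAL_URL = "http://192.168.4.1/readVal"


def parse_reading(raw):
    """Decode a response body into a (name, value, unit) tuple."""
    data = json.loads(raw)  # json.loads accepts both bytes and str
    # n, v and u are the keys the page's JavaScript copies into the spans.
    return data["n"], data["v"], data["u"]


def fetch_reading(url=READ_VAL_URL, timeout=2.0):
    """Request the endpoint once and return the parsed reading."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_reading(response.read())
```

Calling fetch_reading() should then give you the three fields directly. Whether v arrives as a number or as a string depends on the device, so you may need float(value) before doing any maths with it.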

3 Comments

That is great, thank you! You are quite correct, there is an XHR request being sent every 200 ms to update the values shown on the HTML page. I have now modified my code so that it just accesses the /readVal page and have successfully managed to pull out and manipulate the information I needed.
I currently have my code running in an infinite 'while True:' loop which repeatedly opens the page with urllib and reads the data from it. Is this the best way to do this, or should I be trying to keep the page open and refresh the values by sending the XHR request from my code? Thanks
@Rallysteve Nice! RE the infinite loop: I don't know enough about your use case to (usefully) answer, but consider this: does it work? Does it do what you want? If yes: then keep doing it!
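For what it's worth, the page itself does not keep anything open either: its script calls setInterval(reqData, 200), so every update in the browser is a fresh GET to readVal. A Python loop that re-requests the endpoint is doing the same thing. Below is a rough sketch of such a loop with a fixed interval and basic error handling. The names poll and read_val, the URL and the timeout are all assumptions for illustration, not something confirmed for this device.

```python
import json
import time
import urllib.request


def read_val(url="http://192.168.4.1/readVal", timeout=2.0):
    """Fetch the (assumed) JSON endpoint once and return the decoded dict."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read())


def poll(fetch, interval=0.2, max_reads=None):
    """Yield fetch() results roughly every `interval` seconds.

    A failed read yields None instead of ending the loop.
    max_reads=None means run forever.
    """
    count = 0
    while max_reads is None or count < max_reads:
        started = time.monotonic()
        try:
            reading = fetch()
        except (OSError, ValueError):
            # URLError and socket timeouts are OSError subclasses;
            # json.JSONDecodeError is a ValueError subclass.
            reading = None
        yield reading
        count += 1
        # Sleep only for what is left of the interval after the request.
        time.sleep(max(0.0, interval - (time.monotonic() - started)))
```

You would use it as `for reading in poll(read_val): ...` and skip the None entries. If a single request takes longer than the interval, the loop simply runs as fast as the device answers.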
