I am fairly new to HTML and JSON and am struggling a little to extract the data I am after in a usable format in Python, as part of a Raspberry Pi project.
I am using a device which outputs some live data over a Wi-Fi link as an HTML page. Although the data shown on the page can be changed, I am only really concerned with getting data from a single page for now. When viewed in Notepad++ the page source looks like:
<!DOCTYPE html>
<html><head><meta http-equiv="Content-Type" content="text/html; charset=windows-1252"><style>.b{position:absolute;top:0;bottom:0;left:0;right:0;height:100%;background-color:#000;height:auto !important;}.f{border-radius: 10px;font-weight:bold;position:absolute;top:50%;left:0;right:0;margin:auto;background:#024d27;padding:50px;box-sizing:border-box;color:#FF0;margin:30px;box-shadow:0px 2px 18px -4px #0F0;transform:translateY(-50%);}#V{font-size:96px;}#U{font-size: 56px;}#N{font-size: 36px;}</style></head><body><div class="b"><div class="f"><span id="N">Voltage</span><br><span id="V">12.53</span> <span id="U">V</span><br></div></div><script>reqData();setInterval(reqData, 200);function reqData() {var xhr = new XMLHttpRequest();xhr.onload = function() {if (this.status == 200) {var data = JSON.parse(xhr.responseText);document.getElementById('N').innerHTML = data.n;document.getElementById('V').innerHTML = data.v;document.getElementById('U').innerHTML = data.u;} else {document.getElementById('N').innerHTML = "?";document.getElementById('V').innerHTML = "?";document.getElementById('U').innerHTML = "?";}};xhr.open('GET', 'readVal', true);xhr.send();}</script></body></html>
As you can see, it is a fairly simple page which just provides the information I am trying to extract, presented in a green box with yellow text on a black background.
From staring at the source a little, the information I am trying to extract is the content of the spans with ID 'V' (the value, here the voltage reading), 'N' (name) and 'U' (units).
The data is displayed live on the webpage (i.e. it updates without refreshing the page, every 200 ms going by the setInterval call in the page's script) and I would like to extract the values as frequently as possible.
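To make the span lookup concrete, here is a minimal sketch using the standard library's html.parser to pull the text out of those three spans. The class name SpanText and the helper extract_spans are made up for illustration, and the sample string is copied from the page source above. Note that this only reads whatever text is in the HTML as delivered; it does not run the page's script.

```python
from html.parser import HTMLParser


class SpanText(HTMLParser):
    """Collect the text inside <span id="..."> elements, keyed by id.

    Hypothetical helper for illustration, not part of the device's API.
    """

    def __init__(self, wanted_ids):
        super().__init__()
        self.wanted_ids = set(wanted_ids)
        self.values = {}
        self._current = None  # id of the span we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            span_id = dict(attrs).get("id")
            if span_id in self.wanted_ids:
                self._current = span_id
                self.values[span_id] = ""

    def handle_data(self, data):
        # Only keep text that appears while we are inside a wanted span
        if self._current is not None:
            self.values[self._current] += data

    def handle_endtag(self, tag):
        if tag == "span":
            self._current = None


def extract_spans(page_html, wanted_ids=("N", "V", "U")):
    """Return a dict mapping each wanted span id to its text content."""
    parser = SpanText(wanted_ids)
    parser.feed(page_html)
    parser.close()
    return parser.values


# Fragment copied from the page source shown above
sample = ('<span id="N">Voltage</span><br>'
          '<span id="V">12.53</span> <span id="U">V</span><br>')
print(extract_spans(sample))
```

The same function can be given the decoded output of urlopen(...).read(), but it will return whatever the device put in the spans at the time of the request.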
I have tried a few different blocks of code/methods and this seems to be the only one which I am currently able to gain any success with:
import json
import urllib.request

# Fetch the raw page source from the device (returned as bytes)
data = urllib.request.urlopen("http://192.168.4.1").read()
print(data)
This correctly returns the HTML source code for the page (albeit with a delay of about 5 seconds, which may just be related to the low spec of the Pi Zero I am running it on).
However, I don't seem to be able to extract the JSON data from within this. I have tried:
data_json = json.loads(data)
but this gives me a JSONDecodeError: Expecting value: line 1 column 1 (char 0), which I assume is because 'data' is still a mix of HTML and JSON. I have also noticed that the actual values I am trying to retrieve (Voltage, 12.53 and V in the example source at the top) just show up as '?' placeholders when I fetch the page using urllib, rather than the actual values shown in the browser.
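Going by the page's own script, the live values appear to come from a separate GET request to 'readVal', whose response is parsed as JSON with the keys n, v and u. A minimal sketch of requesting that directly, assuming the relative URL resolves to http://192.168.4.1/readVal and that the response body is UTF-8 JSON (neither is confirmed; the function names and the timeout are made up for illustration):

```python
import json
import urllib.request
from urllib.parse import urljoin

# Device address used in the attempt above; the trailing slash matters for urljoin
BASE_URL = "http://192.168.4.1/"


def parse_reading(payload):
    """Decode a JSON response body and return (name, value, unit).

    Assumes the keys 'n', 'v' and 'u' seen in the page's script.
    json.loads accepts bytes or str; bytes are assumed to be UTF-8.
    """
    data = json.loads(payload)
    return data["n"], data["v"], data["u"]


def fetch_reading(base_url=BASE_URL, timeout=2.0):
    """Request the 'readVal' resource that the page's script polls."""
    url = urljoin(base_url, "readVal")
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_reading(response.read())
```

Calling fetch_reading() in a loop with a short time.sleep between calls would mimic the 200 ms polling the page does in the browser. Whether the values come back as strings or numbers depends on the device, so any conversion with float() is left out here.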
Is anyone able to offer me any pointers at all please?
Thanks in advance, Steve