I'm trying to find a plaintext JSON object within a webpage using JavaScript. The JSON appears as plain text in the browser, but it may be split across separate HTML elements. Example:
```html
<div>
{"kty":"RSA","e":"AQAB","n":"mZT_XuM9Lwn0j7O_YNWN_f7S_J6sLxcQuWsRVBlAM3_5S5aD0yWGV78B-Gti2MrqWwuAhb_6SkBlOvEF8-UCHR_rgZhVR1qbrxvQLE_zpamGJbFU_c1Vm8hEAvMt9ZltEGFS22BHBW079ebWI3PoDdS-DJvjjtszFdnkIZpn4oav9fzz0
</div>
<div>
xIaaxp6-qQFjKXCboun5pto59eJnn-bJl1D3LloCw7rSEYQr1x5mxhIxAFVVsNGuE9fjk0ueTDcMUbFLPYn6PopDMuN0T1B2D1Y8ClItEVbVDFb-mRPz8THJ_gexJ8C20n8m-pBlpL4WyyPuY2ScDugmfG7UnBGrDmS5w"}
</div>
```
I've tried this regular expression:
```
{"?\w+"?:[^}<]+(?:(?:(?:<\/[^>]+>)[^}<]*(?:<[^>]+>)+)*[^}<]*)*}
```
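The problem is that it fails on nested JSON: `[^}<]+` cannot cross a `}`, so the match ends at the first closing brace, which belongs to the inner object.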
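To see the nested-JSON failure concretely, here is the question's regex run against a small nested object. The sample `{"a":{"b":1}}` is made up for illustration; the regex is copied unchanged.

```javascript
// The regular expression from the question, unchanged. Without the `u` flag,
// the unescaped `{` and `}` are treated as literal characters.
const questionRegex = /{"?\w+"?:[^}<]+(?:(?:(?:<\/[^>]+>)[^}<]*(?:<[^>]+>)+)*[^}<]*)*}/;

// A small made-up sample with one nested object.
const nested = '{"a":{"b":1}}';
const m = nested.match(questionRegex);

// `[^}<]+` stops at the first "}", which closes the inner object,
// so the outer object's closing brace is left out of the match.
console.log(m[0]); // {"a":{"b":1}

let parses = true;
try {
  JSON.parse(m[0]);
} catch (e) {
  parses = false; // the truncated match is not valid JSON
}
console.log(parses); // false
```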
I could also use JavaScript to count `{` and `}` to find where the JSON actually ends, but there must be a better option than this slow, clumsy approach.
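Counting braces is not inherently slow: one left-to-right pass is linear in the length of the text. What a plain count misses is braces inside string values, where a `}` would end the object too early and a `{` would keep it open too long. Below is a minimal sketch of a counter that tracks whether it is inside a JSON string, assuming the caller already knows the index of the opening `{` or `[`; the function name `findJsonEnd` is hypothetical.

```javascript
/**
 * Hypothetical helper: returns the index just past the "}" or "]" that closes
 * the object/array starting at text[start], or -1 if it never closes.
 * Braces and brackets inside JSON string literals are ignored.
 */
function findJsonEnd(text, start) {
  const open = text[start];
  if (open !== "{" && open !== "[") return -1;

  let depth = 0;
  let inString = false;
  let escaped = false;

  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) {
        escaped = false; // this character was escaped; it cannot end the string
      } else if (ch === "\\") {
        escaped = true; // the next character is escaped
      } else if (ch === '"') {
        inString = false; // closing quote
      }
      continue; // nothing inside a string affects the depth
    }
    if (ch === '"') {
      inString = true;
    } else if (ch === "{" || ch === "[") {
      depth++;
    } else if (ch === "}" || ch === "]") {
      depth--;
      if (depth === 0) return i + 1; // matching close found
    }
  }
  return -1; // unbalanced: the JSON is truncated
}
```

In the update's code the start index would be `arr0.index`. The final `JSON.parse(text.slice(start, end))` is still worth keeping, because this sketch does not check that a `{` is closed by `}` rather than `]`.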
Many thanks
Update: Perhaps there isn't a better way to do this. Below is my current code (a bit verbose, but probably necessary):
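```javascript
// This fragment sits inside a function (hence the `return`); `body` holds the text being searched.
// Start of a JSON object: "{", a quoted key, then ":" (\s already covers \n)
let regex = /\{\s*"\w+"\s*:/g;
// Consider both open and close curly brackets
let brackets = /[{}]/g;
let arr0, arr;
// Find the next candidate JSON start
arr0 = regex.exec(body);
if (arr0 === null) { // Nothing found
  return Promise.resolve();
}
try {
  brackets.lastIndex = regex.lastIndex; // Just after the key's ":", inside the current JSON
  let count = 1; // The opening "{" matched by `regex`
  // Count { and } to find the end of the JSON.
  // Note: braces inside string values are counted too.
  while (count !== 0 && (arr = brackets.exec(body)) !== null) {
    count += (arr[0] === "{" ? 1 : -1);
  }
  if (count !== 0) {
    // Ran out of brackets, so the JSON is unbalanced or truncated.
    // (A failed exec() resets brackets.lastIndex to 0, so it can't be used here.)
    throw new Error("Unbalanced braces from index " + arr0.index);
  }
  // count === 0: brackets.lastIndex is just past the closing "}"
  let lastIdx = brackets.lastIndex;
  let json = body.substring(arr0.index, lastIdx);
  try {
    let parsed = JSON.parse(json);
    // Process the JSON here to get the original message
  } catch (error) {
    console.log(error);
  }
  // ...
} catch (err) {
  console.log(err);
}
```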
`textContent` starts with `{`, then evaluate it, followed by its next sibling if it doesn't parse, etc. Don't use a regular expression.
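The suggestion above (start at an element whose `textContent` begins with `{`, and keep appending the next sibling until the text parses) can be sketched as a pure function over strings. It assumes the fragments have already been read from consecutive sibling elements, and that whitespace at the edges of each fragment comes from the markup rather than from the JSON itself; `parseSplitJson` is a hypothetical name.

```javascript
/**
 * Hypothetical helper: `fragments` stands for the textContent of consecutive
 * sibling elements. Starting at each fragment that begins with "{", append
 * the following fragments one at a time until the concatenation parses.
 * Returns the parsed value, or undefined if nothing parses.
 */
function parseSplitJson(fragments) {
  for (let i = 0; i < fragments.length; i++) {
    if (!fragments[i].trim().startsWith("{")) continue;
    let candidate = "";
    for (let j = i; j < fragments.length; j++) {
      // Assumption: leading/trailing whitespace is markup, not part of the JSON.
      candidate += fragments[j].trim();
      try {
        return JSON.parse(candidate);
      } catch {
        // Not complete yet (or not JSON at all); try adding the next sibling.
      }
    }
  }
  return undefined;
}
```

In a page this might be called as `parseSplitJson(Array.from(container.children, el => el.textContent))`, where `container` is whatever element wraps the `<div>`s. Each attempt re-parses the accumulated text, so the cost grows with the number of fragments, which is usually small.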