I'm trying to find plaintext JSON within a webpage using JavaScript. The JSON appears as plain text when viewed in the browser, but it may be split across separate HTML tags. Example:

<div>
{"kty":"RSA","e":"AQAB","n":"mZT_XuM9Lwn0j7O_YNWN_f7S_J6sLxcQuWsRVBlAM3_5S5aD0yWGV78B-Gti2MrqWwuAhb_6SkBlOvEF8-UCHR_rgZhVR1qbrxvQLE_zpamGJbFU_c1Vm8hEAvMt9ZltEGFS22BHBW079ebWI3PoDdS-DJvjjtszFdnkIZpn4oav9fzz0
</div>
<div>
xIaaxp6-qQFjKXCboun5pto59eJnn-bJl1D3LloCw7rSEYQr1x5mxhIxAFVVsNGuE9fjk0ueTDcMUbFLPYn6PopDMuN0T1B2D1Y8ClItEVbVDFb-mRPz8THJ_gexJ8C20n8m-pBlpL4WyyPuY2ScDugmfG7UnBGrDmS5w"}
</div>

I've tried this regex:

{"?\w+"?:[^}<]+(?:(?:(?:<\/[^>]+>)[^}<]*(?:<[^>]+>)+)*[^}<]*)*}

The problem is that it fails on nested JSON.

I could also use JavaScript to count the { and } characters to find where the JSON actually ends, but there must be a better option than this slow and clumsy approach.

Many thanks


Update: Perhaps there isn't a better way to do this. Below is my current code (a bit verbose, but probably necessary):

// Matches the start of a JSON object: an opening brace, a quoted key, and a colon
let regex = /\{\s*"\w+"\s*:/g;

// Matches both opening and closing curly brackets
let brackets = /[{}]/g;

let arr0, arr;
// Try to parse every matching JSON
arr0 = regex.exec(body);
if (arr0 === null) { // Nothing found
    return Promise.resolve();
}

try {
    brackets.lastIndex = regex.lastIndex; // Resume after the beginning of the current JSON
    let count = 1;
    // Count { and } to find the end of the JSON.
    // Note: brackets inside string values are counted too.
    while ((count !== 0) && ((arr = brackets.exec(body)) !== null)) {
        count += (arr[0] === "{" ? 1 : -1);
    }

    // When count === 0, lastIndex points just past the matching closing bracket
    let lastIdx = brackets.lastIndex;
    let json = body.substring(arr0.index, lastIdx);

    try {
        let parsed = JSON.parse(json);
        // Process the JSON here to get the original message
    } catch (error) {
        console.log(error);
    }

...

} catch(err) {
    console.log(err);
}
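
One caveat with the bracket-counting loop above: it also counts { and } that appear inside string values, so a value such as "}" would end the scan too early. Below is a minimal sketch of a string-aware variant. The helper names findJsonEnd and extractFirstJson are made up for illustration, and the sketch assumes the text has already been stripped of HTML tags.

```javascript
// Hypothetical helper: returns the index just past the "}" that closes the
// object starting at `start`, or -1 if the object is never closed.
function findJsonEnd(text, start) {
    let depth = 0;
    let inString = false;
    for (let i = start; i < text.length; i++) {
        const ch = text[i];
        if (inString) {
            if (ch === "\\") {
                i++; // Skip the escaped character (e.g. \" or \\)
            } else if (ch === '"') {
                inString = false;
            }
        } else if (ch === '"') {
            inString = true;
        } else if (ch === "{") {
            depth++;
        } else if (ch === "}") {
            depth--;
            if (depth === 0) {
                return i + 1;
            }
        }
    }
    return -1;
}

// Hypothetical helper: parses the first balanced {...} span in `text`
// that is valid JSON, or returns undefined if there is none.
function extractFirstJson(text) {
    let start = text.indexOf("{");
    while (start !== -1) {
        const end = findJsonEnd(text, start);
        if (end !== -1) {
            try {
                return JSON.parse(text.substring(start, end));
            } catch (error) {
                // Not valid JSON; fall through and try the next "{"
            }
        }
        start = text.indexOf("{", start + 1);
    }
    return undefined;
}
```

This is still a single pass per candidate, so it should not be noticeably slower than the regex-based count, and JSON.parse remains the final check on whatever span is found.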
  • A general solution without constraints would be hard. Maybe search for elements whose textContent starts with {, then try to parse it, appending its next sibling's text if it doesn't parse, and so on. Don't use a regular expression. Commented Sep 21, 2019 at 7:55
  • @CertainPerformance Unfortunately, in my case the JSON doesn't always appear at the beginning of an element, but luckily they all start with the same element (which I'd be searching for; the code above is generalized a bit). So for now I'll stick with counting brackets... Commented Sep 25, 2019 at 17:04
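
For reference, the sibling-by-sibling idea from the first comment could be sketched roughly as follows. This is only an illustration under stated assumptions: parseAcrossChunks is a hypothetical name, the input is assumed to be an array holding the text of consecutive sibling elements (for example Array.from(parent.children, el => el.textContent)), and the JSON is assumed to start at the beginning of one of them, which the second comment notes is not always true.

```javascript
// Hypothetical helper: `chunks` holds the text of consecutive sibling
// elements. Returns the first value that parses, or undefined.
function parseAcrossChunks(chunks) {
    for (let first = 0; first < chunks.length; first++) {
        if (!chunks[first].trim().startsWith("{")) {
            continue; // Only start from a chunk that looks like the start of JSON
        }
        let candidate = "";
        for (let last = first; last < chunks.length; last++) {
            // Trim the whitespace that surrounds each element's text
            candidate += chunks[last].trim();
            try {
                return JSON.parse(candidate);
            } catch (error) {
                // Incomplete so far; append the next sibling and retry
            }
        }
    }
    return undefined;
}
```

Trimming each chunk assumes the split never falls on a meaningful space inside a string value. The repeated JSON.parse calls make this quadratic in the worst case, which is probably fine for a handful of elements.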

1 Answer

There's no clean way to do this. It might be possible to take a parent element's innerText and parse that:

console.log(JSON.parse(document.getElementById('outer').innerText.replace(/\s|\n/g, '')));
<div id="outer">
<div>
{"kty":"RSA","e":"AQAB","n":"mZT_XuM9Lwn0j7O_YNWN_f7S_J6sLxcQuWsRVBlAM3_5S5aD0yWGV78B-Gti2MrqWwuAhb_6SkBlOvEF8-UCHR_rgZhVR1qbrxvQLE_zpamGJbFU_c1Vm8hEAvMt9ZltEGFS22BHBW079ebWI3PoDdS-DJvjjtszFdnkIZpn4oav9fzz0
</div>
<div>
xIaaxp6-qQFjKXCboun5pto59eJnn-bJl1D3LloCw7rSEYQr1x5mxhIxAFVVsNGuE9fjk0ueTDcMUbFLPYn6PopDMuN0T1B2D1Y8ClItEVbVDFb-mRPz8THJ_gexJ8C20n8m-pBlpL4WyyPuY2ScDugmfG7UnBGrDmS5w"}
</div>
</div>

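But this is likely to fail in some cases: the replace call also strips whitespace inside string values, and any other text inside the parent element will break the parse.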
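
A slightly more conservative variant of the snippet above removes only the line breaks between the child elements instead of all whitespace, so spaces inside string values survive. This is a sketch rather than a robust solution: parseJoinedText is a hypothetical name, and it assumes the input is the parent's innerText, that each child element ends up on its own line, and that the parent contains nothing but the JSON.

```javascript
// Hypothetical helper: `text` is assumed to be the parent element's
// innerText, where each child <div> ends up on its own line.
function parseJoinedText(text) {
    // Trim each line and join them; spaces inside a line are kept.
    const joined = text
        .split(/\r?\n/)
        .map(line => line.trim())
        .join("");
    return JSON.parse(joined); // Throws SyntaxError if the result is not valid JSON
}
```

Usage would be something like parseJoinedText(document.getElementById('outer').innerText). Valid JSON strings cannot contain raw line breaks, so removing them does not change any value, although a space that sits exactly at a split point would still be lost by the trim.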

1 Comment

I obviously overlooked .innerText when I said I needed to filter out the HTML tags, but it seems I still need to count { and } after all.
