Extract JSON string from unstructured string

Question

I have an unstructured String, and I would like to extract the JSON object under the "restaurant" tag from it using a regex. The data below is only an example, but the format and the "restaurant" tag are correct.

{
    "restaurant": {
        "id": "abcd-efgh-ijkl",
        "created_at": "2020-12-31",
        "cashier_payments": []
    }
}

I came up with the regex String findMe = "\"restaurant\": {(\\n.*?)+}"; however, it takes all the data up to the last }.

How do I correct the regex?

As requested, this is how I get the unstructured String using Jsoup:

        // Parse the raw HTML and collect every <script> element.
        String htmlString = contentBuilder.toString();
        Document doc = Jsoup.parse(htmlString);
        Elements elements = doc.getElementsByTag("script");

        for (Element element : elements) {
            // Script bodies are exposed as data nodes rather than text nodes.
            for (DataNode node : element.dataNodes()) {
                String s = node.getWholeData();
                if (s.contains("\"restaurant\":")) {
                    System.out.println(s);
                }
            }
            System.out.println("-------------------");
        }

So I would like to extract the JSON from the String s.

The . in your regex matches any character. Is there a character you could exclude to get the result you want? Have you looked at greedy vs. non-greedy matching?
– tgdavies, Commented Jul 10, 2020 at 8:58
No, I need everything inside the pattern as a String: from the "{" after the "restaurant" tag up to its closing "}". I have been trying to learn regex for the last 2 hours, but this is not working.
– Arefe, Commented Jul 10, 2020 at 8:59
Can you show an example of an "unstructured String"? The text in the grey box is well-structured JSON, so that can't be what you refer to as "unstructured".
– Andreas, Commented Jul 10, 2020 at 9:01
The example String sits inside a large HTML string, which is what I mean by unstructured. It may not be the correct wording, though. I updated the question.
– Arefe, Commented Jul 10, 2020 at 9:02
You could try the regex "\"restaurant\": \\{[^}]*\\}", which would work in your example, but it's still a bad regex because it cannot handle nested objects or end-brace characters inside string values. Regex is the wrong tool for the job. Since the data is well-structured JSON, use a JSON parser instead.
– Andreas, Commented Jul 10, 2020 at 9:07

Andrew Vershinin · Accepted Answer · 2020-07-10 09:21:10Z

1

If the entries you intend to extract do not contain nested objects (otherwise, you'll need a proper JSON parser), you can use the following regex: "restaurant":\s*\{[^}]*\}
Edit: It seems the value object does contain other objects, so I suggest using a JSON library such as Jackson.
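
To make the limitation concrete, here is a minimal sketch of that regex in Java. The class name RestaurantRegexDemo, the helper find, and the sample strings are made up for illustration; only the pattern itself comes from the answer. It shows the regex returning the whole object when the value is flat, and stopping at the first closing brace when the value contains nested objects.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RestaurantRegexDemo {
    // The regex from the answer, written as a Java string literal.
    // [^}]* stops at the FIRST '}', so it only suits flat objects.
    static final Pattern FLAT =
            Pattern.compile("\"restaurant\":\\s*\\{[^}]*\\}");

    /** Returns the first match in text, or null if there is none. */
    static String find(String text) {
        Matcher m = FLAT.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: one flat value, one with nested objects.
        String flat = "var x = {\"restaurant\": {\"id\": \"abcd-efgh-ijkl\", "
                + "\"cashier_payments\": []}};";
        String nested = "var x = {\"restaurant\": {\"pos_users\": [{},{}], "
                + "\"id\": \"abcd\"}};";

        System.out.println(find(flat));   // the complete "restaurant" entry
        System.out.println(find(nested)); // cut off after the first '}'
    }
}
```

Note that the match includes the "restaurant": prefix, not just the braces, so it is not valid JSON on its own.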

edited Jul 10, 2020 at 9:21

answered Jul 10, 2020 at 9:06


4 Comments

Arefe Over a year ago

Not sure how to thank you. Each time I have a regex encounter, I feel very uneasy. Thank you so much. I will accept your answer shortly.

Andrew Vershinin Over a year ago

@ChakladerAsfakArefe No problem! But before using a regex, think about using a full-featured JSON parser: this code will break if one day you get an object inside the object you're trying to extract, and a parser is a cleaner and more flexible solution anyway.

Arefe Over a year ago

Okay, I have an issue. I have "pos_users": [{},{}] data inside the curly braces of "restaurant": { }, and your regex stops at "pos_users": [{} and doesn't capture the whole "restaurant": { } data. So I have to say this is not working as intended. Sorry.

Andrew Vershinin Over a year ago

@ChakladerAsfakArefe As I and other commenters mentioned, if you have a closing brace anywhere inside "restaurant": { /* here */ }, this approach will fail. If your data is any more complex than in the OP, you need a JSON parser, for example the one provided by the Jackson library.
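
Since the JSON is embedded in a larger script string, the object has to be isolated before a parser can take it. One possible way, sketched below with only the Java standard library, is to scan from the key and count braces while skipping string literals. This is a hand-rolled brace matcher, not the Jackson approach recommended above and not a JSON validator. The names JsonObjectExtractor and extractObject are hypothetical, and the sketch assumes the first occurrence of the key is the one you want and that its value is an object.

```java
class JsonObjectExtractor {
    /**
     * Returns the balanced {...} text that follows the first occurrence of
     * "key" in text, or null if the key or a balanced object is not found.
     * Assumption: the value of the key is an object, so the first '{' after
     * the key opens it.
     */
    static String extractObject(String text, String key) {
        int keyPos = text.indexOf("\"" + key + "\"");
        if (keyPos < 0) return null;
        int start = text.indexOf('{', keyPos);
        if (start < 0) return null;

        int depth = 0;
        boolean inString = false;
        for (int i = start; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inString) {
                if (c == '\\') i++;                  // skip the escaped character
                else if (c == '"') inString = false; // end of string literal
            } else if (c == '"') {
                inString = true;                     // braces inside strings are ignored
            } else if (c == '{') {
                depth++;
            } else if (c == '}') {
                if (--depth == 0) return text.substring(start, i + 1);
            }
        }
        return null; // braces never balanced
    }
}
```

With the String s from the question, the call would be extractObject(s, "restaurant"). The returned substring can then be handed to a JSON library such as Jackson, which is still the better tool for reading the fields.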
