I have an unstructured String, and I would like to extract the JSON object under the "restaurant" key from it using a regex. The data below is only an example, but the format and the "restaurant" key are accurate.

{
    "restaurant": {
        "id": "abcd-efgh-ijkl",
        "created_at": "2020-12-31",
        "cashier_payments": []
    }
}

I came up with the regex String findMe = "\"restaurant\": {(\\n.*?)+}";, but it matches all the data up to the last }.

How do I correct the regex?

As requested, this is how I get the unstructured String using Jsoup:

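    import org.jsoup.Jsoup;
    import org.jsoup.nodes.DataNode;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    // contentBuilder (defined elsewhere) holds the HTML of the page
    String htmlString = contentBuilder.toString();
    Document doc = Jsoup.parse(htmlString);
    Elements elements = doc.getElementsByTag("script");

    for (Element element : elements) {
        // The body of a <script> tag is exposed as data nodes
        for (DataNode node : element.dataNodes()) {
            String s = node.getWholeData();
            if (s.contains("\"restaurant\":")) {
                System.out.println(s);
            }
        }
        System.out.println("-------------------");
    }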

So I would like to extract the JSON from the String s.

  • The . in your regex matches any character. Is there a character you could exclude to get the result you want? Have you looked at greedy vs non-greedy matching? Commented Jul 10, 2020 at 8:58
  • No, I need everything inside the pattern mentioned, as a String: from the "{" right after the "restaurant" tag up to its closing "}". I have been trying to learn regex for the last 2 hours, but this is not working. Commented Jul 10, 2020 at 8:59
  • Can you show an example of an "unstructured String"? The text in the grey box is well-structured JSON, so that can't be what you refer to as "unstructured". Commented Jul 10, 2020 at 9:01
  • The example String is inside a large HTML string, which is what I mean by unstructured. It may not be the correct wording, though. I updated the question. Commented Jul 10, 2020 at 9:02
  • You could try regex "\"restaurant\": \\{[^}]*\\}", which would work in your example, but it's still a bad regex because it cannot handle nested objects or end-brace characters inside the string values. Regex is the wrong tool for the job. Since the data is well-structured JSON, use a JSON parser instead. Commented Jul 10, 2020 at 9:07

1 Answer

If the object you intend to extract does not contain nested objects (otherwise, you'll need a proper JSON parser), you can use the following regex: "restaurant":\s*\{[^}]*\}
Edit: It seems the value object does contain other objects, so I suggest using a JSON library such as Jackson.
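
To make the limitation concrete, here is a minimal sketch of that regex in Java. The class name, the method name and the sample strings are made up for illustration; only the pattern itself comes from the answer. It shows the pattern working on a flat object and stopping early once the object contains a nested one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern is taken from the answer.
final class FlatRestaurantRegex {
    // Matches "restaurant": { ... } up to the FIRST closing brace.
    private static final Pattern RESTAURANT =
            Pattern.compile("\"restaurant\":\\s*\\{[^}]*\\}");

    /** Returns the matched text, or null when nothing matches. */
    static String extract(String text) {
        Matcher m = RESTAURANT.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Flat object: the whole "restaurant" entry is matched.
        String flat = "var data = {\"restaurant\": {\"id\": \"abcd-efgh-ijkl\", "
                + "\"created_at\": \"2020-12-31\", \"cashier_payments\": []}};";
        System.out.println(extract(flat));

        // Nested object: the match ends at the first '}' it meets.
        String nested = "{\"restaurant\": {\"pos_users\": [{},{}], \"id\": \"abcd\"}}";
        System.out.println(extract(nested)); // prints "restaurant": {"pos_users": [{}
    }
}
```

The second call is why the regex is only safe when the value is known to be flat: [^}]* cannot tell an inner closing brace from the outer one.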


4 Comments

Not sure how to thank you. Each time I run into a regex, I feel very uneasy. Thank you so much. I will accept your answer shortly.
@ChakladerAsfakArefe No problem! But before using a regex, think about using a full-featured JSON parser: this code will break if one day you get an object inside the object you're trying to extract, and a parser is a cleaner and more flexible solution anyway.
Okay, I have an issue. I have "pos_users": [{},{}] data inside the curly braces of "restaurant": { }, and your regex stops at "pos_users": [{} and doesn't capture the whole "restaurant": { } data. So I have to say this is not working as intended. Sorry.
@ChakladerAsfakArefe As I and other commenters mentioned, if you have a closing brace anywhere inside "restaurant": { /* here */ }, this approach will fail. If your data is any more complex than in the OP, you need a JSON parser, for example the one provided by the Jackson library.
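
Since the JSON sits inside a larger script string, one way to isolate it before handing it to a parser is to count braces instead of using a regex. The following is only a sketch of that idea, not the method proposed in this thread: the class and method names are hypothetical, and it assumes that the first { after the key starts the key's value and that the text is well-formed JSON from there on.

```java
// Hypothetical helper: extracts a brace-balanced object by scanning characters.
final class BalancedObjectExtractor {
    /**
     * Returns the {...} block that follows the first occurrence of "key",
     * or null if the key is missing or the braces never balance.
     * Assumes the first '{' after the key opens the key's value.
     */
    static String extractObject(String text, String key) {
        int keyPos = text.indexOf("\"" + key + "\"");
        if (keyPos < 0) return null;
        int start = text.indexOf('{', keyPos);
        if (start < 0) return null;

        int depth = 0;
        boolean inString = false;
        for (int i = start; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inString) {
                if (c == '\\') i++;                 // skip the escaped character
                else if (c == '"') inString = false;
            } else if (c == '"') {
                inString = true;                    // braces inside strings are ignored
            } else if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                return text.substring(start, i + 1); // outermost brace closed
            }
        }
        return null; // unbalanced input
    }

    public static void main(String[] args) {
        String s = "window.data = {\"restaurant\": {\"id\": \"a}b\", "
                + "\"pos_users\": [{},{}]}, \"other\": {}};";
        System.out.println(extractObject(s, "restaurant"));
        // prints {"id": "a}b", "pos_users": [{},{}]}
    }
}
```

The returned substring is still just text; decoding it into fields remains a job for a JSON library such as Jackson, as recommended above.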
