
I have a CSV file which contains this type of document:

{""cast_id"": 10, ""character"": ""Mushu (voice)"", ""credit_id"": ""52fe43a09251416c75017cbb"", ""gender"": 2, ""id"": 776, ""name"": ""Eddie Murphy"", ""order"": 0}, {""cast_id"": 62, ""character"": ""[Singing voice]"", ""credit_id"": ""597a65c8925141233d0000bb"", ""gender"": 2, ""id"": 18897, ""name"": ""Jackie Chan"", ""order"": 1}, {""cast_id"": 16, ""character"": ""Mulan (voice)"", ""credit_id"": ""52fe43a09251416c75017cd5"", ""gender"": 1, ""id"": 21702, ""name"": ""Ming-Na Wen"", ""order"": 2}

I first used this regular expression to change each doubled double quote ("") into a single double quote ("):

String newResult = result.replaceAll("\"{2}", "\"");

Then I used this regular expression to split the string:

String[] jsonResult = newResult.split(", (?![^{]*\\})");

However, it separates the string into this:

{"cast_id": 10, "character": "Mushu (voice)", "credit_id": "52fe43a09251416c75017cbb", "gender": 2, "id": 776, "name": "Eddie Murphy", "order": 0}

{"cast_id": 62

"character": "[Singing voice

something else then

{"cast_id": 16, "character": "Mulan (voice)", "credit_id": "52fe43a09251416c75017cd5", "gender": 1, "id": 21702, "name": "Ming-Na Wen", "order": 2}

So my regular expression fails when it meets square brackets []. Can I have some help with this?

I tried to use http://www.regexplanet.com/advanced/java/index.html but I don't understand what I should put in the option, replacement and input fields. How do I use this website?

Thanks

  • I would recommend you use a JSON parser instead of regular expressions. It will save you a lot of headaches. Commented Nov 30, 2017 at 8:53
  • I tried json-simple, but json-simple only takes in the standard JSON type. That's why I am splitting the string into individual standard JSON strings, which I would then parse. Commented Nov 30, 2017 at 8:55
  • Is there a method in json-simple or another package which splits a string of multiple JSON inputs separated by commas? I couldn't find one in json-simple. Commented Nov 30, 2017 at 8:57
  • What do you mean by "standard json type"? Commented Nov 30, 2017 at 8:57
  • Like the ones in his examples: mkyong.com/java/json-simple-example-read-and-write-json My data is a string of several JSON objects separated by commas. That's why I need to separate them and then parse them. Commented Nov 30, 2017 at 9:00

3 Answers

1

You are dealing with JSON data which has been saved as a one-column CSV file. :) In CSV, a quote inside a quoted field is escaped by doubling it, so you could just use a CSV library to read your file. As I said, you should expect to get just one column: one value containing your JSON. Then you use a JSON library to parse your JSON.

=> you would not need to implement any parsing at all.
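To illustrate what a CSV library does to such a field, here is a minimal sketch in plain Java. It assumes the field is enclosed in double quotes in the file and that inner quotes are doubled. The class CsvFieldDemo, the helper unquoteCsvField and the shortened sample data are hypothetical names and values made up for this illustration; in practice the CSV library handles this step for you.

```java
public class CsvFieldDemo {

    // Hypothetical helper: undoes CSV quoting for one quoted field by
    // stripping the enclosing quotes and collapsing each "" into ".
    static String unquoteCsvField(String field) {
        if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
            String inner = field.substring(1, field.length() - 1);
            return inner.replace("\"\"", "\"");
        }
        return field; // unquoted fields are returned unchanged
    }

    public static void main(String[] args) {
        // Shortened stand-in for one CSV field holding several JSON objects (assumed data)
        String raw = "\"{\"\"id\"\": 776, \"\"name\"\": \"\"Eddie Murphy\"\"}, "
                   + "{\"\"id\"\": 18897, \"\"name\"\": \"\"Jackie Chan\"\"}\"";

        // Wrapping the result in [ ] gives a single JSON array for a JSON library to parse
        String json = "[" + unquoteCsvField(raw) + "]";
        System.out.println(json);
    }
}
```

The resulting string can be handed to whichever JSON library you choose, so no custom splitting is needed.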


4 Comments

What do you mean by "Quotes will be escaped with double quotes in CSV"? Also, is there a method which automatically reads a CSV file or string? Thanks
It's part of the CSV specification (creativyst.com/Doc/Articles/CSV/CSV01.htm). Text data is usually enclosed in quotes, and if you want to use quotes within your text (like your JSON text), you write a double quote. If I were you, I would try to read the proper JSON first with something like this (commons.apache.org/proper/commons-csv) and after that take care of parsing the JSON. Step 1 Google: "Java read CSV library" Step 2 Google: "Java parse JSON"
I think I see what you mean. Could you tell me how to install the Commons CSV library in Eclipse? I see it's not a jar file.
0

You should be looking for the pattern }, { between objects. The regex (?<=\}), (?=\{) does just that. Your regex will give a false positive if a } is missing at the end of the string.

(Tested with https://regex101.com/)

After that you can parse each string as JSON; use a library for that.
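As a rough sketch of how that regex could be used from Java (the backslashes must be doubled inside a Java string literal). The class name SplitOnObjectBoundary, the helper splitObjects and the shortened input are assumptions made for this illustration.

```java
public class SplitOnObjectBoundary {

    // Splits on ", " only where it is preceded by } and followed by {.
    static String[] splitObjects(String s) {
        return s.split("(?<=\\}), (?=\\{)");
    }

    public static void main(String[] args) {
        // Shortened version of the input from the question (quotes already un-doubled)
        String input = "{\"cast_id\": 10, \"character\": \"Mushu (voice)\"}, "
                     + "{\"cast_id\": 62, \"character\": \"[Singing voice]\"}, "
                     + "{\"cast_id\": 16, \"character\": \"Mulan (voice)\"}";

        for (String part : splitObjects(input)) {
            System.out.println(part);
        }
    }
}
```

Note that this still treats the text as flat: it would also split inside a string value that happens to contain }, { and inside a nested array of objects, which is one reason a JSON parser is the safer choice.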


0

As others recommended, a parser would be a better solution than splitting the string yourself. Regular expressions run into limitations when you get nested brackets, for example. I used Google's Gson library, and tweaking your input slightly produced the desired split. The important step was to turn your input into a JSON array; otherwise the parser would fail after the first element:

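import com.google.gson.JsonArray;
import com.google.gson.JsonParser;

// Your input, pre-processed: each doubled double quote is replaced by a single quote,
// which JsonParser accepts because it parses leniently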
String input = "{'cast_id': 10, 'character': 'Mushu (voice)', 'credit_id': '52fe43a09251416c75017cbb', 'gender': 2, 'id': 776, 'name': 'Eddie Murphy', 'order': 0}, {'cast_id': 62, 'character': '[Singing voice]', 'credit_id': '597a65c8925141233d0000bb', 'gender': 2, 'id': 18897, 'name': 'Jackie Chan', 'order': 1}, {'cast_id': 16, 'character': 'Mulan (voice)', 'credit_id': '52fe43a09251416c75017cd5', 'gender': 1, 'id': 21702, 'name': 'Ming-Na Wen', 'order': 2}";

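// Wrap the comma-separated objects in [ ] so the whole input is one JSON array.
// (Newer Gson versions also offer the static JsonParser.parseString for this.)
JsonArray array = new JsonParser().parse("[" + input + "]").getAsJsonArray();
for (int i = 0; i < array.size(); i++)
{
    // Each element is one of the original JSON objects
    System.out.println(array.get(i));
}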

Output:

{"cast_id":10,"character":"Mushu (voice)","credit_id":"52fe43a09251416c75017cbb","gender":2,"id":776,"name":"Eddie Murphy","order":0}
{"cast_id":62,"character":"[Singing voice]","credit_id":"597a65c8925141233d0000bb","gender":2,"id":18897,"name":"Jackie Chan","order":1}
{"cast_id":16,"character":"Mulan (voice)","credit_id":"52fe43a09251416c75017cd5","gender":1,"id":21702,"name":"Ming-Na Wen","order":2}
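To show why a parser copes with nested brackets where a flat regex does not, here is a small dependency-free sketch that tracks brace depth and skips over string contents. This is a different technique from the Gson code above and is meant only as an illustration, not as a replacement for a real JSON parser. The class TopLevelSplitter and the helper splitTopLevelObjects are hypothetical names, and the sketch assumes double-quoted strings with backslash escapes.

```java
import java.util.ArrayList;
import java.util.List;

public class TopLevelSplitter {

    // Hypothetical helper: returns each top-level {...} object as its own string.
    static List<String> splitTopLevelObjects(String s) {
        List<String> parts = new ArrayList<>();
        int depth = 0;            // current nesting level of { }
        int start = -1;           // index where the current top-level object began
        boolean inString = false; // true while inside a double-quoted string
        boolean escaped = false;  // true right after a backslash inside a string

        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (inString) {
                // Braces and brackets inside strings are ignored
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            if (c == '"') {
                inString = true;
            } else if (c == '{') {
                if (depth == 0) {
                    start = i;
                }
                depth++;
            } else if (c == '}' && depth > 0) {
                depth--;
                if (depth == 0) {
                    parts.add(s.substring(start, i + 1));
                    start = -1;
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // Made-up input with a nested object and a string that contains "}, {"
        String input = "{\"a\": {\"b\": \"}, {\"}}, {\"c\": \"[x]\"}";
        for (String part : splitTopLevelObjects(input)) {
            System.out.println(part);
        }
    }
}
```

A library such as Gson does this bookkeeping (and much more validation) for you, which is why wrapping the input in [ ] and parsing it is the simpler route.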
