1

I have this String:

 String string="NNP,PERSON,true,?,IN,O,false,pobj,NNP,ORGANIZATION,true,?,p";

How can I split it into an array at every fourth comma? I would like something like this:

     String[] a=string.split("d{4}");
     a[0]="NNP,PERSON,true,?";
     a[1]="IN,O,false,pobj";
     a[2]="NNP,ORGANIZATION,true,?";
     a[3]="p";
2
  • You could either use a regex, or split on "," and then put the pieces back together. Commented Apr 13, 2014 at 15:32
  • Regexes are fancy but can turn out cryptic, so document what yours does; otherwise it is a pain to figure out, especially when you or someone else reads the code after some time. Also, processing a (complex) regex will probably take more time than splitting and regrouping, as @BrendanRius recommended. Commented Apr 13, 2014 at 15:53

4 Answers

2

Keep it simple. There is no need for regex. Simply count the commas; when four have been found, use String.substring() to extract the value.

Finally, store the values in an ArrayList<String> instead of printing them.

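    String string = "NNP,PERSON,true,?,IN,O,false,pobj,NNP,ORGANIZATION,true,?,p";

    int count = 0;       // commas seen in the current group
    int beginIndex = 0;  // start of the current group
    int endIndex = 0;    // index of the character being examined
    for (char ch : string.toCharArray()) {
        if (ch == ',') {
            count++;
        }
        if (count == 4) {
            // endIndex points at the fourth comma, which substring() excludes
            System.out.println(string.substring(beginIndex, endIndex));
            beginIndex = endIndex + 1; // the next group starts after the comma
            count = 0;
        }
        endIndex++;
    }

    // print whatever is left after the last complete group
    if (beginIndex < endIndex) {
        System.out.println(string.substring(beginIndex, endIndex));
    }

output:

    NNP,PERSON,true,?
    IN,O,false,pobj
    NNP,ORGANIZATION,true,?
    p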
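
To return the pieces rather than print them, the same comma-counting loop can be wrapped in a method that fills a list. This is only a minimal sketch: the class name CommaGroups, the method name splitEvery, and the groupSize parameter are illustrative assumptions and do not come from the answer above.

```java
import java.util.ArrayList;
import java.util.List;

class CommaGroups {
    // Cuts s at every groupSize-th comma; groupSize is assumed to be at least 1.
    static List<String> splitEvery(String s, int groupSize) {
        List<String> parts = new ArrayList<>();
        int count = 0;       // commas seen since the last cut
        int beginIndex = 0;  // start of the current piece
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == ',' && ++count == groupSize) {
                parts.add(s.substring(beginIndex, i)); // the comma at i is dropped
                beginIndex = i + 1;
                count = 0;
            }
        }
        if (beginIndex < s.length()) {
            parts.add(s.substring(beginIndex)); // final piece with fewer fields
        }
        return parts;
    }

    public static void main(String[] args) {
        String string = "NNP,PERSON,true,?,IN,O,false,pobj,NNP,ORGANIZATION,true,?,p";
        for (String part : splitEvery(string, 4)) {
            System.out.println(part);
        }
    }
}
```

Working with indexes and substring() avoids copying characters into a buffer, and passing the group size as a parameter keeps the magic number 4 out of the loop.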

1

If you really have to use split you can use something like

String[] array = string.split("(?<=\\G[^,]{1,100},[^,]{1,100},[^,]{1,100},[^,]{1,100}),");

The idea is explained in my previous answer on a similar but simpler topic.

Demo:

String string = "NNP,PERSON,true,?,IN,O,false,pobj,NNP,ORGANIZATION,true,?,p";
String[] array = string.split("(?<=\\G[^,]{1,100},[^,]{1,100},[^,]{1,100},[^,]{1,100}),");
for (String s : array)
    System.out.println(s);

output:

NNP,PERSON,true,?
IN,O,false,pobj
NNP,ORGANIZATION,true,?
p

But if you don't have to use split and still want to use regex, I encourage you to use the Pattern and Matcher classes with a simple regex that finds the parts you are interested in, rather than a complicated regex that finds the parts you want to get rid of. I mean something like

  1. any xx,xxx,xxx,xxx part, where x is any character other than ,
  2. any xx or xx,xx or xxx,xxx,xxx part placed at the end of the string (to catch the rest of the data left unmatched by the regex from point 1)

So

Pattern p = Pattern.compile("[^,]+(,[^,]+){3}|[^,]+(,[^,]+){0,2}$");

should do the trick.
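
One way to apply this pattern is to call Matcher.find() in a loop and collect each match. The sketch below assumes every field is non-empty, since the pattern requires at least one non-comma character per field. The class name GroupFinder and the method name findGroups are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupFinder {
    // Four comma-separated fields, or one to three fields at the end of the input.
    private static final Pattern GROUP =
            Pattern.compile("[^,]+(,[^,]+){3}|[^,]+(,[^,]+){0,2}$");

    static List<String> findGroups(String s) {
        List<String> groups = new ArrayList<>();
        Matcher m = GROUP.matcher(s);
        while (m.find()) {
            // group() is the whole match; the comma between two matches is skipped
            groups.add(m.group());
        }
        return groups;
    }

    public static void main(String[] args) {
        String string = "NNP,PERSON,true,?,IN,O,false,pobj,NNP,ORGANIZATION,true,?,p";
        for (String group : findGroups(string)) {
            System.out.println(group);
        }
    }
}
```

Compiling the pattern once into a constant avoids recompiling it on every call.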


Another solution, probably the fastest (and quite easy to write), is to create your own parser. It iterates over all characters of your string, stores them in a buffer, and counts how many , have occurred so far; whenever that number reaches a multiple of 4, it writes the buffer's content to an array (or better, a dynamic collection such as a list) and clears the buffer. Such a parser can look like

public static List<String> parse(String s){
    List<String> tokens = new ArrayList<>();
    StringBuilder sb = new StringBuilder();
    int commaCounter = 0;

    for (char ch: s.toCharArray()){
        if (ch==',' && ++commaCounter == 4){
            tokens.add(sb.toString());
            sb.delete(0, sb.length());
            commaCounter = 0;
        }else{
            sb.append(ch);
        }
    }
    if (sb.length()>0)
        tokens.add(sb.toString());

    return tokens;
}

You can convert the List to an array later if you need to, but I would stay with List.
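
If an array really is required, List.toArray can do the conversion. The following is a small sketch with a hard-coded list standing in for the parser's result; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class ListToArray {
    static String[] toArray(List<String> tokens) {
        // toArray allocates a new array of the list's size when the one passed in is too small
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // stand-in for the list returned by a parser
        List<String> tokens = Arrays.asList("NNP,PERSON,true,?", "IN,O,false,pobj");
        String[] array = toArray(tokens);
        for (String s : array) {
            System.out.println(s);
        }
    }
}
```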


0
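import java.util.StringTokenizer;

StringTokenizer tizer = new StringTokenizer(string, ",");
int total = tizer.countTokens();   // read once: the value drops as tokens are consumed
int count = total / 4;             // number of complete groups
int overflowCount = total % 4;     // tokens left over for a final, shorter group
String[] a = new String[overflowCount > 0 ? count + 1 : count];
int x = 0;
for (; x < count; x++) {
    a[x] = tizer.nextToken() + "," + tizer.nextToken() + "," + tizer.nextToken() + "," + tizer.nextToken();
}
if (overflowCount > 0) {
    // join the remaining tokens without a trailing comma
    StringBuilder rest = new StringBuilder(tizer.nextToken());
    while (tizer.hasMoreTokens()) {
        rest.append(',').append(tizer.nextToken());
    }
    a[x] = rest.toString();
}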


0

Edited. Try this:

import java.util.ArrayList;

String str = "NNP,PERSON,true,?,IN,O,false,pobj,NNP,ORGANIZATION,true,?,p";
String[] arr = str.split(",");
ArrayList<String> result = new ArrayList<String>();
StringBuilder group = new StringBuilder();
for (int i = 0; i < arr.length; i++) {
    if (i > 0 && i % 4 == 0) {
        // four fields collected: store the group and start a new one
        result.add(group.toString());
        group.setLength(0);
    }
    if (i % 4 != 0) {
        group.append(','); // separator between fields of the same group
    }
    group.append(arr[i]);
}
// store the last group, which may hold fewer than four fields
result.add(group.toString());
for (String s : result) {
    System.out.println(s);
}

output:

    NNP,PERSON,true,?
    IN,O,false,pobj
    NNP,ORGANIZATION,true,?
    p
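
The split-then-regroup idea can also be written without manual string concatenation. This is a rough sketch, assuming Java 8 or later for String.join; the class name Regroup, the method name regroup, and the groupSize parameter are illustrative and not part of the answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Regroup {
    // Splits on commas, then joins the fields back in chunks of groupSize.
    // groupSize is assumed to be at least 1.
    static List<String> regroup(String s, int groupSize) {
        String[] fields = s.split(","); // note: split drops trailing empty fields
        List<String> result = new ArrayList<>();
        for (int start = 0; start < fields.length; start += groupSize) {
            // the last chunk may be shorter than groupSize
            int end = Math.min(start + groupSize, fields.length);
            result.add(String.join(",", Arrays.copyOfRange(fields, start, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "NNP,PERSON,true,?,IN,O,false,pobj,NNP,ORGANIZATION,true,?,p";
        for (String s : regroup(str, 4)) {
            System.out.println(s);
        }
    }
}
```

Stepping through the array in strides of groupSize handles the leftover fields in the same loop, so no separate remainder branch is needed.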

2 Comments

Also, \w is a word character. They also have other characters.
@SotiriosDelimanolis> Yes, I just answered with the example the OP provided, I'll update my answer now.
