
I want to split the following string:

String line ="DOB,1234567890,11,07/05/12,\"first,last\",100,\"is,a,good,boy\"";

into the following tokens:

DOB
1234567890
11
07/05/12
first,last
100
is,a,good,boy

I tried using the following regular expression:

import java.util.regex.*;

class SplitString{

    public static final String quotes = "\".[[((a-z)|(A-Z))]+( ((a-z)|(A-Z)).,)*.((a-z)|(A-Z))].\"" ;
    public static final String ISSUE_UPLOAD_FILE_PATTERN = "((a-z)|(A-Z))+ [(((a-z)|(A-Z)).,)* + ("+quotes+".,) ].((a-z)|(A-Z)) + ("+quotes+")";

    public static void main(String[] args){

        String line ="DOB,1234567890,11,07/05/12,\"first,last\",100,\"is,a,good,boy\"";

        Pattern pattern = Pattern.compile(ISSUE_UPLOAD_FILE_PATTERN);
        String[] output = pattern.split(line);

        System.out.println(" pattern: "+pattern);

        for(String a:output){
            System.out.println(" output: "+a);
        }
    }
}

Am I missing anything in the regular expression?

  • Looks like you're trying to parse a CSV file. Why don't you use something like opencsv.sourceforge.net? Commented Jul 10, 2012 at 8:23
  • I agree with krishnakumarp; opencsv should handle the quotes: CSVReader reader = new CSVReader(new FileReader("yourfile.csv"), ',', '"'); Commented Jul 10, 2012 at 8:25

2 Answers


This is an updated version of your code that gives you your expected output:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitString {

    public static final String ISSUE_UPLOAD_FILE_PATTERN = "(?<=(^|,))(([^\",]+)|\"([^\"]*)\")(?=($|,))";

    public static void main(String[] args) {
        String line = "DOB,1234567890,11,07/05/12,\"first,last\",100,\"is,a,good,boy\"";
        Matcher matcher = Pattern.compile(ISSUE_UPLOAD_FILE_PATTERN).matcher(line);
        while (matcher.find()) {
            // Group 3 holds an unquoted field; group 4 holds the contents of a quoted field.
            if (matcher.group(3) != null) {
                System.out.println(matcher.group(3));
            } else {
                System.out.println(matcher.group(4));
            }
        }
    }
}

The regex works like this:

  • (?<=(^|,)): check that the match is preceded by the start of the string or a ,
  • (([^\",]+)|\"([^\"]*)\"): match either one or more characters that are neither " nor , (group 3), or a quoted run "<any number of non-" characters>" (group 4)
  • (?=($|,)): check that the match is followed by the end of the string or a ,

The result will be in either group 3 or group 4, depending on which alternative matched.
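
To make the group handling concrete, here is a minimal sketch that collects the fields into a list instead of printing them. The class name FieldExtractor and the method fields are made up for illustration; the pattern is the one from the answer. Because the unquoted alternative needs at least one character, empty fields between consecutive commas are simply skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldExtractor {
    // Same pattern as in the answer above.
    static final Pattern FIELD =
            Pattern.compile("(?<=(^|,))(([^\",]+)|\"([^\"]*)\")(?=($|,))");

    // Hypothetical helper: returns the fields of one line, without surrounding quotes.
    static List<String> fields(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            String unquoted = m.group(3);           // non-null if the unquoted alternative matched
            result.add(unquoted != null ? unquoted : m.group(4));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [55, 44, a,b, 700]: the empty fields do not produce tokens.
        System.out.println(fields("55,,44,\"a,b\",,700"));
    }
}
```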


2 Comments

I want to modify the above regular expression so that a line with empty fields (consecutive commas) also parses. E.g. the following string should parse and give the output below: String line = "DOB,1234567890,11,07/05/12,\"first,last\",100,\"is,a,good,boy\",55,,44,,,,700"; Output: DOB 1234567890 11 07/05/12 first,last 100 is,a,good,boy 55 44 700
@Omkar: That example works just as you describe for me. What goes wrong when you try it?

Your regular expressions do some weird stuff with [ and ]: the way you use them doesn't look like character classes at all. For this reason, I didn't bother to decipher and fix all of your expression.

As a second note, you should be clear about what your regular expression is meant to describe: should it match the delimiter between tokens, or each individual non-delimiter token? Using the split method implies the former, but I guess that for your application the latter is easier to achieve. In fact, in a recent answer of mine I came up with a regular expression that matches the tokens of a CSV file:

String tokenPattern = "\"[^\"]*(\"\"[^\"]*)*\"|[^,]*";

This will match

  • unquoted strings up to but not including the next comma
  • quoted strings up to the closing quote, including embedded commas
  • quoted strings containing doubled (escaped) quotes

You can use this to create a matcher for your line, iterate over all matches using find, and extract each token using group(). You could also use that loop to strip the enclosing quotes and turn doubled quotes into single ones, if you need the semantic value of the column.
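
The loop described above could look roughly like the sketch below. One caveat: since [^,]* also matches the empty string, a plain find loop reports an extra empty match after every token. The sketch therefore anchors each match at the current position with region and lookingAt and steps over the comma itself. The names CsvTokenizer, tokenize and unquote are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CsvTokenizer {
    // Token pattern from the answer: a quoted field (with "" as an escaped quote)
    // or a possibly empty run of non-comma characters.
    static final Pattern TOKEN = Pattern.compile("\"[^\"]*(\"\"[^\"]*)*\"|[^,]*");

    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        int pos = 0;
        while (true) {
            m.region(pos, line.length());
            if (!m.lookingAt()) {                   // cannot happen: [^,]* matches the empty string
                throw new IllegalStateException("No token at index " + pos);
            }
            tokens.add(unquote(m.group()));
            pos = m.end();
            if (pos == line.length()) {
                break;                              // last token consumed
            }
            if (line.charAt(pos) != ',') {          // e.g. text directly after a closing quote
                throw new IllegalArgumentException("Expected ',' at index " + pos);
            }
            pos++;                                  // step over the delimiter
        }
        return tokens;
    }

    // Strips enclosing quotes and turns doubled quotes into single ones.
    static String unquote(String token) {
        if (token.length() >= 2 && token.startsWith("\"") && token.endsWith("\"")) {
            return token.substring(1, token.length() - 1).replace("\"\"", "\"");
        }
        return token;
    }

    public static void main(String[] args) {
        String line = "DOB,1234567890,11,07/05/12,\"first,last\",100,\"is,a,good,boy\"";
        for (String token : tokenize(line)) {
            System.out.println(token);
        }
    }
}
```

With this approach empty fields come back as empty strings, so "55,,44" yields three tokens; filter them out afterwards if they are not wanted.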

As an alternative, you could of course also use a CSV reader as suggested in comments to your question.
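
For comparison, the other reading from the second note, a pattern that matches the delimiter so that split can be used, is also workable. The following is only a sketch under two assumptions: quotes in the line are balanced, and fields contain no escaped quotes. The names CsvSplit and split are made up for illustration. The lookahead accepts a comma only if an even number of quotes follows it up to the end of the line, which means the comma sits outside any quoted field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class CsvSplit {
    // A comma followed by an even number of quote characters up to the end of the line.
    static final Pattern DELIMITER =
            Pattern.compile(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        // Limit -1 keeps trailing empty fields, which split drops by default.
        for (String raw : DELIMITER.split(line, -1)) {
            boolean quoted = raw.length() >= 2 && raw.startsWith("\"") && raw.endsWith("\"");
            fields.add(quoted ? raw.substring(1, raw.length() - 1) : raw);
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "DOB,1234567890,11,07/05/12,\"first,last\",100,\"is,a,good,boy\"";
        for (String field : split(line)) {
            System.out.println(field);
        }
    }
}
```

The lookahead rescans the rest of the line for every comma, so for long lines the token-matching approach or a CSV reader is likely the better choice.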
