7

This sample data is returned by a web service:

200,6, "California, USA"

I want to split it using split(",") and tried to see the result with this simple code:

String loc = "200,6,\"California, USA\"";       
String[] s = loc.split(",");

for(String f : s)
   System.out.println(f);

Unfortunately, this is the result:

200
6
"California
 USA"

The expected result is:

200
6
"California, USA"

I tried different regular expressions with no luck. Is it possible to make the regular expression ignore commas inside of ""?

UPDATE 1: Added C# Code

UPDATE 2: Removed C# Code

4
  • Do you expect to see more than one quoted item on the same line? Commented Feb 4, 2013 at 3:47
  • Hmmm. Only sentence/words inside of " " Commented Feb 4, 2013 at 3:49
  • possible duplicate of Parsing CSV input with a RegEx in java Commented Feb 4, 2013 at 3:53
  • In C#, you should be using string.Split, not Regex.Split. In any case, your desired result can't be achieved with the split function (in either language) - reading the documentation for those functions, you won't see any indication that they respect quotation marks or other textual conventions. Commented Feb 4, 2013 at 4:03

4 Answers

3
,(?=(?:[^"]|"[^"]*")*$)

This is the regex you want. (To use it in the split function, you'll need to escape the double quotes inside the Java string literal.)

Explanation

You need to find all commas that are not inside quotes. For that you need a lookahead (http://www.regular-expressions.info/lookaround.html) to check whether the comma currently being matched is inside or outside quotes.

The lookahead ensures that the current matching ',' is followed by an EVEN number of '"' characters (meaning that it lies outside quotes).

So (?:[^"]|"[^"]*")*$ matches, all the way to the end of the string, any sequence of non-quote characters and complete quoted pairs ("..."). It can only succeed if every quote after the comma belongs to a closed pair.

(?=(?:[^"]|"[^"]*")*$) wraps that in a lookahead, so the rest of the string is checked but not consumed.

,(?=(?:[^"]|"[^"]*")*$) finally matches every ',' that satisfies the lookahead.
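As a minimal sketch of how this behaves with more than one quoted field, the pattern can be precompiled and wrapped in a helper. The class name QuoteAwareSplit and the method splitFields are made up for illustration, and the -1 limit (which keeps trailing empty fields) is an assumption about what you want, not part of the regex itself.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class QuoteAwareSplit {
    // A comma that is followed only by non-quote characters and
    // complete "..." pairs up to the end of the input.
    private static final Pattern COMMA_OUTSIDE_QUOTES =
            Pattern.compile(",(?=(?:[^\"]|\"[^\"]*\")*$)");

    // Hypothetical helper: splits one line on commas that lie outside quotes.
    static String[] splitFields(String line) {
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        return COMMA_OUTSIDE_QUOTES.split(line, -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitFields("200,6,\"California, USA\"")));
        System.out.println(Arrays.toString(splitFields("\"a, b\",1,\"c, d\",")));
    }
}
```

The quotes stay part of each field, and a line with an unbalanced quote will not split where you might expect, because the lookahead fails for every comma before the stray quote.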

2 Comments

Even number of quotes ahead does not necessarily mean ""outside of quotes"" (assuming quotes can be nested like brackets). as an example, see the previous sentence.
You would allow "sdfdsf"sdfsdf"sdfsdf"sdfsdf"sdf" as a token, but is it even valid CSV?
2

An easier solution might be to use an existing library, such as OpenCSV, to parse your data. With this library it takes two lines:

// inputLine holds one raw CSV line, e.g. the web service response
CSVParser parser = new CSVParser();
String[] data = parser.parseLine(inputLine);

This will become especially important if you have more complex CSV values coming back in the future (multiline values, values with escaped quotes inside an element, etc.). If you don't want to add the dependency, you could always use their code as a reference (though it is not based on regular expressions).

0

If there's a good lexer/parser library for Java, you could define a lexer like the following pseudo-lexer code:

Delimiter: ,
Item: ([^,"]+) | ("[^"]*")
Data: Item Delimiter Data | Item 

A lexer starts at the top-level token definition (in this case Data) and attempts to form tokens out of the string until it cannot, or until the string is used up. So in the case of your string the following would happen:

  • I want to make Data out of 200,6, "California, USA".
  • I can make Data out of an Item, a Delimiter and Data.
  • I looked - 200 is an Item and then , is a Delimiter so I can tokenize that and keep going.
  • I want to make Data out of 6, "California, USA".
  • I can make Data out of an Item, a Delimiter and Data.
  • I looked - 6 is an Item and then , is a Delimiter so I can tokenize that and keep going.
  • I want to make Data out of "California, USA".
  • I can make Data out of an Item, a Delimiter and Data.
  • I looked - "California, USA" is an Item, but I see no Delimiter after it, so let's try something else.
  • I can make Data out of an Item.
  • I looked - "California, USA" is an Item, so I can tokenize that and keep going.
  • The string is empty. I'm done. Here are your tokens.

(I learned about how lexers work from the guide to PLY, a Python lexer/parser: http://www.dabeaz.com/ply/ply.html )
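
If you'd rather not pull in a lexer library, the walk-through above can be approximated by hand. This is only a sketch under a few assumptions: the class ItemTokenizer and the method tokenize are hypothetical names, a quoted Item runs to the next double quote (no escaped quotes), and spaces before an Item are skipped so that the sample with a space after the comma also works.

```java
import java.util.ArrayList;
import java.util.List;

public class ItemTokenizer {
    // Hypothetical helper: tokenizes Data := Item (Delimiter Item)*.
    static List<String> tokenize(String input) {
        List<String> items = new ArrayList<>();
        int pos = 0;
        while (true) {
            // Assumption: spaces in front of an Item are not part of it.
            while (pos < input.length() && input.charAt(pos) == ' ') {
                pos++;
            }
            int start = pos;
            if (pos < input.length() && input.charAt(pos) == '"') {
                // Quoted Item: everything up to the closing quote, commas included.
                int close = input.indexOf('"', pos + 1);
                if (close < 0) {
                    throw new IllegalArgumentException("Unterminated quote at index " + pos);
                }
                pos = close + 1;
            } else {
                // Unquoted Item: run of characters up to the next comma or quote.
                while (pos < input.length()
                        && input.charAt(pos) != ',' && input.charAt(pos) != '"') {
                    pos++;
                }
            }
            if (pos == start) {
                throw new IllegalArgumentException("Expected an item at index " + pos);
            }
            items.add(input.substring(start, pos));
            if (pos == input.length()) {
                return items; // Data := Item, and the string is used up
            }
            if (input.charAt(pos) != ',') {
                throw new IllegalArgumentException("Expected ',' at index " + pos);
            }
            pos++; // consume the Delimiter and continue with the remaining Data
        }
    }

    public static void main(String[] args) {
        for (String item : tokenize("200,6, \"California, USA\"")) {
            System.out.println(item);
        }
    }
}
```

Like the grammar, this rejects empty items (two commas in a row), which real CSV data may well contain, so treat it as an illustration of the idea rather than a CSV parser.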

0

Try this expression:

public class Test {

    public static void main(String[] args) {
        String loc = "200,6,\"Paris, France\"";
        // Split only on commas that are followed by balanced quote pairs
        String[] str1 = loc.split(",(?=(?:[^\"]|\"[^\"]*\")*$)");

        for (String tmp : str1) {
            System.out.println(tmp);
        }
    }
}
