1

Apologies for my limited understanding of regex. I'm trying to split a text using a regular expression. Here's what I'm doing right now, given the following string:


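import java.util.LinkedList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "Name:\"John Adam\"  languge:\"english\"  Date:\" August 2011\"";
// Lazily match key, colon, quoted value, then any trailing whitespace.
Pattern pattern = Pattern.compile(".*?\\:\\\".*?\\\"\\s*");
Matcher matcher = pattern.matcher(input);
List<String> keyValues = new LinkedList<>();
while (matcher.find()) {
    System.out.println(matcher.group());
    keyValues.add(matcher.group());
}
System.out.println(keyValues);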

I get the right output, which is what I'm looking for:


Name:"John Adam"  
languge:"english"  
Date:" August 2011"

Now I'm struggling to make it a little more generic, e.g. when the input string contains another delimiter style. Here I've added a new value, Audience:(user), where the quotes are replaced by parentheses:


String input = "Name:\"John Adam\"  languge:\"english\"  Date:\" August 2011\"  Audience:(user)";

What would the generic pattern for this be? Sorry if this sounds too lame.

Thanks

3 Answers

2

Step 1: Remove most of those backslashes. Quotes and colons are ordinary characters in a regex and need no regex escaping; inside a Java string literal, a quote needs only a single backslash (\").

Try this pattern:

".*?:[^\\w ].*?[^\\w ]\\s*"

It treats any character that is neither a word character nor a space as a delimiter, so it works for your test case and would also work for name:'foo' etc.
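As a rough illustration of how this pattern behaves on the question's input, here is a minimal sketch. It adds two capture groups so the key and value can be read separately; the class name AnyDelimiterSplit, the parse helper, and the choice of LinkedHashMap are assumptions made for this example, not part of the answer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this sketch.
class AnyDelimiterSplit {
    // Key, colon, one non-word/non-space delimiter, value, one closing delimiter, trailing whitespace.
    private static final Pattern PAIR = Pattern.compile("(.*?):[^\\w ](.*?)[^\\w ]\\s*");

    // Collects key -> value pairs in the order they appear in the input.
    static Map<String, String> parse(String input) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            pairs.put(m.group(1), m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String input = "Name:\"John Adam\"  languge:\"english\"  Date:\" August 2011\"  Audience:(user)";
        System.out.println(parse(input));
    }
}
```

One thing to watch for: because any punctuation character closes the value, a value that itself contains punctuation (for example Date:"Aug. 2011") is cut at the first such character.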

2 Comments

Thanks for simplifying it. It works for most cases, except when there's a format like test:abc with no delimiters around the value.
You would be better off changing the regex to capture the name and value each as a regex group: "(.*?):[^\\w ](.*?)[^\\w ]\\s*", then using matcher.group(1) and matcher.group(2) to access them directly.
1

You can always use the OR operator |:

Pattern pattern = Pattern.compile("(.*?\\:\\\".*?\\\"\\s*)|(.*?\\:\\(.*?\\)\\s*)");
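A minimal sketch of this alternation in use, run against the question's extended input. The pattern is copied verbatim; the class name AlternationSplit, the split helper, and the trim() call are assumptions added for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this sketch.
class AlternationSplit {
    // First alternative: key:"value". Second alternative: key:(value).
    private static final Pattern PAIR =
            Pattern.compile("(.*?\\:\\\".*?\\\"\\s*)|(.*?\\:\\(.*?\\)\\s*)");

    // Returns each matched key:value chunk with surrounding whitespace trimmed.
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            parts.add(m.group().trim());
        }
        return parts;
    }

    public static void main(String[] args) {
        String input = "Name:\"John Adam\"  languge:\"english\"  Date:\" August 2011\"  Audience:(user)";
        for (String part : split(input)) {
            System.out.println(part);
        }
    }
}
```

Note that the result depends on the order of the pairs. The first alternative is tried in full before the second, and its leading .*? can run across an earlier parenthesized pair, so an input such as Audience:(user)  Name:"John" comes back as a single match rather than two.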

1

First of all, I should point out that regular expressions are NOT a magic bullet. By that I mean that, while they can be incredibly flexible and useful in some cases, they don't solve all text-matching problems (for instance, parsing XML-like markup).

However, in the example you gave, you could use the | syntax to specify an alternate pattern to match. An example might be:

Pattern pattern = Pattern.compile(".*?\\:(\\\".*?\\\"|\\(.*?\\))\\s*");

The section in parentheses, (\\\".*?\\\"|\\(.*?\\)), can be thought of as: find a pattern that matches \\\".*?\\\" or \\(.*?\\) (and remember what the backslashes mean: they are escape characters).

Note, though, that this approach, while flexible, requires you to add each specific case quite literally, so it's not truly generic in the absolute sense.
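To make the "add each case literally" point concrete, here is a minimal sketch that extends the grouped alternation with a third, purely hypothetical delimiter style, Tags:[java]. The square-bracket case, the capture groups, the class name DelimiterCases, and the parse helper are all assumptions for illustration; none of them come from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this sketch.
class DelimiterCases {
    // One alternative per supported delimiter pair: "...", (...), and the assumed [...].
    private static final Pattern PAIR =
            Pattern.compile("(.*?):(?:\"(.*?)\"|\\((.*?)\\)|\\[(.*?)\\])\\s*");

    static Map<String, String> parse(String input) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            // Only the group belonging to the alternative that matched is non-null.
            String value = m.group(2) != null ? m.group(2)
                         : m.group(3) != null ? m.group(3)
                         : m.group(4);
            pairs.put(m.group(1), value);
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(parse("Name:\"John Adam\"  Audience:(user)  Tags:[java]"));
    }
}
```

Every new delimiter style costs one more alternative in the pattern and one more group to check, which is the trade-off described above.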

NOTE

To better illustrate what I meant by not being able to craft a truly generic solution, here's a more generic pattern that you could use:

Pattern pattern = Pattern.compile(".*?\\:[\\\"(]{1,2}.*?[\\\")]{1,2}\\s*");

The pattern above uses character classes and is more generic, but while it will match your examples, it will also match hybrids like blah:"stuff) or blah:(stuff", or worse, blah:((stuff"".
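A small sketch for checking which inputs the character-class pattern accepts. The pattern is copied verbatim; the class name CharClassOvermatch, the accepts helper, and the list of candidates are assumptions chosen for this example.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this sketch.
class CharClassOvermatch {
    // The character-class pattern from the answer, copied verbatim.
    private static final Pattern LOOSE =
            Pattern.compile(".*?\\:[\\\"(]{1,2}.*?[\\\")]{1,2}\\s*");

    // True when the whole candidate is accepted as a single key:value pair.
    static boolean accepts(String candidate) {
        return LOOSE.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String[] candidates = {
            "blah:\"stuff\"",     // intended form
            "blah:(stuff)",       // intended form
            "blah:\"stuff)",      // mismatched delimiters
            "blah:((stuff\"\"",   // doubled, mismatched delimiters
        };
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + accepts(candidate));
        }
    }
}
```

The opening and closing character classes are independent of each other, so the pattern has no way to require that a value opened with a quote is also closed with a quote.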

2 Comments

Thanks for your response. I completely understand your viewpoint: though it fits this specific scenario, it's not truly generic.
Yep @Shamik - you could make it a little more generic but that would have to be at the expense of potentially matching something you wouldn't want to match (I'll update my answer with an example)
