6

I have a string which contains a sentence, and I want to split it in half based on a word. I have the regex (\\w+) word, which I thought would get me all the words before "word" plus "word" itself; then I could just remove the last four characters.

However, this doesn't seem to work. Any ideas what I've done wrong?

Thanks.

4
  • Code is more helpful than describing the problem. Commented May 2, 2012 at 19:56
  • Maybe consider a non-greedy quantifier '+?' instead of '+' Commented May 2, 2012 at 19:57
  • "This doesn't seem to work," huh? What happens? What do you want to happen? Commented May 2, 2012 at 19:59
  • Why not just search for word itself? Using Matcher.find you can find its index in the string. Commented May 2, 2012 at 20:00

5 Answers

10

This seems to work:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        // [\\w\\s] also matches spaces, so group 1 can span several words
        Pattern p = Pattern.compile("([\\w\\s]+) word");
        Matcher m = p.matcher("Could you test a phrase with some word");
        while (m.find()) {
            System.out.println(m.group(1)); // text before " word"
            System.out.println(m.group());  // whole match, including " word"
        }
    }
}
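For comparison, here is a small sketch (the class and helper names are only for illustration) that runs the question's original pattern next to this one. Since \\w matches only letters, digits and underscore, never a space, (\\w+) word can only capture the single word directly before " word".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuestionPattern {
    // Returns group 1 of the first match, or null if the pattern never matches
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "Could you test a phrase with some word";
        // \\w cannot match a space, so only "some" can sit right before " word"
        System.out.println(firstGroup("(\\w+) word", s));      // some
        // Adding \\s to the class lets the group span the whole prefix
        System.out.println(firstGroup("([\\w\\s]+) word", s)); // Could you test a phrase with some
    }
}
```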

5

Using string manipulation:

int idx = sentence.indexOf(word);
if (idx < 0)
  throw new IllegalArgumentException("Word not found.");
String before = sentence.substring(0, idx);

Using regex:

Pattern p = Pattern.compile(Pattern.quote(word));
Matcher m = p.matcher(sentence);
if (!m.find())
  throw new IllegalArgumentException("Word not found.");
String before = sentence.substring(0, m.start());

Alternatively:

Pattern p = Pattern.compile("(.*?)" + Pattern.quote(word) + ".*");
Matcher m = p.matcher(sentence);
if (!m.matches())
  throw new IllegalArgumentException("Word not found.");
String before = m.group(1);
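If you need both halves rather than only the part before the word, the indexOf version extends naturally. A minimal sketch; splitAt is a hypothetical helper name. Like indexOf above, it also matches the word inside longer words (e.g. "word" in "swords").

```java
public class SplitAtWord {
    // Returns {before, after}, with the word itself removed; throws if absent
    static String[] splitAt(String sentence, String word) {
        int idx = sentence.indexOf(word);
        if (idx < 0)
            throw new IllegalArgumentException("Word not found.");
        return new String[] {
            sentence.substring(0, idx),               // everything before the word
            sentence.substring(idx + word.length())   // everything after it
        };
    }

    public static void main(String[] args) {
        String[] parts = splitAt("the quick brown fox", "brown");
        System.out.println("[" + parts[0] + "]"); // [the quick ]
        System.out.println("[" + parts[1] + "]"); // [ fox]
    }
}
```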


3

You will want to tokenize each part of the sentence before and after the word.

http://docs.oracle.com/javase/1.5.0/docs/api/

 String[] result = "this is a test".split("\\s"); // replace \\s with your word; split() takes a regex, so wrap a literal word in Pattern.quote(word)
 for (int x=0; x<result.length; x++)
     System.out.println(result[x]);
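Because String.split interprets its argument as a regex, a word containing characters such as . or ( would be misread. A sketch (halves is a hypothetical name) that quotes the word and passes a limit of 2, so only the first occurrence splits the sentence in half:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitOnWord {
    static String[] halves(String sentence, String word) {
        // Pattern.quote makes the word match literally, not as a regex;
        // the limit of 2 stops splitting after the first occurrence
        return sentence.split(Pattern.quote(word), 2);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(halves("one word two word three", "word")));
        // [one ,  two word three]
    }
}
```

If the word is absent, split returns a one-element array holding the whole sentence.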

1 Comment

I could expand on my example if needed, but in short: the sections of the sentence are stored in an array, split at the word you use to break the sentence up.
2

Try this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("^.*?(?= word)");
        Matcher m = p.matcher("Everything before the word");
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}

It breaks down as follows:

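  • ^ start of the string
  • .*? everything (reluctant, so it stops at the first " word")
  • (?= before: a lookahead, which checks what follows without consuming it
  •  word the word, with its leading space
  • ) end of the lookahead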
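Since the lookahead only checks that " word" follows without consuming it, m.end() is exactly where " word" starts, so the rest of the sentence is easy to recover too. A sketch with a hypothetical around helper, assuming the word is preceded by a space as in the pattern above:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadSplit {
    private static final Pattern BEFORE_WORD = Pattern.compile("^.*?(?= word)");

    // Returns {text before " word", text after it}, or null if " word" never occurs
    static String[] around(String input) {
        Matcher m = BEFORE_WORD.matcher(input);
        if (!m.find()) {
            return null;
        }
        // The lookahead consumed nothing, so m.end() is where " word" begins
        String after = input.substring(m.end() + " word".length());
        return new String[] { m.group(), after };
    }

    public static void main(String[] args) {
        String[] parts = around("Everything before the word and after");
        System.out.println(parts[0]); // Everything before the
        System.out.println(parts[1]); //  and after
    }
}
```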

3 Comments

Oh yeah, silly attempt sums it up nicely :) I'm not being rude here, I'm telling a fact...
I don't see why code formatting is necessary, since the question was about the regular expression itself; I would assume he already knows how to compile an expression. I gave the expression and broke it apart to show what each section is doing. I'll try to be more descriptive in the future; I'm brand new to Stack Overflow.
Your edit is already a lot better, I've cleared the downvote. Have fun on SO!
0

The reason is that + is a greedy quantifier and will match the entire String including the word you specify, without giving back.

If you change it to (\\w+?) word it should work (reluctant quantifier). More on quantifiers and their exact function here.
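The choice between + and +? matters most when the repeated token can also match the spaces around the word. On the sentence from the first answer, (\\w+) word and (\\w+?) word both capture just "some", because \\w never matches a space. A sketch using . instead, on a made-up input that contains "word" twice (group1 is only an illustrative helper):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Returns group 1 of the first match, or null if there is none
    static String group1(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "a word b word";
        // Greedy: .* takes everything, then backs off only to the LAST " word"
        System.out.println(group1("(.*) word", s));  // a word b
        // Reluctant: .*? grows one character at a time and stops at the FIRST " word"
        System.out.println(group1("(.*?) word", s)); // a
        // With \\w the quantifier makes no difference here
        System.out.println(group1("(\\w+?) word", "Could you test a phrase with some word")); // some
    }
}
```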

3 Comments

+ is greedy, but it does allow backtracking. The possessive equivalent is ++.
Okay, I've never really figured quantifiers out then. I think by backtracking you mean that you actually specify where and what in the regex? Whereas, reluctant will find 2 matches automatically, given that the input string contains the word he was looking for...
By backtracking I mean that the expression "\\w+\\w" will match "xy". The matcher will match "\\w+" against "xy", then realize that there is nothing left to match the second "\\w" against. So it will backtrack, matching "\\w+" against "x", and the second "\\w" against "y".
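The backtracking behaviour described in the last comment can be checked directly with Pattern.matches, which requires the whole input to match; the wrapper name is only for illustration:

```java
import java.util.regex.Pattern;

public class Backtracking {
    // True only if the regex matches the entire input
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Greedy \\w+ first takes "xy", then gives back "y" for the second \\w
        System.out.println(matchesWhole("\\w+\\w", "xy"));  // true
        // Possessive \\w++ keeps "xy" and never gives it back, so the match fails
        System.out.println(matchesWhole("\\w++\\w", "xy")); // false
    }
}
```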
