I am trying to extract the text between a semicolon (;) and WORD. I am using the code below, but it fails to extract "TVS A3003".

Matcher matcher = Pattern.compile("(?<=;).*?(?=WORD)").matcher(string);

Sample strings:

1. (XYZTRR: KTTT 4.0.1; TVS A3003 WORD/LLLLL ; pj ;) 

2. (XcdcdRR; dTff 5.4.1; TVS A3003 WORD/UJH;KKKHH fpp) 

3. LLLhf22; 776332 8.7.1; TVS A3003 WORD/UHHGFVV phhp

4. (;LLLhf22; 776332 8.7.1; TVS A3003 WORD/UHHGFVV phhp ;)

I want to extract TVS A3003 in all the cases.

  • Post the full relevant code. What exactly does not work? Commented Oct 12, 2017 at 11:45
  • find the answer for the exact question below Commented Oct 12, 2017 at 11:53
  • @WiktorStribiżew Hi, the link you shared fails for the 3rd sample and gives the output "776332 8.7.1; TVS A3003 ". Commented Oct 12, 2017 at 12:45
  • You may solve it with (?<=;)[^;]*?(?=WORD) or ;([^;]*?)WORD Commented Oct 12, 2017 at 13:00
  • @WiktorStribiżew Thanks a lot, Wiktor. The above regex works for all three samples. Can you suggest a regex for the 4th sample string as well? Commented Oct 12, 2017 at 13:15

2 Answers

You need to find a ; and then match zero or more characters other than ;, as few as possible, up to the first occurrence of WORD. You can do that with

;([^;]*?)WORD

See the regex demo. Note that the leading/trailing whitespace can be easily trimmed off with .trim() after a match is found.

See the Java demo below:

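import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

List<String> strs = Arrays.asList("(XYZTRR: KTTT 4.0.1; TVS A3003 WORD/LLLLL ; pj ;)", 
        "(XcdcdRR: dTff 5.4.1; TVS A3003 WORD/UJHKKKHH fpp)",
        "(LLLhf22; 776332 8.7.1; TVS A3003 WORD/UHHGFVV phhp) );");
// Group 1 captures the text between a ';' and the nearest following WORD
Pattern pattern = Pattern.compile(";([^;]*?)WORD");
for (String s : strs) {
    Matcher matcher = pattern.matcher(s);
    if (matcher.find()) {
        // trim() removes the whitespace around the captured value
        System.out.println(matcher.group(1).trim());
    }
}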

Output:

TVS A3003
TVS A3003
TVS A3003
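
To illustrate why the negated character class matters, here is a minimal sketch that runs the question's original pattern and the [^;] variant against the 3rd sample string. The class name LazyDotVsNegatedClass and the helper firstMatch are made up for this illustration and are not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyDotVsNegatedClass {
    // Hypothetical helper: trimmed text of the first match, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group().trim() : null;
    }

    public static void main(String[] args) {
        // 3rd sample string from the question
        String s = "LLLhf22; 776332 8.7.1; TVS A3003 WORD/UHHGFVV phhp";

        // '.' may cross the second ';', so the match starts right after the first ';'
        System.out.println(firstMatch("(?<=;).*?(?=WORD)", s));

        // '[^;]' cannot cross a ';', so the match must start after the last ';' before WORD
        System.out.println(firstMatch("(?<=;)[^;]*?(?=WORD)", s));
    }
}
```

The first call prints 776332 8.7.1; TVS A3003 (the behavior reported in the comments), while the second prints TVS A3003. The lazy quantifier only controls where the match ends; it does not move the start of the match to the closest semicolon.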

2 Comments

Thanks a lot, Wiktor. The above regex works for all three samples. Can you suggest a regex for the 4th sample string as well?
@MohitAgrawal There is no difference between the 4th and the other 3 strings, see the Java demo.

The regex is (?<=KTTT 4\.0\.1; )(.*)(?= WORD/U)

Matcher matcher = Pattern.compile("(?<=KTTT 4\\.0\\.1; )(.*)(?= WORD/U)").matcher(string);

if(matcher.find()){
     System.out.println(matcher.group());
}
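
A lookaround pattern does not have to hard-code the surrounding text. As a rough sketch, the generic lookaround variant suggested in the comments, (?<=;)[^;]*?(?=WORD), can be tried against the four sample strings from the question. The class name GenericLookaround and the method extract are assumptions made for this example only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GenericLookaround {
    // Preceded by ';', then as few non-';' characters as possible, followed by "WORD"
    static final Pattern BETWEEN = Pattern.compile("(?<=;)[^;]*?(?=WORD)");

    // Hypothetical helper: trimmed first match, or null when the input has none.
    static String extract(String input) {
        Matcher m = BETWEEN.matcher(input);
        return m.find() ? m.group().trim() : null;
    }

    public static void main(String[] args) {
        // The four sample strings as posted in the question
        List<String> samples = Arrays.asList(
                "(XYZTRR: KTTT 4.0.1; TVS A3003 WORD/LLLLL ; pj ;)",
                "(XcdcdRR; dTff 5.4.1; TVS A3003 WORD/UJH;KKKHH fpp)",
                "LLLhf22; 776332 8.7.1; TVS A3003 WORD/UHHGFVV phhp",
                "(;LLLhf22; 776332 8.7.1; TVS A3003 WORD/UHHGFVV phhp ;)");
        for (String s : samples) {
            System.out.println(extract(s));
        }
    }
}
```

Each of the four lines printed should be TVS A3003. Because the lookarounds consume nothing, group() already holds just the text between the semicolon and WORD, so only trim() is needed afterwards.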
