I have the following String and I want to extract the part number MBRB1045T4G from it with a regular expression in Java. How would I achieve that?

String:

<p class="ref">
<b>Mfr Part#:</b>
MBRB1045T4G<br>


<b>Technologie:</b>&nbsp;
    Tab Mount<br>



<b>Bauform:</b>&nbsp;
    D2PAK-3<br>



<b>Verpackungsart:</b>&nbsp;
    REEL<br>



<b>Standard Verpackungseinheit:</b>&nbsp;
    800<br>

  • by offering up your sanity to Cthulhu Commented May 8, 2012 at 16:38
  • iow, use an HTML parser. Commented May 8, 2012 at 16:42
  • Please refrain from parsing HTML with RegEx as it will drive you insane. Use an HTML parser instead. Commented May 8, 2012 at 17:00
  • What is your constraint? The second line after <p class="ref">? Something which starts at the beginning of a line with an uppercase letter? Something which ends with 4G? Something 3 lines before Technologie? Commented May 8, 2012 at 17:37
  • Basically, before the String I want there is the </b> and after it the <br>, so it's </b>STRING<br>, but there is a line break after the </b> in the HTML. Is that relevant? Commented May 8, 2012 at 17:41

1 Answer

As Wrikken correctly says, HTML can't be parsed reliably with regex in the general case. However, it seems you're looking at an actual website and want to scrape some content. In that case, assuming the whitespace and formatting of the HTML don't change, you can use a regex like this:

 Mfr Part#:</b>\s*([^<]+)<br>

And collect the first capture group like so (where string is your HTML):

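import java.util.regex.Matcher;
import java.util.regex.Pattern;

// \\s* skips the line break between </b> and the part number
Pattern pt = Pattern.compile("Mfr Part#:</b>\\s*([^<]+)<br>");
Matcher m = pt.matcher(string);
// find() searches inside the input; matches() would require the whole string to match
if (m.find())
    System.out.println(m.group(1));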
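
For reference, here is a minimal self-contained sketch of the same idea. The class name PartNumberExtractor and the method extract are made up for illustration, and the sample HTML is shortened from the question. It uses find() because matches() only succeeds when the entire input matches the pattern, and it trims whitespace around the captured value inside the regex itself.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is an assumption, not part of any library.
public class PartNumberExtractor {

    // \\s* tolerates the line break after </b>; the lazy group plus the
    // second \\s* keeps trailing whitespace before <br> out of the capture.
    private static final Pattern PART =
            Pattern.compile("Mfr Part#:</b>\\s*([^<]+?)\\s*<br>");

    // Returns the part number if the label is found, otherwise an empty Optional.
    public static Optional<String> extract(String html) {
        Matcher m = PART.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Shortened sample input, assumed to have the same layout as the question's HTML.
        String html = "<p class=\"ref\">\n"
                + "<b>Mfr Part#:</b>\n"
                + "MBRB1045T4G<br>\n"
                + "\n"
                + "<b>Technologie:</b>&nbsp;\n"
                + "    Tab Mount<br>\n";
        System.out.println(extract(html).orElse("not found"));
    }
}
```

This still depends on the page keeping its current markup. If the label text or the tags around it change, the pattern will silently stop matching, which is why the Optional return makes the "not found" case explicit.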

3 Comments

Pattern pt = Pattern.compile("Mfr Part#:</b>([^<]+)<br>"; ? How would I get the string?
Matcher m = pt.matcher(string); if (m.find()) System.out.println(m.group(1));
Element desc = doc.select("p[class=ref]").first(); logger.debug("found ref:"+ desc.text()); Pattern pt = Pattern.compile("Mfr Part#:</b>([^<]+)<br>"); Matcher m = pt.matcher(desc.text()); if (m.matches()){ logger.debug("found partnumber:"+ m.group(1)); article.setManufacturerArticleNumber(m.group(1)); article.setDistributorArticleNumber(m.group(1)); }
