Java regex or other way to find the strings between tags and the other parts of that string

Question

I have a String like this:

String s = "AZERTY<em>ZA</em> QWERTY OK <em>NE</em>NO";

I want to extract the strings between <em> and </em> and build a StringBuilder-like object containing all parts of the string in the right order. I do this because I need to identify and locate the extracted strings, but I also need to keep the entire string. The goal is to later add the entire string to an Excel sheet cell and apply a font to the strings between <em> and </em>:

XSSFRichTextString xssfrt = new XSSFRichTextString(); // acts like a StringBuilder
xssfrt.append("AZERTY");
xssfrt.append("ZA", font);       // extract 1
xssfrt.append(" QWERTY OK ");    // keep spaces
xssfrt.append("NE", font);       // extract 2
xssfrt.append("NO");

Here is my regex, which extracts the desired strings, but I don't know how to build the StringBuilder with all parts in the right order:

Pattern p = Pattern.compile("<em>(.*?)</em>");
Matcher m = p.matcher(s);
while (m.find()) {
    m.group(1); // the extracted string
}

Thank you very much

dejvuth · Accepted Answer · 2016-06-24 11:29:27Z

2

An easy fix is to add another group that matches the string before <em>:

Pattern p = Pattern.compile("(.*?)<em>(.*?)</em>");

With it, m.group(1) refers to the string outside em, and m.group(2) is the one inside.

Of course, this won't include the last string outside em (NO in your example). So you might want to remember the index where the last match ends, e.g. with int end = m.end(), and retrieve the remainder with s.substring(end).
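As a rough sketch of how the two-group pattern and the trailing substring could fit together, the following collects the pieces in order. The class name EmSegments, the method name split, and the square brackets marking the emphasized pieces are assumptions made for illustration only; in the real code each piece would go to xssfrt.append(text) or xssfrt.append(text, font) instead of into a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmSegments {
    // Group 1: text before <em>; group 2: text inside <em>...</em>.
    private static final Pattern P = Pattern.compile("(.*?)<em>(.*?)</em>");

    /** Returns the pieces in order; emphasized pieces are wrapped in [ ] (illustration only). */
    static List<String> split(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(s);
        int end = 0; // index just past the last match
        while (m.find()) {
            if (!m.group(1).isEmpty()) {
                out.add(m.group(1));          // plain text, e.g. xssfrt.append(text)
            }
            out.add("[" + m.group(2) + "]");  // emphasized, e.g. xssfrt.append(text, font)
            end = m.end();
        }
        if (end < s.length()) {
            out.add(s.substring(end));        // trailing text after the last </em>
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(split("AZERTY<em>ZA</em> QWERTY OK <em>NE</em>NO"));
    }
}
```

Note that . does not match line terminators by default, so a string containing line breaks would probably need Pattern.DOTALL.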

answered Jun 24, 2016 at 11:29


Ravikumar · Accepted Answer · 2016-06-24 11:44:59Z

1

You can use Matcher's appendReplacement(StringBuffer sb, String replacement) and appendTail(StringBuffer sb) methods to keep the parts in order, and a list to store the extracted strings. Something like this:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String s = "AZERTY<em>ZA</em> QWERTY OK <em>NE</em>NO";
        List<String> extractedString = new ArrayList<String>();
        Pattern p = Pattern.compile("<em>(.*?)</em>");
        Matcher m = p.matcher(s);
        StringBuffer sb = new StringBuffer();

        while (m.find()) {
            String matchedString = m.group(1);
            extractedString.add(matchedString);
            // Copies the text before the match, then the extracted text in place of the tags.
            // quoteReplacement keeps any '$' or '\' in the match literal.
            m.appendReplacement(sb, Matcher.quoteReplacement(matchedString));
            sb.append(" ");
        }
        m.appendTail(sb); // text after the last </em>

        System.out.println(sb.toString());
        System.out.println(extractedString.toString());
    }
}

Output:
String buffer = AZERTYZA  QWERTY OK NE NO
List of extracted String = [ZA, NE]

edited Jun 24, 2016 at 11:44

answered Jun 24, 2016 at 10:45


3 Comments

ulquiorra Over a year ago

Thank you for your answer, but I need to build a StringBuilder, not a simple string, because I need to distinguish the extracted strings from the other parts.

Ravikumar Over a year ago

@ulquiorra I have updated my answer; please check whether it solves your problem. The only thing remaining is that there is no space between the first word and the first extracted string, which I think you can handle. I have not used those methods extensively, but I thought that is what you wanted.

ulquiorra Over a year ago

Thanks. Works like a charm :)

ernest_k · Accepted Answer · 2016-06-24 10:39:46Z

0

String[] pieces = s.split("<.*?>");

This will split the string on anything surrounded by <>. If your tag is always em, then you can use:

String[] pieces = s.split("</?em>");
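One way to tell the pieces apart after such a split is by position: assuming the tags are well-formed and not nested, the pieces alternate between text outside and text inside em, so the odd-indexed pieces are the ones that were between <em> and </em>. The sketch below relies on that assumption; the class name SplitParity and the method name emphasized are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

class SplitParity {
    /** Returns the odd-indexed pieces, i.e. the text that was inside <em>...</em>. */
    static List<String> emphasized(String s) {
        String[] pieces = s.split("</?em>");
        List<String> inside = new ArrayList<>();
        for (int i = 0; i < pieces.length; i++) {
            // Even index: outside the tags; odd index: between <em> and </em>.
            boolean insideEm = (i % 2 == 1);
            if (insideEm) {
                inside.add(pieces[i]);
            }
        }
        return inside;
    }

    public static void main(String[] args) {
        System.out.println(emphasized("AZERTY<em>ZA</em> QWERTY OK <em>NE</em>NO"));
    }
}
```

If the string starts with <em>, split yields an empty first piece, so the alternation should still hold; an unbalanced or nested tag would break it.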

answered Jun 24, 2016 at 10:39


1 Comment

ulquiorra Over a year ago

Thanks. And how can I use split to distinguish the strings between <em></em> from the other parts?

Shekhar Khairnar · Accepted Answer · 2016-06-24 12:07:11Z

You need to do something like this:

String str = "AZERTY<em>ZA</em> QWERTY OK <em>NE</em>NO";
StringBuilder stringBuilder = new StringBuilder();
String[] parts = str.split("(<\\/?em>)");

System.out.println("parts : " + Arrays.toString(parts)); // needs import java.util.Arrays

for (String s : parts) {
    System.out.println("Part going to append :" + s);
    stringBuilder.append(s);
}
System.out.println("StringBuilder : " + stringBuilder.toString());

Output will be:

> parts : [AZERTY, ZA,  QWERTY OK , NE, NO]
> Part going to append :AZERTY
> Part going to append :ZA
> Part going to append : QWERTY OK 
> Part going to append :NE
> Part going to append :NO
> StringBuilder : AZERTYZA QWERTY OK NENO

Update:

Check the updated code:

String str = "AZERTY<em>ZA</em> QWERTY OK <em>NE</em>NO";

// Tag each word that is immediately followed by </em> with a ":font" marker, e.g. ZA:font
str = str.replaceAll("(\\w+)(?=\\<\\/em\\>)", "$1:font");
// After replace: AZERTY<em>ZA:font</em> QWERTY OK <em>NE:font</em>NO

String[] parts = str.split("(<\\/?em>)");
// After split: [AZERTY, ZA:font,  QWERTY OK , NE:font, NO]

XSSFRichTextString xssfrt = new XSSFRichTextString();

for (String s : parts) {
    // Apply the font only to the parts carrying the marker.
    // Note: this assumes the text itself contains no ':'.
    if (s.contains(":")) {
        String[] subParts = s.split(":");
        xssfrt.append(subParts[0], font); // font: the font variable from the question
    } else {
        xssfrt.append(s);
    }
}

Thank you, but I need to identify "ZA" and "NE" in the StringBuilder. See my updated question, thanks :)
