-1

I have a String like this

String s = "AZERTY<em>ZA</em> QWERTY OK <em>NE</em>NO";

I want to extract the strings between <em> and </em> and construct a StringBuilder with all parts of the string in the right order. I do this because I need to identify and locate the extracted strings, but I need to keep the entire string too. The goal is to later add the entire string to an Excel sheet cell and apply a font to the strings between <em> and </em>.

XSSFRichTextString xssfrt = new XSSFRichTextString(); // acts like a StringBuilder
xssfrt.append("AZERTY");
xssfrt.append("ZA", font); // extract 1
xssfrt.append(" QWERTY OK "); // keep spaces
xssfrt.append("NE", font); // extract 2
xssfrt.append("NO");

Here is my regex, which extracts the desired strings, but I don't know how to construct the StringBuilder with all parts in the right order :/

Pattern p = Pattern.compile("<em>(.*?)</em>");
Matcher m = p.matcher(s);
while (m.find()) {
    m.group(1); // the extracted string
}

Thank you very much

4 Answers

2

An easy fix is to add another group that matches the text before <em>:

Pattern p = Pattern.compile("(.*?)<em>(.*?)</em>");

With it, m.group(1) is the text before <em>, and m.group(2) is the text inside it.

Of course, this won't include the last string outside em (NO in your example). So you might want to remember the index where the last match ends, e.g. int end = m.end(), and retrieve the remainder with s.substring(end).
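The two-group pattern plus the remembered end index can be put together roughly as follows. This is only a sketch: the EmSegments class, its Segment type and the parse method are names made up for illustration, and it assumes the tags are balanced and never nested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmSegments {
    /** One piece of the input; emphasized is true for text that was inside <em>...</em>. */
    static final class Segment {
        final String text;
        final boolean emphasized;

        Segment(String text, boolean emphasized) {
            this.text = text;
            this.emphasized = emphasized;
        }
    }

    // group(1): text before <em>, group(2): text inside <em>...</em>
    private static final Pattern EM = Pattern.compile("(.*?)<em>(.*?)</em>", Pattern.DOTALL);

    /** Hypothetical helper: returns all pieces of s in their original order. */
    static List<Segment> parse(String s) {
        List<Segment> segments = new ArrayList<>();
        Matcher m = EM.matcher(s);
        int end = 0; // index just past the last match
        while (m.find()) {
            if (!m.group(1).isEmpty()) {
                segments.add(new Segment(m.group(1), false));
            }
            segments.add(new Segment(m.group(2), true));
            end = m.end();
        }
        if (end < s.length()) {
            // Trailing text after the last </em> ("NO" in the example).
            segments.add(new Segment(s.substring(end), false));
        }
        return segments;
    }

    public static void main(String[] args) {
        for (Segment seg : parse("AZERTY<em>ZA</em> QWERTY OK <em>NE</em>NO")) {
            System.out.println((seg.emphasized ? "em   : " : "plain: ") + "[" + seg.text + "]");
        }
    }
}
```

Each segment could then be appended in order, using xssfrt.append(text, font) for the emphasized ones and xssfrt.append(text) for the rest, as in the question.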


1

You can use Matcher's appendReplacement(StringBuffer sb, String replacement) and appendTail(StringBuffer sb) methods to keep everything in order, and a list to store the extracted strings. Something like this:

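import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) throws java.lang.Exception {
    String s = "AZERTY<em>ZA</em> QWERTY OK <em>NE</em>NO";
    List<String> extractedString = new ArrayList<String>();
    Pattern p = Pattern.compile("<em>(.*?)</em>");
    Matcher m = p.matcher(s);
    StringBuffer sb = new StringBuffer();

    while (m.find()) {
        String matchedString = m.group(1);
        extractedString.add(matchedString);
        // Copies the text before the match, then the extracted string without its tags.
        // quoteReplacement stops '$' and '\' in the text from being read as group references.
        m.appendReplacement(sb, Matcher.quoteReplacement(matchedString));
        sb.append(" ");
    }
    // Copies whatever follows the last match.
    m.appendTail(sb);

    System.out.println("String buffer = " + sb.toString());
    System.out.println("List of extracted String = " + extractedString.toString());
}
Output :
String buffer = AZERTYZA  QWERTY OK NE NO
List of extracted String = [ZA, NE]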
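If the positions of the extracted strings are also needed, they can be read off the buffer length right after each appendReplacement call. The sketch below is one way to do that; EmOffsets and strip are hypothetical names, and unlike the code above it does not append an extra space after each match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmOffsets {
    /**
     * Hypothetical helper: returns s without its <em> tags and fills ranges with
     * {start, end} pairs (end exclusive) locating each extracted string in the result.
     */
    static String strip(String s, List<int[]> ranges) {
        Matcher m = Pattern.compile("<em>(.*?)</em>").matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String inner = m.group(1);
            // quoteReplacement keeps '$' and '\' in the text from being read as group references.
            m.appendReplacement(sb, Matcher.quoteReplacement(inner));
            int end = sb.length(); // the extracted string is the last thing appended
            ranges.add(new int[] {end - inner.length(), end});
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        List<int[]> ranges = new ArrayList<>();
        String plain = strip("AZERTY<em>ZA</em> QWERTY OK <em>NE</em>NO", ranges);
        System.out.println(plain);
        for (int[] r : ranges) {
            System.out.println(r[0] + ".." + r[1] + " -> " + plain.substring(r[0], r[1]));
        }
    }
}
```

With ranges like these, it should be possible to build the XSSFRichTextString from the whole stripped text and then call applyFont(start, end, font) for each pair, though that part is not shown or tested here.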

3 Comments

Thank you for your answer, but I need to build a StringBuilder, not a simple string, because I need to tell the extracted strings apart from the other parts.
@ulquiorra I have updated my answer; please check whether it solves your problem. The only thing remaining is that there is no space between the first word and the first extracted string, which I think you can handle. I have not used these methods extensively, but I thought this is what you wanted.
Thanks. Works like a charm :)
0
String[] pieces = s.split("<.*?>");

This will split the string on anything surrounded by <>. If your tag is always em, then you can use:

String[] pieces = s.split("</?em>");
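With this split, the pieces alternate between text outside and text inside the tags, so the index parity can tell them apart. The following is a small sketch of that idea, assuming the tags are balanced and not nested; EmSplit and describe are illustrative names only.

```java
class EmSplit {
    /** Hypothetical helper: labels each piece as plain text or as text from inside <em>...</em>. */
    static String describe(String s) {
        String[] pieces = s.split("</?em>");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < pieces.length; i++) {
            // Even indexes are outside the tags, odd indexes are inside them.
            // A string that starts with <em> yields an empty piece at index 0,
            // so the parity still holds.
            boolean insideEm = (i % 2 == 1);
            out.append(insideEm ? "em:" : "plain:")
               .append('[').append(pieces[i]).append("]\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe("AZERTY<em>ZA</em> QWERTY OK <em>NE</em>NO"));
    }
}
```

The parity trick breaks down as soon as a tag is unbalanced or nested, in which case walking the matches with a Matcher is likely the safer route.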

1 Comment

Thanks. And with split, how can I tell the strings between <em></em> apart from the other parts?
0

You need to do something like this:

String str = "AZERTY<em>ZA</em> QWERTY OK <em>NE</em>NO";
StringBuilder stringBuilder = new StringBuilder();
// Split on both the opening and the closing tag.
String[] parts = str.split("(<\\/?em>)");

// Arrays requires: import java.util.Arrays;
System.out.println("parts : " + Arrays.toString(parts));

for (String s : parts) {
    System.out.println("Part going to append :" + s);
    stringBuilder.append(s);
}
System.out.println("StringBuilder : " + stringBuilder.toString());

Output will be:

parts : [AZERTY, ZA,  QWERTY OK , NE, NO]
Part going to append :AZERTY
Part going to append :ZA
Part going to append : QWERTY OK 
Part going to append :NE
Part going to append :NO
StringBuilder : AZERTYZA QWERTY OK NENO

Update:

Check the updated code:

String str = "AZERTY<em>ZA</em> QWERTY OK <em>NE</em>NO";

// Mark each word that is immediately followed by </em> with ":font", e.g. ZA:font
str = str.replaceAll("(\\w+)(?=\\<\\/em\\>)", "$1:font");
// After replace : AZERTY<em>ZA:font</em> QWERTY OK <em>NE:font</em>NO

String[] parts = str.split("(<\\/?em>)");
// After split : [AZERTY, ZA:font,  QWERTY OK , NE:font, NO]

XSSFRichTextString xssfrt = new XSSFRichTextString();

for (String s : parts) {
    // Parts ending in the ":font" marker came from inside <em>...</em>.
    // Checking the full marker avoids misreading plain text that contains ':'.
    if (s.endsWith(":font")) {
        String text = s.substring(0, s.length() - ":font".length());
        xssfrt.append(text, font); // font: the font to apply, as in the question
    } else {
        xssfrt.append(s);
    }
}

1 Comment

Thank you, but I need to identify "ZA" and "NE" in the StringBuilder. See my updated question, thanks :)
