Say, I have a String:

String someString = "<html><body><div><div><div class=\"unknown\"><b>Content</b></div></div></div></body></html>";

In this String the position of the "Content" is known.

Now I want to turn the innermost div around "Content" into a span. So what I want to do is something like:

someString.replacePreviousOccurrence(someString.indexOf("Content"), "<div ", "<span>");
someString.replaceNextOccurrence(someString.indexOf("Content"), "</div>", "</span>");

Is there something in Java to do this? Or just to get the index of a previous and next occurrence of a substring from a specified index?

Edit: I forgot to specify that the divs have unknown attributes (they may have classes and so on) and that there may be other tags in between (like the <b> tag in the example).

3
  • s.replace("<div>Content</div>", "<span>Content</span>") ? Commented Apr 18, 2017 at 16:55
  • @assylias Sorry, forgot to specify the divs have unknown tags and there may be unknown tags in between the "content" and the "divs" Commented Apr 18, 2017 at 17:02
  • It sounds like you may be better off parsing the html than trying to do string replacement... Commented Apr 18, 2017 at 17:15

2 Answers

1

You can definitely do this with regex, though it may not be the most elegant solution. Here is the pattern you might use: <div>(?!<div>).*(?<!<\/div>)<\/div>

This works by using a negative lookahead and a negative lookbehind. The negative lookahead (?!<div>) requires that the opening <div> is not immediately followed by another <div>, and the negative lookbehind (?<!<\/div>) requires that the closing </div> is not immediately preceded by another </div>.

So the pattern broken down:

<div>   //matches <div>
    (?!<div>) //that isn't followed by <div>
           .* //followed by any character any number of times
    (?<!<\/div>) // where the closing tag that follows isn't preceded by </div>
<\/div>    //matches </div>

So for this problem you can do something like the following:

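import java.util.regex.Matcher;
import java.util.regex.Pattern;
// The pattern only matches bare <div> tags, so the class attribute is left out of this sample.
String str = "<html><body><div><div><div><b>Content</b></div></div></div></body></html>";
Pattern p = Pattern.compile("<div>(?!<div>).*(?<!</div>)</div>");
Matcher m = p.matcher(str);
// Rewrite the tags inside the matched innermost div only (this replaceAll overload needs Java 9+).
String output = m.replaceAll(r -> r.group().replace("<div>", "<span>").replace("</div>", "</span>"));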
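Since the question's edit says the divs may carry attributes, here is a minimal sketch of a variant that tolerates them. It is not the pattern from this answer: it swaps the lookbehind for a "tempered" lookahead that refuses to cross any further <div or </div tag. The class name InnermostDivToSpan and the method convert are made up for illustration, and the sketch assumes no attribute value contains a '>' character.

```java
import java.util.regex.Pattern;

final class InnermostDivToSpan {
    // group 1: the attributes of the opening tag (possibly empty)
    // group 2: content that contains no further <div ...> or </div> tag,
    //          which is what makes the matched div the innermost one
    private static final Pattern INNERMOST_DIV = Pattern.compile(
            "<div\\b([^>]*)>((?:(?!</?div\\b).)*)</div>", Pattern.DOTALL);

    // Hypothetical helper: turns every innermost div into a span, keeping its attributes.
    static String convert(String html) {
        return INNERMOST_DIV.matcher(html).replaceAll("<span$1>$2</span>");
    }

    public static void main(String[] args) {
        String html = "<html><body><div><div><div class=\"unknown\"><b>Content</b>"
                + "</div></div></div></body></html>";
        System.out.println(convert(html));
    }
}
```

Note that this converts every innermost div in the input, not only the one around "Content"; for real-world HTML a parser is still the safer route.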

5 Comments

A great solution for my problem. However, I find it strange that not even an Apache library contains a "replacePreviousOccurrence" and a "replaceNextOccurrence" method. I don't understand why Java would give you methods like indexOf and lastIndexOf to find the first and last index of a substring, but none for everything in between.
Here's an interesting approach you can try: stackoverflow.com/questions/19035893/… Basically, you can call indexOf() with the index where you'd like to begin your search. You can use this to get both the previous and next occurrences. Though I agree it would be a nice function for them to include!
Looks awesome @gwcoderguy. I did know of that function, but didn't see how to get the previous occurrence. Would you mind explaining how?
Sorry, I was hoping that there would be a method with a toIndex parameter... Either way you can do something like: str.substring(0, str.indexOf(targetString) - targetString.length()).lastIndexOf(targetString); A little ugly though...
Still a very stable alternative. Thanks for these insights.
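For what it's worth, java.lang.String does have two-argument overloads, indexOf(String, int) and lastIndexOf(String, int), that search forward and backward from a given index. Below is a rough sketch of how the two methods asked about in the question could be built on them. The class OccurrenceDemo and both helper names are hypothetical (they are not part of the JDK or of any Apache library that I know of).

```java
final class OccurrenceDemo {
    // Replaces the last occurrence of target that starts at or before fromIndex.
    static String replacePreviousOccurrence(String s, int fromIndex, String target, String replacement) {
        int i = s.lastIndexOf(target, fromIndex); // searches backward from fromIndex
        if (i < 0) {
            return s; // nothing found before fromIndex
        }
        return s.substring(0, i) + replacement + s.substring(i + target.length());
    }

    // Replaces the first occurrence of target that starts at or after fromIndex.
    static String replaceNextOccurrence(String s, int fromIndex, String target, String replacement) {
        int i = s.indexOf(target, fromIndex); // searches forward from fromIndex
        if (i < 0) {
            return s; // nothing found after fromIndex
        }
        return s.substring(0, i) + replacement + s.substring(i + target.length());
    }

    public static void main(String[] args) {
        String s = "<html><body><div><div><div class=\"unknown\"><b>Content</b>"
                + "</div></div></div></body></html>";
        int pos = s.indexOf("Content");
        // Replace the closing tag first: it lies after pos, so pos stays valid.
        s = replaceNextOccurrence(s, pos, "</div>", "</span>");
        // Replacing "<div" (without '>') keeps whatever attributes the tag has.
        s = replacePreviousOccurrence(s, pos, "<div", "<span");
        System.out.println(s);
    }
}
```

This only works when the nearest tags around the known position are the ones you want; it has no notion of nesting, so a div nested between "Content" and its closing tag would throw it off.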
1

You could use the built-in functionality for working with XML.

This is, sadly, very verbose, but it works.

    public static void replaceDivWithSpanByText() throws ParserConfigurationException, IOException, SAXException, XPathExpressionException, TransformerException {
        String html = "<html><body><div><div><div>Content</div></div></div></body></html>";
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)));

        XPathFactory xPathFactory = XPathFactory.newInstance();
        XPath xpath = xPathFactory.newXPath();
        Node contentNode = (Node) xpath.evaluate(".//div[text() = 'Content']", doc, XPathConstants.NODE);
        doc.renameNode(contentNode, null, "span");


        DOMSource domSource = new DOMSource(doc);
        StringWriter writer = new StringWriter();
        StreamResult result = new StreamResult(writer);
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer transformer = tf.newTransformer();
        transformer.transform(domSource, result);

        System.out.println(writer.toString()); 
    }

Note that in this example I use XPath to select the node by its text (".//div[text() = 'Content']"); selecting by id, class, or other attributes is just as easy. If you do this kind of replacement a lot, writing a generic class to handle it could be a good idea.
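A generic helper along those lines might look like the sketch below. The class ElementRenamer and its rename method are names invented for this example; it takes any XPath expression and renames the first matching node, here selected by class attribute instead of by text. Like the code above, it assumes the input is well-formed XML, which real-world HTML often is not.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class ElementRenamer {
    // Hypothetical helper: renames the first node matched by xpathExpr to newName.
    static String rename(String xml, String xpathExpr, String newName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node node = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(xpathExpr, doc, XPathConstants.NODE);
            if (node == null) {
                return xml; // no match: hand the input back unchanged
            }
            doc.renameNode(node, null, newName);

            // Serialize the modified DOM back to a string.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            // Sketch-level error handling: wrap the many checked exceptions in one.
            throw new IllegalStateException("Could not rename node", e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><body><div><div><div class=\"unknown\"><b>Content</b>"
                + "</div></div></div></body></html>";
        System.out.println(rename(html, "//div[@class='unknown']", "span"));
    }
}
```

Wrapping everything in a single unchecked exception keeps the call site short; a production version would probably want more specific handling.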

1 Comment

For this issue, this solves my problem. However, I find it strange that not even an Apache library contains a "replacePreviousOccurrence" and a "replaceNextOccurrence" method. I don't understand why Java would give you methods like indexOf and lastIndexOf to find the first and last index of a substring, but none for everything in between.
