Replace previous and next index of a substring in a string from a specific index

Question

Say, I have a String:

String someString = "<html><body><div><div><div class=\"unknown\"><b>Content</b></div></div></div></body></html>";

In this String the position of the "Content" is known.

Now, I want to turn the innermost div (both its opening and closing tag) into a span. So what I want to do:

someString.replacePreviousOccurrence(someString.indexOf("Content"), "<div ", "<span>");
someString.replaceNextOccurrence(someString.indexOf("Content"), "</div>", "</span>");

Is there something in Java to do this? Or just to get the index of a previous and next occurrence of a substring from a specified index?

Edit: I forgot to specify that the divs may carry unknown attributes (classes and so on) and that there may be other tags between the divs and the content (like the b tag in the example).

@assylias Sorry, I forgot to specify that the divs have unknown attributes and that there may be unknown tags between the "Content" and the divs.
– Simon Baars, Commented Apr 18, 2017 at 17:02
It sounds like you may be better off parsing the HTML than trying to do string replacement...
– assylias, Commented Apr 18, 2017 at 17:15

gwcoderguy · Accepted Answer · 2017-04-18 17:40:03Z

1

You can definitely do this with regex, though it may not be the most elegant solution. Here is the pattern you might use: <div>(?!<div>).*(?<!<\/div>)<\/div>

This works by using a negative lookahead and a negative lookbehind. The negative lookahead (?!<div>) only matches at a position that is not followed by "<div>", and the negative lookbehind (?<!<\/div>) only matches at a position that is not preceded by "</div>".

So the pattern broken down:

<div>   //matches <div>
    (?!<div>) //that isn't followed by <div>
           .* //followed by any character any number of times
    (?<!<\/div>) // at a position that isn't preceded by </div>
<\/div>    //matches </div>

So for this problem you can do something like the following:

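// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
String str = "<html><body><div><div><div class=\"unknown\"><b>Content</b></div></div></div></body></html>";
// The opening tag is widened to <div\b[^>]*> so that attributes such as class="unknown" are accepted.
Pattern p = Pattern.compile("<div\\b[^>]*>(?!<div\\b).*(?<!</div>)</div>");
Matcher m = p.matcher(str);
// Rewrite the tags only inside the matched innermost div; keep the input unchanged if nothing matches.
String output = !m.find() ? str : str.substring(0, m.start())
        + m.group().replaceFirst("<div\\b", "<span").replace("</div>", "</span>") + str.substring(m.end());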
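For anyone who wants to run the regex idea end to end, here is a minimal self-contained sketch. It is a variation on the pattern above, not the same pattern: it assumes the opening tag may carry attributes and captures them so they survive the rename, and it checks every character of the body with a lookahead so the body cannot contain another "<div". The class name InnermostDivRegex and the method toSpan are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InnermostDivRegex {
    // <div ...> whose body contains no further "<div" before the first </div>.
    // Group 1 = attributes of the opening tag, group 2 = the body.
    // DOTALL lets the body span line breaks.
    private static final Pattern INNERMOST_DIV =
            Pattern.compile("<div\\b([^>]*)>((?:(?!<div\\b).)*?)</div>", Pattern.DOTALL);

    // Hypothetical helper: renames the first div that has no nested div to span.
    // Returns the input unchanged when nothing matches.
    static String toSpan(String html) {
        Matcher m = INNERMOST_DIV.matcher(html);
        return m.replaceFirst("<span$1>$2</span>");
    }

    public static void main(String[] args) {
        String str = "<html><body><div><div><div class=\"unknown\"><b>Content</b></div></div></div></body></html>";
        System.out.println(toSpan(str));
    }
}
```

Note that this sketch does not use the known position of "Content"; it simply renames the first div in the string that has no nested div. As with any regex over markup, it is only reasonable for input whose shape you control.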

edited Apr 18, 2017 at 17:40

answered Apr 18, 2017 at 17:16

gwcoderguy


5 Comments

Simon Baars Over a year ago

A great solution for my problem. However, I find it strange that not even an Apache library contains a "replacePreviousOccurrence" and a "replaceNextOccurrence" method. I don't understand why Java would give you methods like indexOf and lastIndexOf to find the first and last index of a substring, but none for the occurrences in between.

gwcoderguy Over a year ago

Here's an interesting approach you can try: stackoverflow.com/questions/19035893/… Basically, you can call indexOf() with the index at which you'd like the search to begin. You can use this to get both the previous and next occurrences. Though I agree it would be a nice function for them to include!

Simon Baars Over a year ago

Looks awesome @gwcoderguy. I did know of that function, but didn't see how to get the previous occurrence. Would you mind explaining how?

gwcoderguy Over a year ago

Sorry, I was hoping that there would be a method with a toIndex parameter... Either way, you can do something like: str.substring(0, index).lastIndexOf(targetString), where index is the position you want to search back from (for example, str.indexOf("Content")). A little ugly though...

Simon Baars Over a year ago

Still a very stable alternative. Thanks for these insights.
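The index-based idea from these comments can be sketched with the two-argument overloads String.indexOf(String, int) and String.lastIndexOf(String, int); the latter searches backward from the given index, so no substring call is needed. The method names below are borrowed from the question, and the class OccurrenceReplacer and the helper replaceAt are hypothetical. This is purely textual: it assumes no other div opens or closes between "Content" and the tags being replaced.

```java
class OccurrenceReplacer {
    // Replaces the last occurrence of target that starts before fromIndex.
    static String replacePreviousOccurrence(String s, int fromIndex, String target, String replacement) {
        int start = s.lastIndexOf(target, fromIndex - 1);
        return replaceAt(s, start, target.length(), replacement);
    }

    // Replaces the first occurrence of target that starts at or after fromIndex.
    static String replaceNextOccurrence(String s, int fromIndex, String target, String replacement) {
        int start = s.indexOf(target, fromIndex);
        return replaceAt(s, start, target.length(), replacement);
    }

    // Returns s unchanged when start is negative (no occurrence was found).
    private static String replaceAt(String s, int start, int length, String replacement) {
        if (start < 0) {
            return s;
        }
        return s.substring(0, start) + replacement + s.substring(start + length);
    }

    public static void main(String[] args) {
        String s = "<html><body><div><div><div class=\"unknown\"><b>Content</b></div></div></div></body></html>";
        int content = s.indexOf("Content");
        // Replace the later occurrence first so that 'content' is still valid for the earlier one.
        s = replaceNextOccurrence(s, content, "</div>", "</span>");
        // Match only the "<div" prefix so that any attributes are left in place.
        s = replacePreviousOccurrence(s, content, "<div", "<span");
        System.out.println(s);
    }
}
```

Replacing the closing tag first matters because "<span" is one character longer than "<div": doing the opening tag first would shift the position of "Content" by one.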

Raudbjorn · Accepted Answer · 2017-04-18 17:32:23Z

1

You could use Java's built-in functionality for working with XML.

Sadly, this is very verbose, but it works.

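import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.*;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

public static void replaceDivWithSpanByText() throws ParserConfigurationException, IOException, SAXException, XPathExpressionException, TransformerException {
    String html = "<html><body><div><div><div>Content</div></div></div></body></html>";
    // Parse the markup into a DOM tree (the input must be well-formed XML).
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = dBuilder.parse(new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)));

    // Select the div whose own text is exactly "Content" and rename it to span.
    XPathFactory xPathFactory = XPathFactory.newInstance();
    XPath xpath = xPathFactory.newXPath();
    Node contentNode = (Node) xpath.evaluate(".//div[text() = 'Content']", doc, XPathConstants.NODE);
    doc.renameNode(contentNode, null, "span");

    // Serialize the modified DOM tree back to a string.
    DOMSource domSource = new DOMSource(doc);
    StringWriter writer = new StringWriter();
    StreamResult result = new StreamResult(writer);
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer transformer = tf.newTransformer();
    transformer.transform(domSource, result);

    System.out.println(writer.toString());
}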

Note that this example uses XPath to select the node by its text (".//div[text() = 'Content']"); selecting by id, class, or other attributes is just as easy. If you do this kind of replacement often, writing a generic class to handle it could be a good idea.
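The XPath above matches a div whose own text is exactly "Content", which is not the case in the question's string, where the text sits inside a b tag and the div has a class attribute. A possible adaptation, offered as a sketch only: select the closest div ancestor of the text node instead. The class EnclosingDivToSpan and the method renameEnclosingDiv are hypothetical names, and the input is assumed to be well-formed XML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class EnclosingDivToSpan {
    // Hypothetical helper: renames the closest div around a text node containing 'text'
    // (the first such div in document order). Assumes 'text' contains no single quote.
    static String renameEnclosingDiv(String xhtml, String text) {
        if (text.indexOf('\'') >= 0) {
            throw new IllegalArgumentException("text must not contain a single quote");
        }
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            // ancestor::div[1] is the nearest enclosing div of each matching text node.
            String expr = "//text()[contains(., '" + text + "')]/ancestor::div[1]";
            Node div = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODE);
            if (div != null) {
                doc.renameNode(div, null, "span"); // attributes and children are kept
            }

            // Serialize as XML without the <?xml ...?> declaration.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Wrapped to keep the sketch short; real code would handle each case.
            throw new IllegalStateException("could not process the document", e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><body><div><div><div class=\"unknown\"><b>Content</b></div></div></div></body></html>";
        System.out.println(renameEnclosingDiv(html, "Content"));
    }
}
```

Real-world HTML is often not well-formed XML, in which case the parse step fails and a dedicated HTML parser is the safer choice.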

answered Apr 18, 2017 at 17:32

Raudbjorn


1 Comment

Simon Baars Over a year ago

For this issue, this solves my problem. However, I find it strange that not even an Apache library contains a "replacePreviousOccurrence" and a "replaceNextOccurrence" method. I don't understand why Java would give you methods like indexOf and lastIndexOf to find the first and last index of a substring, but none for the occurrences in between.
