
I am a newbie to Java regex. I have a long string that contains text like this (below is only the part of my string that I would like to replace):

href="javascript:openWin('Images/DCRMBex_01B_ex01.jpg',480,640)"
href="javascript:openWin('Images/DCRMBex_01A_ex01.jpg',480,640)"
href="javascript:openWin('Images/DCRMBex_06A_ex06.jpg',480,640)"

I would like to replace Images with http://google.com/Images.

For example, my output should look like this:

href="javascript:openWin('http://google.com/Images/DCRMBex_01B_ex01.jpg',480,640)"

Below is my Java program:

import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main2 {

    public static void main(String[] args) throws FileNotFoundException {

        Scanner in = new Scanner(new FileReader("C:\\Projects\\input.txt"));

        StringBuilder sb = new StringBuilder();
        while (in.hasNext()) {
            sb.append(in.next());
        }
        String patternString = "href=\"javascript:openWin(.+?)\"";
        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(sb);
        while (matcher.find()) {
            //System.out.println(matcher.group(1));
            //System.out.println(matcher.group(1).replaceAll("Images", "http://google.com/Images"));
            matcher.group(1).replaceAll("Images", "http://google.com/Images");

        }
        System.out.println(sb);
    }
}

Below is my input file (input.txt). This is only part of the file; the whole file is too long to paste here:

 <td valign="top"><a href="http://www.google.com/cds/desktop/documents/DCRMBex/DCRMBex_01_ex01.pdf"><b>Example 1: Bible (Rusch)</b></a> <a href="javascript:openWin('Images/DCRMBex_01A_ex01.jpg',480,640)">Figure 1A. First page of text</a> <a href="javascript:openWin('Images/DCRMBex_01B_ex01.jpg',480,640)">Figure 1B. Source of supplied title</a></td>
                            <td valign="top">  </td>
                            <td valign="top"><a href="http://www.google.com/cds/desktop/documents/DCRMBex/DCRMBex_06_ex06.pdf"><b>Example 6: Angelo Carletti</b></a> <a href="javascript:openWin('Images/DCRMBex_06A_ex06.jpg',480,640)">Figure 6A. Title page</a> <a href="javascript:openWin('Images/DCRMBex_06B_ex06.jpg',480,640)">Figure 6B. Colophon showing use of i/j and u/v</a></td>
                          </tr>
                          <tr>
                            <td valign="top"><a href="http://www.google.com/cds/desktop/documents/DCRMBex/DCRMBex_02_ex02.pdf"><b>Example 2: Greek anthology</b></a> <a href="javascript:openWin('Images/DCRMBex_02A_ex02.jpg',480,640)">Figure 2A. First page of text</a> <a href="javascript:openWin('Images/DCRMBex_02B_ex02.jpg',480,640)">Figure 2B. Colophon</a></td>
                            <td valign="top">  </td>
                            <td valign="top"><a href="http://www.google.com/cds/desktop/documents/DCRMBex/DCRMBex_07_ex07.pdf"><b>Example 7: Erasmus</b></a> <a href="javascript:openWin('Images/DCRMBex_07A_ex07.jpg',480,640)">Figure 7A. Title page</a> <a href="javascript:openWin('Images/DCRMBex_07B_ex07.jpg',480,640)">Figure 7B. Colophon</a> <a href="javascript:openWin('Images/DCRMBex_07C_ex07.jpg',640,480)">Figure 7C. Running title</a></td>
                          </tr>
                          <tr>
                            <td valign="top"><a href="http://www.google.com/cds/desktop/documents/DCRMBex/DCRMBex_03_ex03.pdf"><b>Example 3: Heytesbury</b></a> <a href="javascript:openWin('Images/DCRMBex_03A_ex03.jpg',480,640)">Figure 3A. Title page</a> <a href="javascript:openWin('Images/DCRMBex_03B_ex03.jpg',480,640)">Figure 3B. Colophon showing use of i/j and u/v</a></td>
                            <td valign="top">  </td>
                            <td valign="top"><a href="http://www.google.com/cds/desktop/documents/DCRMBex/DCRMBex_08_ex08.pdf"><b>Example 8: Pliny</b></a> <a href="javascript:openWin('Images/DCRMBex_08A_ex08.jpg',480,640)">Figure 8A. Title page</a> <a href="javascript:openWin('Images/DCRMBex_08B_ex08.jpg',480,640)">Figure 8B. Colophon</a></td>

Output:

1) System.out.println(matcher.group(1)) prints:

('Images/DCRMBex_05_ex05.jpg',480,640)

2) System.out.println(matcher.group(1).replaceAll("Images", "http://google.com/Images")) prints:

('http://google.com/Images/DCRMBex_05_ex05.jpg',480,640)

But when I print my StringBuilder, it doesn't show any replacement. What am I doing wrong here? Any help is appreciated. Thanks

  • replaceAll does not modify in place. It returns the modified value. Commented May 30, 2019 at 15:54
  • @BenjaminUrquhart What should I do in my case then? Commented May 30, 2019 at 15:55
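A minimal illustration of the first comment, using one of the sample lines from the question. The class and method names are hypothetical; the only point is that String is immutable, so the value returned by replaceAll has to be kept.

```java
public class ReplaceReturnsNewString {

    // Hypothetical helper: returns a rewritten copy; the argument itself never changes.
    static String rewrite(String line) {
        return line.replaceAll("Images", "http://google.com/Images");
    }

    public static void main(String[] args) {
        String line = "href=\"javascript:openWin('Images/DCRMBex_01B_ex01.jpg',480,640)\"";

        // Result discarded: line still holds the original text afterwards.
        line.replaceAll("Images", "http://google.com/Images");
        System.out.println(line);

        // Result assigned: this is the string that contains the replacement.
        String updated = rewrite(line);
        System.out.println(updated);
    }
}
```

The same applies inside the question's loop: matcher.group(1) returns a new String, and replacing text in that copy never touches the StringBuilder.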

2 Answers


I would recommend using Files.lines() and Java Streams to modify the input. With your actual input you also don't need a regex, because String.replace() replaces literal text:

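import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Files.lines throws IOException; declare or catch it in the enclosing method
try (Stream<String> lines = Files.lines(Paths.get("input.txt"))) {
    String result = lines
            .map(line -> line.replace("Images", "http://google.com/Images"))
            .collect(Collectors.joining("\n"));
    System.out.println(result);
}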
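A self-contained sketch of the snippet above, so it can run without input.txt: it writes two of the sample lines to a temporary file first. The class name and the temporary file are assumptions made for the demo. String.replace(CharSequence, CharSequence) treats both arguments as literal text, so nothing is interpreted as a regex.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LiteralReplaceDemo {

    // Reads the file line by line and replaces every literal "Images".
    static String rewrite(Path input) throws IOException {
        try (Stream<String> lines = Files.lines(input)) {
            return lines
                    .map(line -> line.replace("Images", "http://google.com/Images"))
                    .collect(Collectors.joining("\n"));
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary stand-in for input.txt, filled with sample lines from the question.
        Path input = Files.createTempFile("input", ".txt");
        List<String> sample = Arrays.asList(
                "href=\"javascript:openWin('Images/DCRMBex_01B_ex01.jpg',480,640)\"",
                "href=\"javascript:openWin('Images/DCRMBex_01A_ex01.jpg',480,640)\"");
        Files.write(input, sample);
        System.out.println(rewrite(input));
        Files.delete(input);
    }
}
```

Because the replacement is literal, it would also rewrite "Images" anywhere else in the file, for example in link text; that is harmless for this input but worth checking on other data.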

If you really want to use a regex, I would recommend compiling the pattern once, outside the loop. String.replaceAll() compiles its pattern internally on every call, so calling it for each line repeats Pattern.compile() every time; compiling once and reusing the Pattern avoids that cost:

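import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Compiled once. Matches each openWin call separately, so lines with several links are fully rewritten
Pattern pattern = Pattern.compile("(href=\"javascript:openWin\\(')(Images)");
try (Stream<String> lines = Files.lines(Paths.get("input.txt"))) {
    String result = lines
            .map(pattern::matcher)
            .map(matcher -> matcher.replaceAll("$1http://google.com/$2"))
            .collect(Collectors.joining("\n"));
    System.out.println(result);
}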
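If you would rather keep the explicit matcher.find() loop from the question, the replacements have to be written into a new buffer, because matcher.group() only returns a copy. Matcher.appendReplacement and Matcher.appendTail do exactly that. A sketch with a hypothetical class name; the StringBuilder overloads of those two methods need Java 9 or later (earlier versions take a StringBuffer).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindLoopReplace {

    // Compiled once; group 1 = the openWin prefix, group 2 = the start of the relative path.
    private static final Pattern OPEN_WIN =
            Pattern.compile("(href=\"javascript:openWin\\(')(Images/)");

    static String rewrite(CharSequence input) {
        Matcher matcher = OPEN_WIN.matcher(input);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            // Appends the text since the previous match, then the expanded replacement.
            matcher.appendReplacement(out, "$1http://google.com/$2");
        }
        // Appends everything after the last match.
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String row = "<a href=\"javascript:openWin('Images/DCRMBex_01A_ex01.jpg',480,640)\">Figure 1A</a> "
                + "<a href=\"javascript:openWin('Images/DCRMBex_01B_ex01.jpg',480,640)\">Figure 1B</a>";
        System.out.println(rewrite(row));
    }
}
```

Since rewrite takes a CharSequence, the StringBuilder sb from the question can be passed to it directly.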

This regex defines two capturing groups (each pair of parentheses). You can refer to these groups in the replacement string as $n: $1 inserts the text matched by the first group and $2 the text matched by the second.
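To illustrate the $n references, here is a sketch with a hypothetical helper that inserts an arbitrary prefix between the two groups. If the inserted text could itself contain $ or \, it should go through Matcher.quoteReplacement; otherwise replaceAll would read the $ as the start of a group reference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupReferenceDemo {

    // Group 1: everything up to the opening quote; group 2: the relative path starting with "Images".
    private static final Pattern PATTERN =
            Pattern.compile("(href=\"javascript:openWin\\(')(Images[^']*)");

    // Hypothetical helper: puts prefix between group 1 and group 2.
    static String withPrefix(String line, String prefix) {
        // quoteReplacement escapes '$' and '\' so the prefix is inserted literally.
        String replacement = "$1" + Matcher.quoteReplacement(prefix) + "$2";
        return PATTERN.matcher(line).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String line = "href=\"javascript:openWin('Images/DCRMBex_06A_ex06.jpg',480,640)\"";
        System.out.println(withPrefix(line, "http://google.com/"));
    }
}
```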

The result in both cases will be:

href="javascript:openWin('http://google.com/Images/DCRMBex_01B_ex01.jpg',480,640)"
href="javascript:openWin('http://google.com/Images/DCRMBex_01A_ex01.jpg',480,640)"
href="javascript:openWin('http://google.com/Images/DCRMBex_06A_ex06.jpg',480,640)"



replaceAll returns the modified string; it does not modify the original in place. In this case, I would not use Pattern and Matcher directly, and would instead use replaceAll's support for capture groups:

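import java.io.FileReader;
import java.util.Scanner;

// new FileReader(...) throws FileNotFoundException; declare or catch it in the enclosing method
Scanner in = new Scanner(new FileReader("C:\\Projects\\input.txt"));
StringBuilder sb = new StringBuilder();
// Read whole lines: in.next() splits on whitespace and would drop the spaces between tokens
while (in.hasNextLine()) {
    sb.append(in.nextLine()).append('\n');
}
in.close();
// Modified regex: group 1 = openWin prefix, group 2 = image path, group 3 = closing quote
String patternString = "(href=\"javascript:openWin\\(')(.+?)(')";

String result = sb.toString().replaceAll(patternString, "$1http://google.com/$2$3");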
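The reluctant quantifier in (.+?) matters when one line holds several openWin calls, as the input file does. A greedy (.+) stretches to the last quote on the line, so the whole line becomes a single match and only the first link gets the prefix. A small comparison sketch; the class name and the a.jpg/b.jpg file names are hypothetical.

```java
public class LazyVsGreedy {

    // Reluctant: group 2 stops at the first closing quote after each openWin('.
    static final String LAZY = "(href=\"javascript:openWin\\(')(.+?)(')";
    // Greedy: group 2 runs to the last quote on the line.
    static final String GREEDY = "(href=\"javascript:openWin\\(')(.+)(')";

    static String rewrite(String input, String regex) {
        return input.replaceAll(regex, "$1http://google.com/$2$3");
    }

    public static void main(String[] args) {
        String line = "<a href=\"javascript:openWin('Images/a.jpg',480,640)\">A</a> "
                + "<a href=\"javascript:openWin('Images/b.jpg',480,640)\">B</a>";
        // Both links get the prefix.
        System.out.println(rewrite(line, LAZY));
        // Only the first link gets the prefix: one match spans both links.
        System.out.println(rewrite(line, GREEDY));
    }
}
```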


Hope this helps!

2 Comments

Awesome. It worked. Could you also paste a link with information about $1, $2, $3? That would be helpful. Thanks
Here you go - replaceAll uses Matcher internally.
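The String.replaceAll Javadoc states that str.replaceAll(regex, repl) gives the same result as Pattern.compile(regex).matcher(str).replaceAll(repl), which is why the $1, $2, $3 syntax is documented under Matcher.replaceAll and Matcher.appendReplacement. A small sketch comparing the two forms on a sample line (class and method names are hypothetical):

```java
import java.util.regex.Pattern;

public class ReplaceAllEquivalence {

    static final String REGEX = "(href=\"javascript:openWin\\(')(.+?)(')";
    static final String REPLACEMENT = "$1http://google.com/$2$3";

    // Convenience form: compiles REGEX on every call.
    static String viaString(String input) {
        return input.replaceAll(REGEX, REPLACEMENT);
    }

    // Explicit form: the same steps spelled out with Pattern and Matcher.
    static String viaMatcher(String input) {
        return Pattern.compile(REGEX).matcher(input).replaceAll(REPLACEMENT);
    }

    public static void main(String[] args) {
        String line = "href=\"javascript:openWin('Images/DCRMBex_06A_ex06.jpg',480,640)\"";
        System.out.println(viaString(line));
        System.out.println(viaMatcher(line));
    }
}
```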
