
I know that variations of this question have already been answered here.

I have tried to go through the solutions and come up with a regular expression for my needs. I have a string of text over multiple lines with neither a fixed starting location nor an ending location for a particular line.

<a name='bill_pay' href='javascript:goto(&#39;billpay&#39;);' class='fsdnav-top-menu-item'>Bill Pay <span class='fsdnav-ada-hidden'>link and menu. Press enter to navigate to this link. Press control + space to open submenu.

To move through submenu items press tab and then press up or down arrow.</span> </a>
<a name='bill_pay' href='javascript:goto(&#39;findmyinfo&#39;);' class='fsdnav-top-menu-item'>
Bill Pay <span class='fsdnav-ada-hidden'>link and menu. Press enter to navigate to this link. Press control + space to open submenu.

To move through submenu items press tab and then press up or down arrow.</span> </a>
<a name='bill_pay' href='#' onClick='OOLPopUp(&#39;/myaccounts/brain/redirect.go?target=findmyroutingnumber&#39;,&#39;ool&#39;,&#39;currentPage&#39;);return false;' class='fsdnav-top-menu-item'>
Bill Pay <span class='fsdnav-ada-hidden'>link and menu. Press enter to navigate to this link. Press control + space to open submenu.
To move through submenu items press tab and then press up or down arrow.</span> </a>

I would like to extract the contents of javascript:goto(&quot;link&quot;), whatever value link represents. There are multiple such occurrences in the text above, but the regex that I am using returns just a single occurrence. I would like to return all of them. My code block is given below.

private static final Pattern PATTERN_WITH_ASCII_QUOTES =
    Pattern.compile("^.*goto\\(&#39;(\\w+)&#39;\\).*",
        Pattern.MULTILINE|Pattern.DOTALL);

// "str" is the string representation of the text above.
Matcher m = PATTERN_WITH_ASCII_QUOTES.matcher(str);
while (m.find()) {
    System.out.println(m.group(1));
}

The resultant output is always findmyinfo and nothing else.

UPDATE - The desired outputs are

 billpay (from javascript:goto(&#39;billpay&#39;);)
 findmyinfo (from javascript:goto(&#39;findmyinfo&#39;);)

I would also like to extract

/myaccounts/brain/redirect.go?target=findmyroutingnumber&#39;,&#39;ool&#39;,&#39;currentPage from OOLPopUp(&#39;/myaccounts/brain/redirect.go?target=findmyroutingnumber&#39;,&#39;ool&#39;,&#39;currentPage&#39;)

What's your expected output? Commented Sep 7, 2014 at 7:28

3 Answers


You need to put OOLPopUp and goto into a non-capturing group in order to get the first, second and third values.

 ^.*?(?:goto|OOLPopUp)\(&#39;(.*?)&#39;\).*


String s = "<a name='bill_pay' href='javascript:goto(&#39;billpay&#39;);' class='fsdnav-top-menu-item'>Bill Pay <span class='fsdnav-ada-hidden'>link and menu. Press enter to navigate to this link. Press control + space to open submenu.\n" + 
        "To move through submenu items press tab and then press up or down arrow.</span> </a>\n" +
        "<a name='bill_pay' href='javascript:goto(&#39;findmyinfo&#39;);' class='fsdnav-top-menu-item'>\n" +
        "<a name='bill_pay' href='#' onClick='OOLPopUp(&#39;/myaccounts/brain/redirect.go?target=findmyroutingnumber&#39;,&#39;ool&#39;,&#39;currentPage&#39;);return false;' class='fsdnav-top-menu-item'>\n" +
        "Bill Pay <span class='fsdnav-ada-hidden'>link and menu. Press enter to navigate to this link. Press control + space to open submenu.";
Pattern regex = Pattern.compile("^.*?(?:goto|OOLPopUp)\\(&#39;(.*?)&#39;\\).*", Pattern.MULTILINE);
Matcher matcher = regex.matcher(s);
while (matcher.find()) {
    System.out.println(matcher.group(1));
}

Output:

billpay
findmyinfo
/myaccounts/brain/redirect.go?target=findmyroutingnumber&#39;,&#39;ool&#39;,&#39;currentPage

OR

String s = "<a name='bill_pay' href='javascript:goto(&#39;billpay&#39;);' class='fsdnav-top-menu-item'>Bill Pay <span class='fsdnav-ada-hidden'>link and menu. Press enter to navigate to this link. Press control + space to open submenu.\n" + 
        "To move through submenu items press tab and then press up or down arrow.</span> </a>\n" +
        "<a name='bill_pay' href='javascript:goto(&#39;findmyinfo&#39;);' class='fsdnav-top-menu-item'>\n" +
        "<a name='bill_pay' href='#' onClick='OOLPopUp(&#39;/myaccounts/brain/redirect.go?target=findmyroutingnumber&#39;,&#39;ool&#39;,&#39;currentPage&#39;);return false;' class='fsdnav-top-menu-item'>\n" +
        "Bill Pay <span class='fsdnav-ada-hidden'>link and menu. Press enter to navigate to this link. Press control + space to open submenu.";
Pattern regex = Pattern.compile("^(?:.*?goto\\(&#39;(\\w+)&#39;\\).*|.*?OOLPopUp\\(&#39;(.+?&#39;\\)).*)$", Pattern.MULTILINE);
Matcher matcher = regex.matcher(s);
while (matcher.find()) {
    // Group 1 is set for goto(...), group 2 for OOLPopUp(...).
    System.out.println(matcher.group(1) != null
            ? matcher.group(1)
            : matcher.group(2));
}

Output:

billpay
findmyinfo
/myaccounts/brain/redirect.go?target=findmyroutingnumber&#39;,&#39;ool&#39;,&#39;currentPage&#39;)



5 Comments

I have another clarification and I hope you don't mind. I have another set of urls such as <a name='bill_pay' href='javascript:goto(\'billpay\');'>Bill Pay</a>, replacing &#39; with \'. I have tried to reverse engineer your regex, but nothing seems to be working. I get IndexOutOfBoundsException for every variation that I try. How can I add that as well?
Yes. That is precisely the effect I want. Although, we use single quotes in our HTML document, so we have a set up like <a name='bill_pay' href='javascript:goto(\'billpay\');'>Bill Pay</a>. I would like to be able to extract the value from here as well.
I am not able to put that in regex though. Pattern.compile("^.*?(?:goto|OOLPopUp)\(&#39;|'(.*?)&#39;|'\).*", Pattern.MULTILINE); returns an ArrayOutOfBoundsException.
Replace each single backslash with a double backslash.
Where do I replace the single backslash with a double backslash? Pattern.compile("^.*?(?:goto|OOLPopUp)\(&#39;|\'(.*?)&#39;|\'\).*", Pattern.MULTILINE); throws ArrayOutOfBoundsException.
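For anyone landing on this comment thread: below is a minimal sketch of one way to accept both quote styles, the &#39; entity and the backslash-escaped \'. It is not the answerer's code. The class name QuotedArgs, the helper extract, and the sample input are made up for illustration. Two things differ from the attempts in the comments. The alternation is wrapped in (?:...), because a bare | splits the whole pattern into separate alternatives. And every regex backslash is doubled inside the Java string literal, so matching one literal backslash takes four.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedArgs {
    // Either the entity &#39; or a literal backslash followed by a single quote.
    // The Java literal "\\\\'" becomes the regex \\' (one literal backslash, then ').
    private static final String QUOTE = "(?:&#39;|\\\\')";

    // Unanchored, so find() can locate every call; group 1 is the text between the quotes.
    private static final Pattern CALL =
        Pattern.compile("(?:goto|OOLPopUp)\\(" + QUOTE + "(.*?)" + QUOTE + "\\)");

    static List<String> extract(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = CALL.matcher(html);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String html =
            "<a name='bill_pay' href='javascript:goto(\\'billpay\\');'>Bill Pay</a>\n"
          + "<a name='bill_pay' href='javascript:goto(&#39;findmyinfo&#39;);'>Bill Pay</a>\n";
        System.out.println(extract(html)); // [billpay, findmyinfo]
    }
}
```

The sketch also accepts a mixed pair (entity on one side, escaped quote on the other). If that matters, the opening quote could be captured and reused with a backreference.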

You are always taking group(1); that is the problem. Use

while (m.find()) {
    System.out.println(m.group());
}
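
A small sketch of what m.group() returns with the pattern from the question, left unchanged. The class name GroupDemo, its two helpers, and the shortened sample text are made up for illustration. With DOTALL set, the leading and trailing .* can cross line breaks, so the single successful match spans the whole input and group() is simply the entire string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // The pattern from the question, unchanged.
    private static final Pattern ORIGINAL =
        Pattern.compile("^.*goto\\(&#39;(\\w+)&#39;\\).*",
            Pattern.MULTILINE | Pattern.DOTALL);

    // Counts how many times find() succeeds on the text.
    static int countMatches(String text) {
        Matcher m = ORIGINAL.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Returns group() (the whole match) of the first match, or null if there is none.
    static String firstWholeMatch(String text) {
        Matcher m = ORIGINAL.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text =
            "<a href='javascript:goto(&#39;billpay&#39;);'>Bill Pay</a>\n"
          + "<a href='javascript:goto(&#39;findmyinfo&#39;);'>Find My Info</a>";
        System.out.println(countMatches(text));                 // 1
        System.out.println(text.equals(firstWholeMatch(text))); // true
    }
}
```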

1 Comment

No text is printed. The first entry is the entire string and then nothing. I don't get the extracted strings.

There is a problem with your pattern. Try this:

Pattern.compile("goto\\(&#39;(\\w+)&#39;\\)",
                    Pattern.MULTILINE|Pattern.DOTALL);

When printing the result, you can also try:

System.out.println(m.group(1) + " ( from " + str.substring(m.toMatchResult().start(), m.toMatchResult().end()) + " )");

The output looks like this:

billpay ( from goto(&#39;billpay&#39;) )
findmyinfo ( from goto(&#39;findmyinfo&#39;) )
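
To spell out why dropping the anchors helps, here is a minimal self-contained sketch. The class name GotoTargets, the helper extract, and the shortened sample markup are made up for illustration. In the question's pattern, DOTALL lets the leading greedy .* run to the end of the input and backtrack to the last goto(...), and the trailing .* then consumes the rest. That leaves find() with exactly one match. Without the surrounding .* parts, each match covers a single call and the matcher continues from there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GotoTargets {
    // No leading "^.*" and no trailing ".*": each match covers one goto(...) call only.
    // No flags are needed because the pattern contains neither "^", "$" nor ".".
    private static final Pattern GOTO =
        Pattern.compile("goto\\(&#39;(\\w+)&#39;\\)");

    // Collects group 1 (the target name) of every goto(...) call in the text.
    static List<String> extract(String html) {
        List<String> targets = new ArrayList<>();
        Matcher m = GOTO.matcher(html);
        while (m.find()) {
            targets.add(m.group(1));
        }
        return targets;
    }

    public static void main(String[] args) {
        String html =
            "<a href='javascript:goto(&#39;billpay&#39;);'>Bill Pay</a>\n"
          + "<a href='javascript:goto(&#39;findmyinfo&#39;);'>Find My Info</a>\n";
        System.out.println(extract(html)); // [billpay, findmyinfo]
    }
}
```

Note that \w+ only covers letters, digits and underscores, so the OOLPopUp(...) argument from the question would need a different capture group.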
