Creating java regex to get href link

Question

Sorry if this has been asked before, but I couldn't find any answers on the web. I'm having a hard time figuring out the inverse to this regex:

"\"[^>]*\">"

I want to use replaceAll to replace everything except the link. So if I had a tag similar to this:

<p><a href="http://www.google.com">Google</a></p>

I need a regex that would satisfy this:

s.replaceAll(regex, "");

to give me this output:

http://www.google.com

I know there are better ways to do this, but I have to use a regex. Any help is really appreciated, thanks!

AlexR · Accepted Answer · 2011-11-29 08:44:57Z

16

You do not have to use replaceAll. It is simpler to use a capturing group, like this:

Pattern p = Pattern.compile("href=\"(.*?)\"");
Matcher m = p.matcher(html);
String url = null;
if (m.find()) {
    url = m.group(1); // this variable should contain the link URL
}

If your HTML contains several links, call m.find() in a loop.
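
To make the loop concrete, here is a minimal sketch that collects every href value with the same pattern. The class name HrefExtractor, the method extractLinks, and the sample HTML are made up for illustration; only the pattern comes from the answer. It assumes double-quoted attribute values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefExtractor {
    // Same pattern as in the answer: capture everything between href=" and the next quote.
    private static final Pattern HREF = Pattern.compile("href=\"(.*?)\"");

    // Hypothetical helper: returns all href values in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {          // each call advances to the next match
            links.add(m.group(1));  // group 1 is the text inside the quotes
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"http://www.google.com\">Google</a> "
                + "<a href=\"http://example.com/index.html\">Example</a></p>";
        System.out.println(extractLinks(html));
    }
}
```

Compiling the pattern once as a constant avoids recompiling it for every input string.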

1 Comment

user1070866 Over a year ago

Thanks, it was hard for me to implement it because I was already using a pattern/matcher to find specific links that end in .htm and .html.

socha23 · Accepted Answer · 2011-11-29 08:45:09Z

0

If you always have one such link in a string, try this:

"(^[^\"]*\")|(\"[^\"]*)$"
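
As a rough check of how this regex behaves with replaceAll, the sketch below applies it to the tag from the question. The class and method names are invented for the example. It assumes the string contains exactly one pair of double quotes, as the answer states.

```java
class QuoteStripper {
    // Regex from the answer. The first alternative removes everything up to and
    // including the first quote; the second removes the last quote and the rest.
    static final String REGEX = "(^[^\"]*\")|(\"[^\"]*)$";

    // Hypothetical helper wrapping the replaceAll call from the question.
    static String strip(String s) {
        return s.replaceAll(REGEX, "");
    }

    public static void main(String[] args) {
        String s = "<p><a href=\"http://www.google.com\">Google</a></p>";
        System.out.println(strip(s));
    }
}
```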

1 Comment

user1070866 Over a year ago

This worked, but failed when the href tag had 'id=' before the link. I should've added that to my question, sorry.
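
The failure described in this comment can be reproduced, and one possible way around it is to anchor the regex on href=" instead of on the first quote. The sketch below is an assumption, not something proposed in the thread: the ANCHORED regex and all names are hypothetical, and it only holds for a single-line string with exactly one double-quoted href.

```java
class HrefByReplaceAll {
    // Regex from the answer above: relies on the href value being the only quoted text.
    static final String ORIGINAL = "(^[^\"]*\")|(\"[^\"]*)$";

    // Hypothetical alternative: delete everything up to and including href=",
    // then delete from the next quote to the end of the string.
    static final String ANCHORED = "^.*?href=\"|\".*$";

    static String stripOriginal(String tag) {
        return tag.replaceAll(ORIGINAL, "");
    }

    static String extract(String tag) {
        return tag.replaceAll(ANCHORED, "");
    }

    public static void main(String[] args) {
        String tag = "<p><a id=\"x\" href=\"http://www.google.com\">Google</a></p>";
        System.out.println(stripOriginal(tag)); // leaves part of the id attribute behind
        System.out.println(extract(tag));       // keeps only the href value
    }
}
```

If the string has no href at all, the second alternative still strips from the first quote onward, so this is only suitable when a link is known to be present.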

somid3 · Accepted Answer · 2022-08-10 20:29:12Z

You can parse all the attributes of an HTML tag into a map. First, find each tag and capture its attribute string, like...

    Pattern linkPattern = Pattern.compile("<a (.*?)>");
    Matcher linkMatcher = linkPattern.matcher(in);
    while (linkMatcher.find()) {
        // group(1) is the attribute string inside the <a ...> tag
        System.out.println(parseProperties(linkMatcher.group(1)));
    }

Get properties:

import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Matches name="value" pairs: group 1 is the attribute name, group 2 its value
private static final Pattern PARSE_PATTERN = Pattern.compile("\\s*?(\\w*?)\\s*?=\\s*?\"(.*?)\"");

public static Map<String, String> parseProperties(String in) {

  Map<String, String> out = new HashMap<>();

  // Create matcher based on parsing pattern
  Matcher matcher = PARSE_PATTERN.matcher(in);

  // Populate map
  while (matcher.find()) {
    out.put(matcher.group(1), matcher.group(2));
  }

  return out;
}
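
Putting the two snippets together, a self-contained sketch might look like the following. The class name TagAttributes and the firstHref helper are hypothetical additions for illustration; the two patterns are the ones from this answer. It assumes double-quoted attribute values and a tag that fits on one line.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagAttributes {
    private static final Pattern LINK_PATTERN = Pattern.compile("<a (.*?)>");
    private static final Pattern PARSE_PATTERN =
            Pattern.compile("\\s*?(\\w*?)\\s*?=\\s*?\"(.*?)\"");

    // Turns an attribute string such as id="x" href="..." into a name-to-value map.
    static Map<String, String> parseProperties(String in) {
        Map<String, String> out = new HashMap<>();
        Matcher matcher = PARSE_PATTERN.matcher(in);
        while (matcher.find()) {
            out.put(matcher.group(1), matcher.group(2));
        }
        return out;
    }

    // Hypothetical helper: href of the first <a> tag that has one, or null.
    static String firstHref(String html) {
        Matcher linkMatcher = LINK_PATTERN.matcher(html);
        while (linkMatcher.find()) {
            String href = parseProperties(linkMatcher.group(1)).get("href");
            if (href != null) {
                return href;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String html = "<p><a id=\"x\" href=\"http://www.google.com\">Google</a></p>";
        System.out.println(firstHref(html));
    }
}
```

Because the lookup is by attribute name, this handles the case where other attributes such as id appear before href.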

kommradHomer · Accepted Answer · 2011-11-29 08:45:52Z

-1

You can check out http://regexlib.com/ for all the regex help you need. The one below is for a URL (it matches a bare host name, not a full link):

^[a-zA-Z0-9\-\.]+\.(com|org|net|mil|edu|COM|ORG|NET|MIL|EDU)$
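
To use this regex from Java, each backslash has to be doubled inside the string literal. The sketch below shows that form and what it accepts. The class and method names are invented for the example. Note that the pattern validates a whole string as a host name and so would not, on its own, pull an href value out of a tag.

```java
import java.util.regex.Pattern;

class DomainCheck {
    // The answer's regex as a Java string literal (backslashes doubled).
    static final Pattern DOMAIN = Pattern.compile(
            "^[a-zA-Z0-9\\-\\.]+\\.(com|org|net|mil|edu|COM|ORG|NET|MIL|EDU)$");

    // Hypothetical helper: true only if the whole string matches the pattern.
    static boolean isListedDomain(String s) {
        return DOMAIN.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isListedDomain("www.google.com"));        // host with a listed suffix
        System.out.println(isListedDomain("amazon.co.uk"));          // suffix not in the list
        System.out.println(isListedDomain("http://www.google.com")); // ':' and '/' are not allowed
    }
}
```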

4 Comments

spaaarky21 Over a year ago

The way it's currently written, that regex wouldn't work for site with country codes like winchester.us, amazon.co.uk, amazon.ca, etc.

kommradHomer Over a year ago

you are absolutely right. I've made a mistake by imposing my practice.

user1070866 Over a year ago

Also, doesn't work with Java 6.0, at least not in the replaceAll method.

kommradHomer Over a year ago

@user1070866, then that's the cherry on top for me.
