I have some HTML stored in a string and I want to extract all the parts that match this pattern:

<a href="http://abc.pl/(.*?)/(.*?)"><img src="(.*?)"

(.*?) stands for any string. I've tried dozens of combinations and couldn't get it working. Can somebody show me sample code that extracts all the matched data from a String and stores it in variables?

Thanks in advance

  • Please give an example of what output you expect. Commented Sep 9, 2011 at 11:42
  • An example showing how to loop through each (.*?) for every match will do, I can handle it from there. Commented Sep 9, 2011 at 11:44
  • But is this in JavaScript or in Java (in an Android app)? Commented Sep 9, 2011 at 12:11
  • Java. You write in Java for Android. Commented Sep 9, 2011 at 12:33

2 Answers

Here is a solution using JavaScript. I hope this helps.

First, we need a working pattern:

var pattern = '<a href="http://abc.pl/([^/"]+)/([^/"]*)".*?><img src="([^"]*)"';

Now, the problem is that in JavaScript, String.match called with a global regexp returns only the full matches and drops the submatches (capture groups), whatever the pattern we use.

We can easily retrieve an array of all the full matches:

var re = new RegExp(pattern, "g");
// match() returns null when nothing matches, so fall back to an empty array
var matches = yourHtmlString.match(re) || [];

But we also want the submatches, right? In my humble opinion, the simplest way to achieve this is to apply the non-global version of the same regexp to each match we obtained (because match only returns submatches when the regexp does not have the "g" flag):

var reNonGlobal = new RegExp(pattern);
var matchesAndSubmatches = [];
for (var i = 0; i < matches.length; i++) {
    matchesAndSubmatches[i] = matches[i].match(reNonGlobal);
}

Each element of matchesAndSubmatches is now an array such that:

matchesAndSubmatches[n][0] is the n-th full match,
matchesAndSubmatches[n][1] is the first submatch of the n-th full match,
matchesAndSubmatches[n][2] is the second submatch of the n-th full match, and so on.

1 Comment

+1 for the detailed answer, but I do know JavaScript and I know what the regular expression should look like. The problem is with the Java part.

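Well, here's a sample:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("patternGoesHere");
Matcher matcher = pattern.matcher(textGoesHere);
// Each call to find() advances to the next match in the text
while (matcher.find())
{
    // matcher.group(n) returns the n-th captured substring of the current match.
    // Capturing groups are indexed from 1; group(0) is the whole match.
}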
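
To make the loop above concrete for the pattern in the question, here is a minimal sketch. The class name LinkExtractor, the extract method and the sample HTML are made up for illustration. The pattern is an assumption too: it borrows the negated character classes from the JavaScript answer instead of (.*?), and escapes the dot in abc.pl, so adjust it if your markup differs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class LinkExtractor {
    // Three capturing groups: first path part, second path part, image src.
    // [^>]* tolerates extra attributes on the <a> tag before the closing '>'.
    private static final Pattern LINK = Pattern.compile(
            "<a href=\"http://abc\\.pl/([^/\"]+)/([^/\"]*)\"[^>]*><img src=\"([^\"]*)\"");

    // Returns one String[3] per match: {group 1, group 2, group 3}.
    static List<String[]> extract(String html) {
        List<String[]> results = new ArrayList<String[]>();
        Matcher matcher = LINK.matcher(html);
        while (matcher.find()) {
            results.add(new String[] {
                    matcher.group(1), matcher.group(2), matcher.group(3) });
        }
        return results;
    }

    public static void main(String[] args) {
        // Sample input invented for this example.
        String html = "<a href=\"http://abc.pl/news/42\"><img src=\"a.png\"></a>"
                + "<a href=\"http://abc.pl/blog/7\"><img src=\"b.png\"></a>";
        for (String[] parts : extract(html)) {
            System.out.println(parts[0] + " | " + parts[1] + " | " + parts[2]);
        }
    }
}
```

Each array in the returned list holds the three captured values of one match, so you can copy them into whatever variables you need. For anything beyond simple, predictable markup, an HTML parser is usually more robust than a regex.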
