
This is only for a small Android program I am messing with, so I only need to match one or two tags.

I have one HTML tag, and I can get what's inside it, which is "FC Cologne". I use this code to get it:

Pattern pattern = Pattern.compile("report\">(.*?)</a>",Pattern.MULTILINE);

Here is the HTML tag I can get to work:

<a href="/match-menu/3405570/first-team/fc-cologne=report"> FC Cologne</a>
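For reference, here is a minimal, self-contained Java sketch of the case that already works. The class and method names (ReportLinkDemo, extractTeam) are made up for illustration. Pattern.MULTILINE is left out because it only changes how ^ and $ behave, and this pattern uses neither; trim() removes the leading space inside the link text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReportLinkDemo {
    // The question's pattern: lazily capture the text between report"> and </a>.
    static final Pattern REPORT = Pattern.compile("report\">(.*?)</a>");

    // Returns the trimmed link text, or null if the pattern does not match.
    static String extractTeam(String html) {
        Matcher m = REPORT.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/match-menu/3405570/first-team/fc-cologne=report\"> FC Cologne</a>";
        System.out.println(extractTeam(html)); // FC Cologne
    }
}
```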

But I can't get this tag to match. I don't know whether it is because of the space after the word "opposition", the quotes inside the HTML tag, or both, since neither appears in the first tag.

This is the one I can't get to work:

<td class="bold opposition "> "Olympiacos" </td>

This is the code I am trying:

Pattern pattern = Pattern.compile("opposition \">(.*?)</td>",Pattern.MULTILINE);

I have tried replacing the space " " with an empty string "", and I have tried \s where the space is, but I get nothing.
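As a sanity check, the sketch below (hypothetical class and method names) runs the question's pattern on the single-line sample above; on that exact string it does match. The FLEXIBLE variant shows that \s has to be written as \\s inside a Java string literal, and the helper strips the surrounding whitespace and quotes from group 1. Whether the real page text looks like this sample is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OppositionDemo {
    // The question's pattern as written. In the Java literal, \" is just a quote character.
    static final Pattern EXACT = Pattern.compile("opposition \">(.*?)</td>");

    // Variant: \\s* in the literal is \s* in the regex, so zero or more whitespace
    // characters may sit between "opposition" and the closing quote of the class value.
    static final Pattern FLEXIBLE = Pattern.compile("opposition\\s*\">(.*?)</td>");

    // Returns group 1 without surrounding whitespace and double quotes, or null if no match.
    static String teamName(Pattern p, String html) {
        Matcher m = p.matcher(html);
        if (!m.find()) {
            return null;
        }
        return m.group(1).trim().replaceAll("^\"|\"$", "");
    }

    public static void main(String[] args) {
        String html = "<td class=\"bold opposition \"> \"Olympiacos\" </td>";
        System.out.println(teamName(EXACT, html));    // Olympiacos
        System.out.println(teamName(FLEXIBLE, html)); // Olympiacos
    }
}
```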

I would appreciate it if anyone could help me.

  • Could you clarify what the requirements of the regex are, please? Commented Sep 6, 2011 at 21:33
  • Related: stackoverflow.com/questions/1732348/… Commented Sep 6, 2011 at 21:34
  • @Tyler The requirement is to retrieve everything between the HTML tag < td class="bold opposition "> "Olympiacos" < /td> Commented Sep 6, 2011 at 21:41
  • Update your post with a small complete application we can C&P and test with. Might help solve the problem more precisely. Commented Sep 6, 2011 at 23:34
  • OK, I know how to get the pattern to work with a plain string by adding an extra \" before the quotes, turning sString = "<td class="+("bold opposition ")+(">Olympiacos</td>"); into sString = "<td class=\""+("bold opposition \"")+(">Olympiacos</td>");. I use sString = sString.replaceAll("\"", "\\\""); for this, and it works for just the String. Commented Sep 7, 2011 at 1:29
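A small sketch of the escaping point in the comment above (class and method names are hypothetical). The backslash in \" exists only in Java source; at runtime the String holds a plain quote, which is also what text read from a page contains, so the data needs no escaping before matching. In a replaceAll replacement string a backslash escapes the next character, so replaceAll("\"", "\\\"") replaces each quote with a plain quote and leaves the string unchanged; String.replace treats its arguments literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteLiteralDemo {
    // In Java source, \" is one double-quote character; no backslash is stored.
    static final String SAMPLE = "<td class=\"bold opposition \">Olympiacos</td>";

    // The call from the comment. In a replacement string, \" means a plain quote,
    // so every quote is replaced by itself and the string is unchanged.
    static String commentReplaceAll(String s) {
        return s.replaceAll("\"", "\\\"");
    }

    // String.replace works on literal text, so this really inserts a backslash.
    static String literalEscape(String s) {
        return s.replace("\"", "\\\"");
    }

    // The question's pattern matches the plain quotes in the data directly.
    static String cellText(String html) {
        Matcher m = Pattern.compile("opposition \">(.*?)</td>").matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(commentReplaceAll(SAMPLE).equals(SAMPLE)); // true
        System.out.println(literalEscape(SAMPLE)); // a backslash before each quote
        System.out.println(cellText(SAMPLE));      // Olympiacos
    }
}
```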

2 Answers


Unless one of the two is a typo: < /td> has a space after the <, and the </td> in your regex doesn't.

Adding a space to the regex after the < caused the match to succeed in RegexBuddy.

Update: Seems the space is not in the tag the OP is working with.

In RegexBuddy I have the pattern (copied as a Java String):

"opposition \">(.*?)</td>"

which matches the HTML

< td class="bold opposition "> "Olympiacos"       </td>

giving a match of

opposition "> "Olympiacos"       </td>

and Group 1 of

 "Olympiacos"       <--Line ends there.
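The RegexBuddy result can be reproduced in plain Java. This sketch (hypothetical class and method names) uses <\\s*/td>, which is <\s*/td> in the regex, so the closing tag matches whether or not there is a space after the <. It generalises the fix discussed above rather than reproducing the answer's exact pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClosingTagDemo {
    // <\\s*/td> in the Java literal is <\s*/td> in the regex:
    // it accepts both </td> and < /td>.
    static final Pattern CELL = Pattern.compile("opposition \">(.*?)<\\s*/td>");

    // Returns group 1 (untrimmed, as RegexBuddy shows it), or null if no match.
    static String group1(String html) {
        Matcher m = CELL.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println("[" + group1("<td class=\"bold opposition \"> \"Olympiacos\" </td>") + "]");
        System.out.println("[" + group1("< td class=\"bold opposition \"> \"Olympiacos\"       < /td>") + "]");
    }
}
```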

Comments

That space was only there to stop the Stack Overflow editor from formatting the tag. The only space in the tag is after "opposition" and before the quotes, as in compile("opposition \">
I agree with Nija; the regex works like this: "opposition \">(.*?)<\s/td>". See here: rubular.com/r/kZK5NR080L
Pattern pattern = Pattern.compile("opposition\\s\">(.*?)<\\s/td>",Pattern.MULTILINE); I have this in my Java program and it still does not work. Should I be escaping the quotes around Olympiacos?
@Steven_M: Updated my answer with more details.
I think it's the whitespace between the tags. I have got it to work here: rubular.com/r/aGjjOMAfHX, but it doesn't seem to work in Java. I have tried "opposition\\s\">(\\s+.*?)</td>"
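One possible explanation, assuming the real page (unlike the single-line samples in this thread) puts line breaks inside the cell: Pattern.MULTILINE only changes how ^ and $ match, and . still does not match line terminators unless Pattern.DOTALL (or the inline (?s) flag) is set. The markup and names in this sketch are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotAllDemo {
    // Hypothetical markup: assumes the real page breaks the cell across lines.
    static final String HTML = "<td class=\"bold opposition \">\n    \"Olympiacos\"\n</td>";
    static final String REGEX = "opposition\\s*\">(.*?)</td>";

    // MULTILINE only affects ^ and $; '.' still stops at '\n', so this cannot match.
    static final Pattern WITH_MULTILINE = Pattern.compile(REGEX, Pattern.MULTILINE);
    // DOTALL lets '.' match '\n' too, so (.*?) can span the line breaks.
    static final Pattern WITH_DOTALL = Pattern.compile(REGEX, Pattern.DOTALL);

    // Returns group 1, or null if the pattern does not match.
    static String group1(Pattern p, String html) {
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(group1(WITH_MULTILINE, HTML));       // null
        System.out.println(group1(WITH_DOTALL, HTML).trim());   // "Olympiacos" (with quotes)
    }
}
```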

I believe this is what you're looking for:

<(\w+)\s*(?:\w+(?:=(?:'(?:[^']|(?<=\\)')*'|"(?:[^"]|(?<=\\)")*"))?\s*)*>(.*?)</\1\s*>

Use the second group to get the contents of the tag (the first group is the tag name). Note that this does not work recursively: nested elements are captured in the second group, so you will need to run this regex again on the second group of each match until there are no more matches.
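A sketch of how the regex above could be used from Java (class and method names are hypothetical). The pattern is the answer's regex with each backslash doubled and each double quote escaped for a Java string literal; the nested example shows the repeated application on group 2 that the answer describes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GenericTagDemo {
    // The answer's regex, escaped for a Java literal:
    // group 1 = tag name, group 2 = contents, \1 = matching closing tag name.
    static final Pattern TAG = Pattern.compile(
        "<(\\w+)\\s*(?:\\w+(?:=(?:'(?:[^']|(?<=\\\\)')*'|\"(?:[^\"]|(?<=\\\\)\")*\"))?\\s*)*>"
        + "(.*?)</\\1\\s*>");

    // Returns the contents (group 2) of every top-level match in the input.
    static List<String> contents(String html) {
        List<String> out = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            out.add(m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(contents("<td class=\"bold opposition \"> \"Olympiacos\" </td>"));
        // Nested elements: the first pass returns the inner markup,
        // and running the regex again on that result gets the innermost text.
        String inner = contents("<div><b>x</b></div>").get(0);
        System.out.println(inner);           // <b>x</b>
        System.out.println(contents(inner)); // [x]
    }
}
```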
