
I'm looking for a regex pattern to verify that my HTML input has the right structure and (probably in a second step) to extract some information from it.

Example input text:

<title>Example Title</title><br />
<link>Download:</link> <a href="URL">hier</a> | hoster1 <br />
<link>Download:</link> <a href="URL">hier</a> | hoster2 <br />
<link>Download:</link> <a href="URL">hier</a> | hoster3

The title, hoster, and URL can of course change, and they are the parts I want to capture, so my attempt was something like this:

<title>([^<]+?)</title><br />\s<link>Download:</link> <a href="([^"]+?)">hier</a> \| ([^<]+?)<br />\s

These groups might seem a bit silly, but I also tried (.*?), and even in lazy mode it would just match whole lines.

  1. Right now the second part (the <link> part) matches, but not in combination with the <title> part. I'm guessing my whitespace character (\s) doesn't match a newline? How can I check ONLY for a newline character?

  2. The number of available links is dynamic, so I have no idea how many <link> tags there are. How can I use the second half of the pattern as a repeatable pattern? I'd like to do something like this (which obviously doesn't work that way):

    [ <link>Download:</link> <a href="([^"]+?)">hier</a> \| ([^<]+?)<br />\s ]*

This is all done with the MULTILINE option set (although I'm not sure it is needed for what I want to do).

I've been trying different things for a few days now and am not getting anywhere. I'd really appreciate a few pointers in the right direction, thank you.

2 Answers


Use a proper HTML parser such as jsoup for this sort of task; regular expressions are fine for very simple cases but quickly become unwieldy. An HTML parser will be much quicker and easier to implement, and more correct, especially as your checks become more advanced.
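To illustrate the parser approach without an extra dependency, here is a minimal sketch that swaps jsoup for the JDK's built-in XML DOM parser (javax.xml.parsers). This is a different tool from an HTML parser: it only works here because the sample becomes well-formed XML once it is wrapped in a single root element. Real-world HTML, or a URL containing an unescaped &, would be rejected, which is exactly the situation jsoup is built for. The class name DownloadParser, its methods, and the <root> wrapper are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: parser-based extraction using the JDK's XML DOM parser.
final class DownloadParser {
    // One extracted entry: the link target and the hoster name that follows "|".
    static final class Download {
        final String url;
        final String hoster;

        Download(String url, String hoster) {
            this.url = url;
            this.hoster = hoster;
        }
    }

    // The input has no single root element, so it is wrapped in a hypothetical <root>.
    static Document parse(String fragment) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader("<root>" + fragment + "</root>")));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Input that is not well-formed XML (e.g. an unescaped & in a URL) ends up here.
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    // Text of the first <title> element, or null if there is none.
    static String title(Document doc) {
        Node title = doc.getElementsByTagName("title").item(0);
        return title == null ? null : title.getTextContent();
    }

    // One Download per <a> element, in document order.
    static List<Download> downloads(Document doc) {
        List<Download> result = new ArrayList<>();
        NodeList anchors = doc.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            // The hoster is the text node right after </a>, e.g. " | hoster1 ".
            Node next = a.getNextSibling();
            String text = next != null && next.getNodeType() == Node.TEXT_NODE ? next.getNodeValue() : "";
            result.add(new Download(a.getAttribute("href"), text.replaceFirst("^\\s*\\|", "").trim()));
        }
        return result;
    }
}
```

The parser handles line endings and any number of links on its own, so neither of the two regex problems from the question comes up.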


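Use \r\n wherever you need a newline character for Windows line endings, and \n otherwise; \r?\n covers both. Note that [^\r\n] does the opposite: it is a negated character class that matches any single character except a line break.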
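A minimal sketch in Java, assuming java.util.regex is the engine in use. One plausible cause for question 1: with Windows line endings, \s consumes only the \r, so the following <link> fails to match because a \n is still in the way; \r?\n accepts exactly one line break of either kind. For question 2, a repeated non-capturing group (?:...)* can validate any number of link lines, but a repeated capturing group only remembers its last repetition, so the values are collected separately with a Matcher.find() loop. The class DownloadRegex and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: validate the whole input, then extract title, URLs and hosters.
final class DownloadRegex {
    // One link line. Group 1 = URL; group 2 = hoster (text up to the next tag or
    // line break, without trailing spaces).
    static final String LINK =
        "<link>Download:</link> <a href=\"([^\"]+)\">hier</a> \\| ([^<\\r\\n]*[^\\s<])";

    // Title, then zero or more "<br />" + line break + link line. \r?\n accepts
    // both CRLF and LF; the repeated group checks every line but keeps only the
    // captures of the last repetition, so it is used for validation only.
    static final Pattern STRUCTURE = Pattern.compile(
        "<title>[^<]+</title>(?: ?<br />\\r?\\n" + LINK + ")*\\s*");

    static final Pattern TITLE = Pattern.compile("<title>([^<]+)</title>");
    static final Pattern LINK_LINE = Pattern.compile(LINK);

    // matches() requires the pattern to cover the entire input.
    static boolean isValid(String input) {
        return STRUCTURE.matcher(input).matches();
    }

    static String title(String input) {
        Matcher m = TITLE.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // Each find() resumes after the previous match, yielding one {url, hoster} per line.
    static List<String[]> links(String input) {
        List<String[]> result = new ArrayList<>();
        Matcher m = LINK_LINE.matcher(input);
        while (m.find()) {
            result.add(new String[] {m.group(1), m.group(2)});
        }
        return result;
    }
}
```

The MULTILINE option is not needed for this: in Java it only changes what ^ and $ match, and neither anchor is used here.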
