I'm looking for a regex pattern to verify that my HTML input has the right structure and (probably in a second step) extract some information from it.
Example input text:
<title>Example Title</title><br />
<link>Download:</link> <a href="URL">hier</a> | hoster1 <br />
<link>Download:</link> <a href="URL">hier</a> | hoster2 <br />
<link>Download:</link> <a href="URL">hier</a> | hoster3
The title, hoster, and URL can of course change and are the parts I want to capture, so my attempt was something like this:
<title>([^<]+?)</title><br />\s<link>Download:</link> <a href="([^"]+?)">hier</a> \| ([^<]+?)<br />\s
These groups might seem a bit silly, but I also tried (.*?), and even in lazy mode it would just match whole lines.
Right now the second part (the <link> part) matches on its own, but not in combination with the <title> part. I'm guessing my whitespace character (\s) doesn't match a newline? How can I check for ONLY a newline character?
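A quick way to test the \s guess, sketched with Python's re module (an assumption, since the engine in use is not named here). In this sketch \s does match a newline, but a single \s consumes only one character, so Windows-style \r\n line endings would break the match. The \r\n in the sample string is also an assumption for illustration.

```python
import re

# Sample fragment; the \r\n line ending is an assumption for illustration.
text = "<title>Example Title</title><br />\r\n<link>Download:</link>"

# \s matches any single whitespace character, including \n ...
unix_ok = re.search(r"<br />\s<link>", "<br />\n<link>") is not None

# ... but one \s cannot consume the two-character sequence \r\n.
single_ws = re.search(r"<br />\s<link>", text)

# \r?\n matches exactly one line break (Unix or Windows) and nothing else.
newline_only = re.search(r"<br />\r?\n<link>", text)

print(unix_ok)
print(single_ws)
print(newline_only is not None)
```

If the input really uses \r\n, either \r?\n (newline only) or \s* (any run of whitespace) between the tags would be more forgiving than a single \s.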
The number of available links is dynamic, so I have no idea how many <link> tags there are. How can I use the second half of the pattern as a repeatable pattern? I'd like to do something like this (which obviously doesn't work that way):
[ <link>Download:</link> <a href="([^"]+?)">hier</a> \| ([^<]+?)<br />\s ]*
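One possible shape for the repeatable part, as a minimal sketch in Python's re module (the engine is an assumption, and the distinct URL1/URL2/URL3 values are made up so the output is readable). Square brackets define a character class, so the repetition is written with a non-capturing group (?: ... )+ instead. Because a repeated capturing group keeps only its last iteration, the sketch validates first and extracts in a second pass.

```python
import re

# Sample input from above; the distinct URLs are made up for illustration.
html = (
    '<title>Example Title</title><br />\n'
    '<link>Download:</link> <a href="URL1">hier</a> | hoster1 <br />\n'
    '<link>Download:</link> <a href="URL2">hier</a> | hoster2 <br />\n'
    '<link>Download:</link> <a href="URL3">hier</a> | hoster3'
)

# One link line: captures URL and hoster. The last line has no "<br />",
# so a link may also end at the very end of the input (\Z).
LINK = (r'<link>Download:</link> <a href="([^"]+)">hier</a> '
        r'\| ([^<\r\n]+?)\s*(?:<br />\s*|\Z)')

# Step 1: validate the whole structure; (?: ... )+ repeats the link part.
FULL = re.compile(r'<title>([^<]+)</title><br />\s*(?:' + LINK + r')+')

match = FULL.fullmatch(html)
if match is None:
    raise ValueError("input does not have the expected structure")
title = match.group(1)

# Step 2: collect every (URL, hoster) pair with a separate pass, since the
# repeated groups in FULL only remember the last link.
links = re.findall(LINK, html)

print(title)
print(links)
```

Here fullmatch rejects input with anything unexpected before, between, or after the expected parts, while findall returns one (URL, hoster) tuple per link line.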
All of this is done with the MULTILINE option set (although I'm not too sure it is needed for what I want to do).
I've been trying different things for a few days now and am not getting anywhere. I'd really appreciate a few pointers in the right direction, thank you.
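For reference, a small check of what the flag changes, assuming Python's re.MULTILINE (other engines may name or scope the option differently). In this sketch it only affects where ^ and $ can match, so a pattern without those anchors behaves the same with or without it.

```python
import re

text = "first line\nsecond line"

# Without MULTILINE, ^ matches only at the very start of the input.
without_flag = re.findall(r"^\w+", text)

# With MULTILINE, ^ also matches right after each newline.
with_flag = re.findall(r"^\w+", text, re.MULTILINE)

# A pattern without ^ or $ gives the same result either way.
same = re.findall(r"\s", text) == re.findall(r"\s", text, re.MULTILINE)

print(without_flag)
print(with_flag)
print(same)
```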