My regexp:
<([a-zA-Z0-9]+)>[\na-zA-Z0-9]*<\/\1+>
My string:
<div>
<f>
</f>
</div>
The result is:
array(2
0 => array(1
0 => <f>
</f>
)
1 => array(1
0 => f
)
)
Why is it capturing <f></f> and ignoring the first <div>?
The answer is USE A PARSER INSTEAD (sorry for the shouting). A regular expression is sometimes the quicker way to pull out an ID or a URL string, but matching HTML tags with a regex is error-prone, because tags can nest and carry attributes that a simple pattern does not anticipate. Consider the following code; isn't it much more beautiful than druidic characters with special meanings?
<?php
$str = "
<container>
<div class='someclass' data='somedata'>
<f>some content here</f>
</div>
</container>";
$xml = simplexml_load_string($str);
echo $xml->div->f; // some content here
$attributes = $xml->div->attributes();
print_r($attributes); // class and data as keys
?>
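< and > are not in your second character class, so [\na-zA-Z0-9]* cannot get past the newline after <div>: the next character is the < of <f>. The attempt that starts at <div> therefore fails, the engine moves on, and the first tag pair it can match in full is <f></f>.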
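To see the effect of the character class, here is a minimal sketch that runs the pattern from the question against the string from the question, and then a widened variant with <, > and / added to the class. The variable names and the widened pattern are assumptions made for illustration only, not a recommended fix.

```php
<?php
// The string from the question, with the line breaks written out.
$subject = "<div>\n<f>\n</f>\n</div>";

// Pattern from the question: the middle class allows only newlines
// and alphanumerics, so it cannot consume the inner <f> tags.
$original = '/<([a-zA-Z0-9]+)>[\na-zA-Z0-9]*<\/\1+>/';
preg_match_all($original, $subject, $m1);
$originalTags = $m1[1]; // tag names captured by group 1

// Hypothetical widened pattern: <, > and / are added to the class,
// so the content between <div> and </div> can now be consumed.
$widened = '/<([a-zA-Z0-9]+)>[\n<>\/a-zA-Z0-9]*<\/\1+>/';
preg_match_all($widened, $subject, $m2);
$widenedTags = $m2[1];

print_r($originalTags);
print_r($widenedTags);
```

With this input the first pattern captures only f, and the widened one captures div. The widened pattern still breaks as soon as a tag carries attributes or the content contains other characters, which is why a parser remains the safer approach.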