I've got a regular expression that matches everything between the angle brackets in <anything>, and I'm using this:
'@<([\w]+)>@'
today, but I believe there might be a better way to do it?
/ Tobias
\w doesn't match everything like you said, by the way, just [a-zA-Z0-9_]. Assuming you were using "everything" in a loose manner and \w is what you want, you don't need square brackets around the \w. Otherwise it's fine.
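To make the difference concrete, here is a small sketch using preg_match_all on a made-up sample string. The [^>]+ variant is only one possible reading of "everything" (any character except '>'), not something the original pattern does:

```php
<?php
// Hypothetical sample input: one tag containing a space, one plain tag.
$string = 'a<x y>b<c>d';

// \w+ without the redundant square brackets: word characters only.
preg_match_all('@<(\w+)>@', $string, $word_matches);
$word_only = $word_matches[1];   // only 'c'; '<x y>' is skipped because of the space

// Assumption: "everything" means any run of characters other than '>'.
preg_match_all('@<([^>]+)>@', $string, $any_matches);
$anything = $any_matches[1];     // 'x y' and 'c'

print_r($word_only);
print_r($anything);
```

Which of the two is right depends on what is allowed between the brackets in your input.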
\w is locale dependent, so it can match 'unexpected' characters, depending on your locale settings. You'd be better off using PHP string functions for this task. It will usually be a lot faster and is not too complex.
For example:
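$string = "abcd<xyz>ab<c>d";
$curr_offset = 0;
$matches = array();
// Find the first '<', then walk the string one tag at a time.
$opening_tag_pos = strpos($string, '<', $curr_offset);
while ($opening_tag_pos !== false)
{
    $closing_tag_pos = strpos($string, '>', $opening_tag_pos);
    if ($closing_tag_pos === false)
    {
        break; // unclosed '<': stop here, otherwise the loop never ends
    }
    // Text between '<' and '>', delimiters excluded.
    $matches[] = substr($string, $opening_tag_pos + 1, $closing_tag_pos - $opening_tag_pos - 1);
    // Continue searching from this '>'.
    $curr_offset = $closing_tag_pos;
    $opening_tag_pos = strpos($string, '<', $curr_offset);
}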
/*
$matches = Array ( [0] => xyz [1] => c )
*/
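If you would rather keep a regex but sidestep the locale question, one option is to spell out the character class instead of using \w. This is a minimal sketch, assuming ASCII letters, digits and underscore are all you want to accept; the sample string is made up:

```php
<?php
// Hypothetical input; "\xC3\xA9" is the UTF-8 byte sequence for an accented e.
$string = "abcd<xyz>ab<c>d<caf\xC3\xA9>";

// Explicit ASCII class: behaves the same whatever the locale is.
preg_match_all('@<([A-Za-z0-9_]+)>@', $string, $m);
$names = $m[1];   // 'xyz' and 'c'; the tag with non-ASCII bytes is skipped

print_r($names);
```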
Of course, if you are trying to parse HTML or XML, use an HTML/XML parser instead.
A regex that stops at the first > will read <div attr="my>value"> as <div attr="my>, and not the whole tag like you want. RegEx does not do quote balancing. This is OK if you are doing something specific and know what the input looks like, but bad if you are doing something generic.
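For the generic case, a parser handles the quoting for you. The following is a rough sketch, assuming PHP's bundled DOM extension is enabled; the markup is a made-up variant of the example above with a closing tag added:

```php
<?php
// Hypothetical markup with a '>' inside a quoted attribute value.
$html = '<div attr="my>value">text</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);   // the parser tracks quotes, so the inner '>' does not end the tag

$div  = $doc->getElementsByTagName('div')->item(0);
$attr = $div->getAttribute('attr');   // full attribute value, including the '>'
$text = $div->textContent;            // text content of the element

echo $attr, "\n";
echo $text, "\n";
```

This is heavier than a regex or strpos loop, so it is mainly worth it when the input really is markup rather than a simple <name> placeholder format.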