Before I explain what you did wrong and how you can correct it, here is one very simple way to solve your problem with just one regular expression. The idea is to let the regex try to match ** first and, only if that is not possible, match a single *. Such a regex can look like
\\*\\*|\\*
because the matcher tests the alternatives of OR (|) from left to right, so in the case of data like
***
the matcher will first try to find a match for \\*\\*, and it will succeed, so it will consume the first two asterisks **
***
^^
After this the matcher moves forward and again checks whether \\*\\* can be matched here, but since this time only one * is left, \\*\\* can't match, so the matcher tries the other alternative in the regex, which is \\*. So this time the matcher returns only one *.
***
  ^
And so on.
Code for such an application can look like
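import java.util.regex.Matcher;
import java.util.regex.Pattern;

String data = "*..***...**...****....*....*****..**";
// Try "**" first; fall back to a single "*" only when no pair is available.
Pattern p = Pattern.compile("\\*\\*|\\*");
Matcher m = p.matcher(data);
int tmp1 = 0, tmp2 = 0; // tmp1 counts "*", tmp2 counts "**"
while (m.find()) {
    if (m.group().length() == 1) { // found *
        tmp1++;
    } else { // found **
        tmp2++;
    }
}
System.out.println(tmp1);
System.out.println(tmp2);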
Output:
4
7
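The order of the alternatives matters here. As a minimal sketch (the class AlternationOrderDemo and the helper tokens are hypothetical names used only for illustration), the following compares the regex above with a version where the alternatives are swapped:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationOrderDemo {
    // Hypothetical helper: collects every match of regex in input, in order.
    static List<String> tokens(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Pair first: "***" is split into "**" and "*".
        System.out.println(tokens("\\*\\*|\\*", "***")); // [**, *]
        // Single first: \\* always succeeds, so \\*\\* is never tried.
        System.out.println(tokens("\\*|\\*\\*", "***")); // [*, *, *]
    }
}
```

With the single * placed first, the pair alternative is never reached, so the count of ** would always be 0.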
Now let's focus on your current regexes.
oneStarPattern problems
Your first regex (^|\\.|(\\*{2})+)\\*(\\.|$) accepts only one * which has
- start of string (^), a . or an even number of * ((\\*{2})+) before it, and
- a . or end of string ($) after it.
The strategy of accepting a * as long as it has an even number of * before it and . or $ after it has one flaw: nothing forces the match to start at the beginning of a run of asterisks, so in a case like
****.
 ^^^^
the part marked with ^ will also be matched (while it shouldn't).
This is why this regex matches the data marked with ^ and #, where the part marked with # is not supposed to be there:
*..***...**...****....*....*****..**
^^ ^^^^        ####  ^^^ ^^^^^^
and you are seeing 5 matches.
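To check these matches yourself, here is a small sketch (OneStarPatternDemo and spans are hypothetical names, not part of your code) that prints the start index, the exclusive end index and the text of every match:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OneStarPatternDemo {
    // Hypothetical helper: lists each match as "start-end:text" (end is exclusive).
    static List<String> spans(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String oneStar = "(^|\\.|(\\*{2})+)\\*(\\.|$)";
        // The match starts at the second asterisk.
        System.out.println(spans(oneStar, "****.")); // [1-5:***.]
        // Five matches; the third one (15-19) comes from the run of four asterisks.
        System.out.println(spans(oneStar, "*..***...**...****....*....*****..**"));
        // [0-2:*., 3-7:***., 15-19:***., 21-24:.*., 27-33:*****.]
    }
}
```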
Another possible problem is that your regex consumes the surrounding characters, so they can't be reused when searching for the next match. So in the case of
*.*.
^^
the first *. will be matched, but the . is included in this match, which prevents the regex from using it while testing the second *. pair. Because the second *. can't include the first . (already used by the previous match), it will not be matched: that * has no ^, (\\*{2})+, or free-to-use . before it.
So in reality even the . characters aren't supposed to be included in the match
*..***...**...****....*....*****..**
^# ^^^#        ####  #^# ^^^^^#
solution
To get rid of these problems you can use the look-around mechanism and change your regex to something like
"(?<=^|\\.)(\\*{2})*\\*(?=\\.|$)"
This regex will find
- only odd numbers of * ((\\*{2})*\\*)
- if they have
  - start of string or . before them (?<=^|\\.)
  - . or end of string after them (?=\\.|$)
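As a rough check of the look-around version, the sketch below (LookAroundDemo and count are hypothetical names) counts matches on the inputs discussed above and compares the result with your original regex on *.*.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookAroundDemo {
    // Hypothetical helper: counts how many times regex matches in input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String original = "(^|\\.|(\\*{2})+)\\*(\\.|$)";
        String lookAround = "(?<=^|\\.)(\\*{2})*\\*(?=\\.|$)";

        // The original consumes the first ".", so the second "*." is missed.
        System.out.println(count(original, "*.*."));   // 1
        // Look-arounds only test the neighbours without consuming them.
        System.out.println(count(lookAround, "*.*.")); // 2
        // An even run no longer produces a false match.
        System.out.println(count(lookAround, "****.")); // 0
        System.out.println(count(lookAround, "*..***...**...****....*....*****..**")); // 4
    }
}
```

Each match here is a whole run with an odd number of asterisks, so the number of matches equals the number of leftover single *.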
twoStarsPattern
(^|\\.|(\\*{2})+)\\*{2}(\\.|\\*|$)
This regex has similar problems to the first one. Let's see what it currently matches
*..***...**...****....*....*****..**
  ^^^^  ^^^^ ^^^^         ^^^^   ^^^
There is something wrong with each match because
- again it includes the .
- but this time it also includes an additional * at the end, preventing the next match from using it
- the (^|\\.|(\\*{2})+)\\*{2} part of this regex will search for the maximal possible even number of asterisks (because of (\\*{2})+), not for just one pair
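The matches listed above can be reproduced with a short sketch like this one (TwoStarsPatternDemo and spans are hypothetical names used for illustration only):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoStarsPatternDemo {
    // Hypothetical helper: lists each match as "start-end:text" (end is exclusive).
    static List<String> spans(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String twoStars = "(^|\\.|(\\*{2})+)\\*{2}(\\.|\\*|$)";
        // Every match drags in a "." and three of them also swallow a trailing "*".
        System.out.println(spans(twoStars, "*..***...**...****....*....*****..**"));
        // [2-6:.***, 8-12:.**., 13-17:.***, 26-30:.***, 33-36:.**]
    }
}
```

Only 5 matches are reported for this input, while it contains 7 pairs.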
solution
This regex is a very good example of overcomplicating things. It may seem a little harder to fix than the first one, but in reality it is very simple.
You just need to use the \\*\\* regex. It will match only pairs of asterisks, return each of them and then look for the next one. This regex is safe because an already matched ** can't be reused, so it will match
*********
11223344x
where 1 2 3 4 represent what is returned in each iteration of matching, and the * corresponding to x will not be matched at all.
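A minimal sketch of counting pairs this way (PairCountDemo and countPairs are hypothetical names) could look like:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PairCountDemo {
    // Hypothetical helper: counts non-overlapping "**" pairs in input.
    static int countPairs(String input) {
        Matcher m = Pattern.compile("\\*\\*").matcher(input);
        int n = 0;
        while (m.find()) {
            n++; // each find() resumes after the previous pair
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countPairs("*********")); // 4, the ninth * is left over
        System.out.println(countPairs("*..***...**...****....*....*****..**")); // 7
    }
}
```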