Detecting variables through a string

Question

I am creating a simple IDE using JTextPane that detects keywords and colors them.

Currently, I am able to detect:

Comments
String Literals
Integers & Floats
Keywords

I detect these types with regular expressions.

Now I am trying to detect variables, as in [int x = 10;], and give them a different color.

Currently, I can find all data types such as int, float, and char using the following regex:

Pattern words = Pattern.compile("\\bint\\b|\\bfloat\\b|\\bchar\\b");
Matcher matcherWords = words.matcher(code);
while (matcherWords.find()) {
    System.out.print(code.substring(matcherWords.start(), matcherWords.end()));
    // How to get next word that is a variable?
}

Below is a sample output of my program:

[Image: sample output of the program]

How can I detect variables like a, b, and c once I have detected int, float, etc.?

Maybe you can use a regular expression for that: define the rules for a valid variable name and read until an operator or semicolon appears.
– HitchHiker, Commented Jul 30, 2015 at 18:37
Could you not detect when they're initialized via datatype and store the following word?
– CubeJockey, Commented Jul 30, 2015 at 18:38
I see. Sorry, but I'm very new to regex and have no clue how to write such a complex regex.
– user3188291, Commented Jul 30, 2015 at 18:39
@Trobbins Yes, that's what I am trying to achieve, but I do not know how to get the "following word".
– user3188291, Commented Jul 30, 2015 at 18:40

m.cekiera · Accepted Answer · 2015-07-31 10:42:44Z

3

Try this one:

(?:(?<=int|float|String|double|char|long)(?:\s+[a-zA-Z_$][\w$]*\s*)|(?<=\G,)(?:\s*[a-zA-Z_$][\w$]*\s*))(?=,|;|=)

which means:

(?<=int|float|String|double|char|long) - positive lookbehind searching for variable type,
(?:\s+[a-zA-Z_$][\w$]*\s*) - non-capturing group: at least one space, followed by valid characters for Java variables, followed by zero or more spaces
| - or; alternative between matching a name after a variable type or after a comma,
(?<=\G,) - positive lookbehind for the previous match and a comma (because the other parts match spaces on both sides)
(?:\s*[a-zA-Z_$][\w$]*\s*) - non-capturing group: zero or more spaces, followed by valid characters for Java variables, followed by zero or more spaces
(?=,|;|=) - positive lookahead for a comma, equals sign, or semicolon

It uses the \G boundary matcher (the end of the previous match), so the second alternative, which looks for names between other names (that is, words between spaces and/or commas), matches only directly after a previous match. As a result it will not match every word between commas inside String literals, for example. I also allowed $ in [a-zA-Z_$][\w$]* because it is legal in variable names, although not recommended.
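To see what the \G anchor contributes, here is a minimal sketch that builds the pattern twice: once as written above and once with a plain (?<=,) lookbehind. The class name AnchorDemo, the helper names, and the sample input are assumptions made for illustration only; they are not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorDemo {
    static final String TYPES = "(?<=int|float|String|double|char|long)";
    static final String NAME = "[a-zA-Z_$][\\w$]*";

    // Assembles the pattern from the answer; only the comma lookbehind varies.
    static Pattern build(String commaLookbehind) {
        return Pattern.compile("(?:" + TYPES + "(?:\\s+" + NAME + "\\s*)|"
                + commaLookbehind + "(?:\\s*" + NAME + "\\s*))(?=,|;|=)");
    }

    // Collects every match, trimmed of the surrounding spaces.
    static List<String> names(Pattern pattern, String code) {
        List<String> found = new ArrayList<>();
        Matcher matcher = pattern.matcher(code);
        while (matcher.find()) {
            found.add(matcher.group().trim());
        }
        return found;
    }

    public static void main(String[] args) {
        String code = "int d, f, g;\nString s = \"a,b,c\";";
        // With \G the comma branch only continues a declaration list.
        System.out.println(names(build("(?<=\\G,)"), code)); // [d, f, g, s]
        // Without \G the "b" inside the string literal is picked up too.
        System.out.println(names(build("(?<=,)"), code));    // [d, f, g, s, b]
    }
}
```

The only difference between the two runs is the lookbehind in front of the comma branch, so the extra b in the second result comes from the string literal.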

DEMO

And for Java:

 Pattern pattern = Pattern.compile("(?:(?<=int|float|String|double|char|long)(?:\\s+[a-zA-Z_$][\\w$]*\\s*)|(?<=\\G,)(?:\\s*[a-zA-Z_$][\\w$]*\\s*))(?=,|;|=)");

EDIT

You can use (int |float |...) to match names of variables directly using matcher.start() and matcher.end() without spaces. However, I would rather use (?:\s*) in every place where a space can occur and then check for redundant spaces while processing the data, because you never know how many spaces the user will type (of course more than one is redundant, but it is still valid!).

Another approach would be to match spaces but use groups, like:

(?:(?<=int|float|String|double|char|long)(?:\s+)([a-zA-Z_$][\w$]*)(?:\s*)|(?<=\G,)(?:\s*)([a-zA-Z_$][\w$]*)(?:\s*))(?=,|;|=)

DEMO

The names then come without spaces, but you need to extract them from groups 1 and 2 with matcher.start(group no) and matcher.end(group no).

EDIT 2 (answer to the question from the comments)

It depends on what you want to achieve. If you just want the variables as Strings, the trim() method is enough. If you want the start and end indices of the variables in the text, for example to highlight them in a different colour, it is better to use something like matcher.start(1) to get the start index of group 1. Consider this example:

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) throws IOException {
        String      text = "int a = 100;\n" +
                "float b = 100.10;\n" +
                "double c - 12.454545645;\n" +
                "long longest dsfsf = 453543543543;\n" +
                "a = d;\n" +
                "char     b = 'a';\n" +
                "String str = \"dfssffdsdfsd\"\n" +
                "int d,f,g;\n" +
                "int a,f,frhg = 0;\n" +
                "String string = \"a,b,c,d,e,f\"";

        Pattern pattern = Pattern.compile("(?:(?<=int|float|String|double|char|long)(?:\\s+)([a-zA-Z_$][\\w$]*)(?:\\s*)|(?<=\\G,)(?:\\s*)([a-zA-Z_$][\\w$]*)(?:\\s*))(?=,|;|=)");
        Matcher matcher = pattern.matcher(text);
        while(matcher.find()){
            System.out.println("trim(): " + text.substring(matcher.start(),matcher.end()).trim()); // cut off spaces by trim() method;

            int group = (matcher.group(1)==null)? 2 : 1; // check which group captured string;
            System.out.println("group(" + group + "): \n\t"  // to extract string by group capturing;
                    + text.substring(matcher.start(group),matcher.end(group))
                    + ",\n\tsubstring(" + matcher.start(group) + "," + matcher.end(group)+")");

        }
    }
}

The output shows both approaches.
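As a rough sketch of the highlighting use case mentioned above, the group offsets can be passed straight to a StyledDocument. The class VariableHighlighter and its method names are hypothetical. A bare DefaultStyledDocument stands in for the document that textPane.getStyledDocument() would return for a JTextPane. The text is read from the document itself so that the offsets line up with document positions.

```java
import java.awt.Color;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.StyleConstants;
import javax.swing.text.StyledDocument;

class VariableHighlighter {
    // The grouped pattern from the answer: a name lands in group 1 or group 2.
    static final Pattern VARIABLES = Pattern.compile(
            "(?:(?<=int|float|String|double|char|long)(?:\\s+)([a-zA-Z_$][\\w$]*)(?:\\s*)"
            + "|(?<=\\G,)(?:\\s*)([a-zA-Z_$][\\w$]*)(?:\\s*))(?=,|;|=)");

    // Returns {start, end} offsets of every variable name, without surrounding spaces.
    static List<int[]> variableSpans(String text) {
        List<int[]> spans = new ArrayList<>();
        Matcher matcher = VARIABLES.matcher(text);
        while (matcher.find()) {
            int group = (matcher.group(1) == null) ? 2 : 1; // whichever group captured
            spans.add(new int[] {matcher.start(group), matcher.end(group)});
        }
        return spans;
    }

    // Applies a foreground color to each span found in the document's own text.
    static void highlight(StyledDocument doc, Color color) throws BadLocationException {
        SimpleAttributeSet attrs = new SimpleAttributeSet();
        StyleConstants.setForeground(attrs, color);
        String text = doc.getText(0, doc.getLength());
        for (int[] span : variableSpans(text)) {
            doc.setCharacterAttributes(span[0], span[1] - span[0], attrs, false);
        }
    }

    public static void main(String[] args) throws BadLocationException {
        StyledDocument doc = new DefaultStyledDocument();
        doc.insertString(0, "int a, b = 1;", null);
        highlight(doc, Color.BLUE);
        for (int[] span : variableSpans(doc.getText(0, doc.getLength()))) {
            System.out.println(span[0] + ".." + span[1]); // offsets of a, then b
        }
    }
}
```

Passing false as the last argument of setCharacterAttributes merges the color into the existing attributes instead of replacing them, so colors already applied to keywords or literals elsewhere are left alone.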

edited Jul 31, 2015 at 10:42

answered Jul 30, 2015 at 22:48

m.cekiera


6 Comments

user4227915 Over a year ago

Dude.. you are awesome !!

user3188291 Over a year ago

Thank you for your answer. It seems to be working, but could you explain how to extract the variable names without the spaces? For example, when I use matcher.group() on int a = 10, I get the variable " a", but what I want is just a. Thank you so much.

user3188291 Over a year ago

What do you mean by "extract them from groups 1 & 2 by matcher.start(group no) and matcher.end(group no)"?

m.cekiera Over a year ago

@user3188291 I updated my answer; feel free to ask if you have more questions.

user3188291 Over a year ago

Thank you so much! it works perfectly. Cleared all my doubts.


Shar1er80 · Accepted Answer · 2015-07-31 00:18:46Z

3

Have you tried a lookbehind/lookahead pattern?

This ridiculously long pattern:

"(?<=int |float |String |double )([a-zA-Z_]\\w*)(?=,|;|\\s)|([a-zA-Z_]\\w*)(?=,|;|\\s*=)"

is able to parse out single variables and comma-separated variables.

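import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Class name chosen for illustration; the original snippet showed only main().
public class VariableFinder {
    public static void main(String[] args) throws Exception {
        String javaCode = "int a = 100;\n" +
                "float b = 110;\n" +
                "String c = \"Hello World\";" +
                "double d, e, f, g = 1.0, h;";

        Matcher matcher = Pattern
                .compile("(?<=int |float |String |double )([a-zA-Z_]\\w*)(?=,|;|\\s)|([a-zA-Z_]\\w*)(?=,|;|\\s*=)")
                .matcher(javaCode);
        // Print every variable name the pattern finds.
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}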

Results:

a
b
c
d
e
f
g
h

Also tested here @ regex101

edited Jul 31, 2015 at 0:18

answered Jul 30, 2015 at 19:10

Shar1er80


5 Comments

user3188291 Over a year ago

This doesn't work on multiple variable declarations like int a,b,c;. But thank you very much for your answer.

Shar1er80 Over a year ago

@WashingtonGuedes Updated answer will not pick up the function name. Still looking into comma delimited variables.

user3188291 Over a year ago

Thank you for the update.. Any luck with comma delimited variables though?

Shar1er80 Over a year ago

@user3188291 Right now, someone who's better at regex than I am will have to speak for the comma delimited variables. Otherwise I'd suggest detecting a variable declaration line and split out the variable names by the comma. I'll look at this more later. Excellent question posted!

user3188291 Over a year ago

Thanks for the update, but it is not working 100% consistently: boolean b = true will detect true as a variable name as well. But your help is very much appreciated, thanks.
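The fallback suggested in the comments above (detect a declaration statement, then split out the names at the commas) could be sketched roughly as follows. The class DeclarationSplitter, its pattern, and the addition of boolean to the type list are assumptions for illustration. A naive split on commas will also misfire when an initializer itself contains a comma, for example inside a string literal or a method call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DeclarationSplitter {
    // A type keyword, whitespace, then everything up to the terminating semicolon.
    static final Pattern DECLARATION = Pattern.compile(
            "\\b(?:int|float|String|double|char|long|boolean)\\s+([^;]+);");

    static List<String> names(String code) {
        List<String> names = new ArrayList<>();
        Matcher matcher = DECLARATION.matcher(code);
        while (matcher.find()) {
            // group(1) is the declarator list, e.g. "d, e, f, g = 1.0, h".
            for (String declarator : matcher.group(1).split(",")) {
                // Keep only what precedes an initializer: "g = 1.0" becomes "g".
                String name = declarator.split("=", 2)[0].trim();
                // Discard fragments that are not valid identifiers.
                if (name.matches("[a-zA-Z_$][\\w$]*")) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String code = "double d, e, f, g = 1.0, h;\nboolean b = true;";
        System.out.println(names(code)); // [d, e, f, g, h, b]
    }
}
```

Because only the text before the first = in each declarator is kept, the initializer true in boolean b = true is never considered as a name.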

zolo · Accepted Answer · 2015-07-30 18:56:49Z

0

\b(?:int|float|String|char|double|long)\b\s+([^=;]+)

Did you try to match only the variable name? If so, the pattern above will help.
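With this pattern the name sits in capture group 1, while matcher.group(), matcher.start() and matcher.end() without an argument describe the whole match, type keyword included. A minimal sketch of the difference follows; the class name CaptureGroupDemo and the sample inputs are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureGroupDemo {
    static final Pattern DECLARATION = Pattern.compile(
            "\\b(?:int|float|String|char|double|long)\\b\\s+([^=;]+)");

    // Collects group 1 of every match, trimmed of trailing spaces.
    static List<String> names(String code) {
        List<String> names = new ArrayList<>();
        Matcher matcher = DECLARATION.matcher(code);
        while (matcher.find()) {
            // group(1) is only the parenthesized part; group() would include the type.
            names.add(matcher.group(1).trim());
        }
        return names;
    }

    public static void main(String[] args) {
        Matcher matcher = DECLARATION.matcher("int a = 10;");
        if (matcher.find()) {
            System.out.println("[" + matcher.group() + "]");   // [int a ]
            System.out.println("[" + matcher.group(1) + "]");  // [a ]
            System.out.println(matcher.start(1) + ".." + matcher.end(1)); // 4..6
        }
        System.out.println(names("int a = 10;\nfloat b = 1.5;")); // [a, b]
    }
}
```

Group 1 still carries the trailing space, and for a declaration such as int d,f,g; it holds the whole list d,f,g, so further splitting would be needed there.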

answered Jul 30, 2015 at 18:56

zolo


4 Comments

user3188291 Over a year ago

Ahh, this almost works. Currently this regex matches "int a", but what I'm trying to get is only a.

zolo Over a year ago

Check the latest version, as I've edited it a few times, and look at the match section on the right side: regex101.com/r/xQ5fK9/1

user3188291 Over a year ago

Thank you for your answer. On the regex101 website it matches the variable name, but when I tried matching it in my program using matcher.start() and matcher.end(), I get int a, float b, and so on. What I wanted is only the variable names a and b. Is it wrong to use matcher.start() and matcher.end() this way?

zolo Over a year ago
