
I need to match something like this:

int a= 4, b, c = "hi";

I already made a regex that successfully strips everything away from the line, leaving only

a= 4, b, c = "hi"

I don't care about the types of the variables, like "hi" being a String, because that will be checked later in the code.

Basically, I need to match a variable declaration with everything stripped off except the variables themselves, with or without the = part.

These are examples that should not match:

a b= 4
var,
,hello=3
=8

I have looked at this question, but it didn't really help.

I have tried this code, but it has a couple of problems: pretty much everything I listed as things that shouldn't match does match.

There may also be cases that I have missed. I need to match strings containing spaces, for example a = "hello there", but there is no requirement to match a string with a , inside it.
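For illustration, the requirements so far could be expressed as one anchored pattern that must consume the whole stripped line. This is only a sketch in Python, not the code referred to above: the names NAME, VALUE, ITEM, DECL_LIST and is_valid_declaration_list are hypothetical, the name rule follows the formal definition given in this question, and a "value" is assumed to be either a double-quoted string without embedded quotes or a run of characters containing no whitespace, comma, = or quote.

```python
import re

# Hypothetical building blocks; none of these names come from the question.
NAME = r'(?:[A-Za-z][A-Za-z0-9_]*|_[A-Za-z0-9_]+)'
# Assumption: a value is a quoted string (spaces allowed, no embedded quotes)
# or a bare token with no whitespace, comma, '=' or quote in it.
VALUE = r'(?:"[^"]*"|[^\s,="]+)'
# One declaration item: a name, optionally followed by "= value".
ITEM = rf'{NAME}(?:\s*=\s*{VALUE})?'
# The whole line: items separated by commas, nothing else allowed.
DECL_LIST = re.compile(rf'\s*{ITEM}(?:\s*,\s*{ITEM})*\s*')


def is_valid_declaration_list(text):
    """Return True if the whole text is a comma-separated list of items."""
    return DECL_LIST.fullmatch(text) is not None


if __name__ == "__main__":
    for line in ['a= 4, b, c = "hi"', 'a b= 4', 'var,', ',hello=3', '=8']:
        print(repr(line), is_valid_declaration_list(line))
```

Because the pattern has to match the entire line, a trailing comma, a leading comma, a missing name, or two names without a separator all cause a failure instead of a partial match.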

"Formal" definition of what a variable name can be:

A variable name can be any sequence (length > 0) of letters (uppercase or lowercase), digits and the underscore character. A name may not start with a digit. A name may start with an underscore, but in that case it must contain at least one more character.
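As a small sketch, the name rule on its own could be written as a two-branch pattern in Python: one branch for names starting with a letter, one for names starting with an underscore followed by at least one more character. NAME_RE and is_valid_name are hypothetical names chosen for this example.

```python
import re

# Branch 1: a letter followed by any number of letters, digits or underscores.
# Branch 2: an underscore followed by at least one more allowed character.
NAME_RE = re.compile(r'[A-Za-z][A-Za-z0-9_]*|_[A-Za-z0-9_]+')


def is_valid_name(name):
    """Return True if the entire string is a valid variable name."""
    return NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["a", "_a", "a_1", "_", "1a", ""]:
        print(repr(candidate), is_valid_name(candidate))
```

Using fullmatch rather than search means a name such as 1a is rejected outright instead of matching from its second character.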

Thanks for the help!

    Not quite clear; please specify exactly what output you expect for a given input. I have tried to come up with something, but I am not sure what you need. See this demo. Commented Jun 1, 2016 at 18:21

1 Answer

Description

I worked from your regex101 example. I'm not exactly clear on the other requirements, so I realize this may not completely answer your question.

"[^"]*"|((?=_[a-z_0-9]|[a-z])[a-z_0-9]+(?=\s*=))

This regular expression will do the following:

  • matches quoted strings, so text inside a quoted string is consumed as part of that string
  • places each variable name into capture group 1; you can then iterate through the array of matches and test capture group 1 for a value, and if it's populated, the match is a name
  • requires a variable name to start with either _ followed by at least one more character, or a letter a-z
  • after the first character, a variable name can contain any number of a-z, _ or 0-9
  • a variable name must be followed by an = sign
  • any amount of whitespace can appear between the name and the = sign

Example

Live Demo

https://regex101.com/r/aT6sC4/1

Sample text

name = "steve", bro = "4, hi = bye", lolwot = "wait wot"

Sample Matches

Note how capture group 1 only contains the variable names.

[0][0] = name
[0][1] = name

[1][0] = "steve"
[1][1] = 

[2][0] = bro
[2][1] = bro

[3][0] = "4, hi = bye"
[3][1] = 

[4][0] = lolwot
[4][1] = lolwot

[5][0] = "wait wot"
[5][1] = 
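The same iteration can be sketched in Python. This is an illustration, not part of the demo: extract_names is a hypothetical helper, and re.IGNORECASE is an assumption so that the a-z classes also cover uppercase letters.

```python
import re

# The expression from this answer, unchanged.
PATTERN = re.compile(
    r'"[^"]*"|((?=_[a-z_0-9]|[a-z])[a-z_0-9]+(?=\s*=))',
    re.IGNORECASE,  # assumption: lets a-z also match A-Z
)


def extract_names(text):
    """Collect capture group 1 from every match where it is populated."""
    names = []
    for match in PATTERN.finditer(text):
        # Group 1 is None when the match was a quoted string.
        if match.group(1) is not None:
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    sample = 'name = "steve", bro = "4, hi = bye", lolwot = "wait wot"'
    print(extract_names(sample))  # ['name', 'bro', 'lolwot']
```

The hi = inside the second quoted string is not reported as a name, because the quoted-string alternative consumes the whole string before the name alternative gets a chance to match there.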

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
  [^"]*                    any character except: '"' (0 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
 |                        OR
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    (?=                      look ahead to see if there is:
----------------------------------------------------------------------
      _                        '_'
----------------------------------------------------------------------
      [a-z_0-9]                any character of: 'a' to 'z', '_', '0'
                               to '9'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      [a-z]                    any character of: 'a' to 'z'
----------------------------------------------------------------------
    )                        end of look-ahead
----------------------------------------------------------------------
    [a-z_0-9]+               any character of: 'a' to 'z', '_', '0'
                             to '9' (1 or more times (matching the
                             most amount possible))
----------------------------------------------------------------------
    (?=                      look ahead to see if there is:
----------------------------------------------------------------------
      \s*                      whitespace (\n, \r, \t, \f, and " ")
                               (0 or more times (matching the most
                               amount possible))
----------------------------------------------------------------------
      =                        '='
----------------------------------------------------------------------
    )                        end of look-ahead
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------

7 Comments

I would like it to capture both the variable name and the variable value, like in the demo that I provided. I don't quite understand how your example works so I don't know where to put a second capturing group. Also, I need to match something like this to extract the value of 4... Thanks for helping <3
This demo captures both the variable name and its assigned value.
I would change your regex myself if I understood it, but I notice the following problems: 1) it matches hi =, 2) it doesn't match var1= var2 (just to be clear, I also need to match variable = variable, not only strings or numbers), and 3) it doesn't match num = 5.0. Thanks a lot for your help, I would never have achieved that myself.
Have a look at regex101.com/r/aT6sC4/2. I can't get this to match the standalone b in the substring a=1, b, c=2, because then it would also match the word clear that appears later in your text. Also, this will match any commented-out values too. The expression is multiline to improve readability but can be condensed to a single line.
I don't really care if it matches arbitrary text, since I check the things around it, so it doesn't matter if it matches clear, for example. But I do care a lot about it matching the b in the text, so I would much prefer that it matched b, with clear as a side effect. Also, it might be too complicated a task for a single regex... So maybe it will need breaking into a couple of smaller regexes...
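One way to break the task into smaller steps, sketched here in Python under stated assumptions, is to split the stripped line on commas and then match each piece against a small pattern with one group for the name and an optional group for the value. ITEM and parse_declarations are hypothetical names; splitting on every comma relies on the question's statement that strings containing a comma need not be supported.

```python
import re

# One item: a name, optionally followed by "= value".
# Assumption: a value is a quoted string or a bare token such as 4, 5.0 or var2.
ITEM = re.compile(r'''
    \s*
    (?P<name>[A-Za-z][A-Za-z0-9_]*|_[A-Za-z0-9_]+)
    \s*
    (?:=\s*(?P<value>"[^"]*"|[^\s,="]+))?
    \s*
''', re.VERBOSE)


def parse_declarations(text):
    """Return a list of (name, value) pairs, or None if any piece is invalid.

    The value is None for a bare name such as the b in 'a=1, b, c=2'.
    """
    pairs = []
    for part in text.split(","):
        match = ITEM.fullmatch(part)
        if match is None:
            return None
        pairs.append((match.group("name"), match.group("value")))
    return pairs


if __name__ == "__main__":
    print(parse_declarations("a=1, b, c=2"))
    print(parse_declarations("var1= var2, num = 5.0"))
    print(parse_declarations("var,"))
```

Since each piece is validated in isolation, a bare name like b is accepted only when it is a whole comma-separated item, which avoids picking up unrelated words elsewhere in the text.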