I'm creating a console application that scans classic ASP pages and matches calls to the ASP function getContentDirect(""). The code at the bottom is a sample of the kind of snippets I am scanning.

I'm using the following regex: (?<=getContentDirect)([(][^)]*)[)]

I need two regexes:

a - One that finds only calls like ("Content User"), i.e. a single string literal with no ampersands or commas. ("Content User" is a key that is looked up in the database and could be any text.)

b - One that finds everything else, such as ("OLCINTRO " & obj_Session.GetDetail("CurrentCurriculumID", "0") & "/" & str_Action).

I need to be able to check whether the call's argument contains variables or other functions. My regex above matches both kinds.

I'm using http://regexhero.net/tester/ to test my regex.

    Response.Write "<div style=""margin-left:2.5%""><span class=""Content1"">" & obj_Content.getContentDirect("Content User") & _
    <td><%=obj_Content.GetContentDirect("OLCINTRO " & obj_Session.GetDetail("CurrentCurriculumID", "0") & "/" & str_Action)%></td>
    <td><%=obj_Content.GetContentDirect("OLCINTRO " & obj_Session.GetDetail("CurrentCurriculumID", "0") & "/" & str_Action)%></td>
    if len(LgSelect(25))=0 then LgSelect(25)= obj_Content.getContentDirect("CONTENT SelectRatee")

2 Answers

I don't know if it's possible to detect that in a single regex, but it's possible to process the string in stages, something like this:

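Match the whole argument list, allowing up to one level of nested parentheses (the pattern has no capture group, so the match itself is the parenthesised text):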

(?<=(?i)getContentDirect)\([^()]*(?:\([^()]*\))*[^()]*\)

Remove any string literals:

"[^"]*"

Search for a letter which is part of a variable or function name:

[A-Za-z]
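
The three stages above could be chained in C# roughly as follows. This is only a sketch: the class name `CallClassifier` and the method `IsDynamic` are made up for illustration, and `RegexOptions.IgnoreCase` is used in place of the inline `(?i)`. It also assumes string literals contain no parentheses and that the argument list has at most one nested call.

```csharp
using System;
using System.Text.RegularExpressions;

static class CallClassifier
{
    // Stage 1: the argument list that follows getContentDirect,
    // allowing at most one level of nested parentheses.
    static readonly Regex ArgList = new Regex(
        @"(?<=getContentDirect)\([^()]*(?:\([^()]*\))*[^()]*\)",
        RegexOptions.IgnoreCase);

    // Stage 2: a double-quoted string literal.
    static readonly Regex StringLiteral = new Regex(@"""[^""]*""");

    // Stage 3: any remaining letter is part of a variable or function name.
    static readonly Regex Letter = new Regex("[A-Za-z]");

    // Returns null when the line has no getContentDirect(...) call,
    // false when the argument is made of string literals only,
    // true when it also contains variables or function calls.
    public static bool? IsDynamic(string line)
    {
        Match m = ArgList.Match(line);
        if (!m.Success) return null;

        string withoutStrings = StringLiteral.Replace(m.Value, "");
        return Letter.IsMatch(withoutStrings);
    }
}
```

With the sample lines from the question, the `("Content User")` call should come back as `false` and the `("OLCINTRO " & obj_Session.GetDetail(...) ...)` call as `true`.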

  1. Is this what you meant?

    // Inside a C# verbatim string (@"..."), a double quote is written as "".
    var re = new Regex(@"getContentDirect\( *""Content User"" *\)");
    
  2. Since you're using C# and .NET you can take advantage of its balancing groups.

    var re = new Regex(@"getContentDirect\((
        (?:               
        [^()]             # Match non-brackets
        |
        (?<BR> \( )       # Match '(', and capture into 'BR'
        |
        (?<-BR> \) )      # Match ')', and delete the 'BR' capture
        )+
        (?(BR)(?!))       # Fails if 'BR' stack isn't empty!
    )\)", RegexOptions.IgnorePatternWhitespace);
    

This captures the contents of the parentheses in group 1 (\1). It walks through the contents (...) of getContentDirect(...): a non-bracket character is simply consumed; an opening bracket is pushed onto the BR stack (think of BR as a count of the opening brackets seen so far that are still unclosed); and a closing bracket pops one entry off BR, decreasing that count.

The (?(BR)(?!)) says "don't match unless the BR counter is 0", i.e. the number of opening brackets we've seen equals the number of closing brackets.
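
To split the matches into the two cases from the question, one option is to take group 1 from the balancing-group pattern and test whether it is a single string literal. The sketch below is illustrative only: `ContentCallScanner`, `Scan` and the `SingleLiteral` pattern are hypothetical names, not part of the original answer. `RegexOptions.IgnoreCase` is added on the assumption that both `getContentDirect` and `GetContentDirect` should match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ContentCallScanner
{
    // The balancing-group pattern: group 1 holds everything between the
    // outer parentheses of getContentDirect(...).
    static readonly Regex Call = new Regex(@"getContentDirect\((
        (?:
        [^()]             # Match non-brackets
        |
        (?<BR> \( )       # Match '(' and push onto 'BR'
        |
        (?<-BR> \) )      # Match ')' and pop from 'BR'
        )+
        (?(BR)(?!))       # Fail if 'BR' is not empty
    )\)", RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);

    // Case (a): the whole argument is one double-quoted literal.
    static readonly Regex SingleLiteral = new Regex(@"^\s*""[^""]*""\s*$");

    // Yields each call's argument text and whether it is a plain literal.
    public static IEnumerable<(string Args, bool IsPlainLiteral)> Scan(string source)
    {
        foreach (Match m in Call.Matches(source))
        {
            string args = m.Groups[1].Value;
            yield return (args, SingleLiteral.IsMatch(args));
        }
    }
}
```

Calls where `IsPlainLiteral` is false are the ones that contain variables, concatenation or nested function calls. Note that the bracket counting does not know about string literals, so a literal that itself contains `(` or `)` would throw the count off.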
