I'm trying to extract functions and function headers from some source code files. Here's an example of the type of code:

################################################################################
# test module
#
# Description : Test module
#
DATABASE test

###
# Global Vars
GLOBALS
    DEFINE G_test_string    STRING
END GLOBALS

###
# Modular Vars
DEFINE M_counter            INTEGER

###
# Constants
CONSTANT MAX_ARR_SIZE = 100

##################################
# Alternative header
##################################
FUNCTION test_function_1()
    DEFINE  F_x     INTEGER

    LET F_x = 1

    RETURN F_x
END FUNCTION

###################################
# Function:
#   This is a test function
#
# Parameters:
#   in - test
#
# Returns:
#   out - result
#
FUNCTION test_function_2( P_in_var )
    DEFINE  P_in_var    INTEGER

    DEFINE  F_out_var   INTEGER


    LET F_out_var = P_in_var

    RETURN F_out_var
END FUNCTION

FUNCTION test_init_array()
    DEFINE  F_array     ARRAY[ MAX_ARR_SIZE ] OF INTEGER
    DEFINE  F_element   INTEGER

    FOR F_element = 1 TO MAX_ARR_SIZE

        LET F_array[ F_element ] = F_element * F_element

    END FOR

END FUNCTION

Functions may or may not have a header above them. I'm trying to capture the function source, function header, function name and any parameters passed into the function in groups. Here's the expression I came up with (I'm using .NET regex and have been testing with Regex Hero):

^([#]{0,1}.*?)(FUNCTION\s+(.*?)[(](.*?)[)].*?END FUNCTION) 

This seems to work OK for all but the first function (test_function_1) in the file. The first group for test_function_1 captures everything from the first line (the top of the source file) up to the point where FUNCTION test_function_1 begins. I realise this is because of the #s from other comments in the file, but I only want to capture the function header.

  • @cHao - You're not far wrong, it's Genero's version of Informix 4GL. Commented Jul 12, 2010 at 10:49
  • Have you tried a real parser rather than just a single regex? Commented Jul 12, 2010 at 10:50

1 Answer

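If I understand correctly, the problem is limiting the header group to the lines starting with # that sit directly above the function. To achieve this, you could turn on the RegexOptions.Multiline flag (so that ^ matches at the start of every line) and match the function header with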

((?:^#.*\s)*)

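Edit: For this to work, you'd have to switch OFF RegexOptions.Singleline, because with it on, . also matches newlines and #.* would run past the end of the comment line. Since . then stops at line breaks, you'd also have to replace .*? with [\s\S]*? in your function body part so the body can still span several lines.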
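
To make the combined pattern concrete, here is a minimal sketch of the header group joined to the original function part. It is written in Python rather than .NET, on the assumption that re.MULTILINE behaves like RegexOptions.Multiline for this pattern and that leaving out re.DOTALL corresponds to switching Singleline off. The ^ in front of FUNCTION, the shortened sample source, and the helper name extract_functions are illustrative additions, not part of the original pattern.

```python
import re

# Python stand-in for the .NET pattern. re.MULTILINE makes ^ match at the
# start of each line; re.DOTALL is deliberately NOT set, so "." stops at
# line breaks and the body uses [\s\S]*? instead.
FUNCTION_RE = re.compile(
    r"((?:^#.*\s)*)"                   # group 1: comment lines directly above
    r"(^FUNCTION\s+(.*?)[(](.*?)[)]"   # group 2 opens; 3 = name, 4 = parameters
    r"[\s\S]*?END FUNCTION)",          # body up to the first END FUNCTION
    re.MULTILINE,
)

# Shortened, hypothetical sample in the style of the question's source file.
SAMPLE = """\
# Modular Vars
DEFINE M_counter INTEGER

##########
# Alternative header
##########
FUNCTION test_function_1()
    RETURN 1
END FUNCTION

FUNCTION test_function_2( P_in_var )
    DEFINE P_in_var INTEGER
    RETURN P_in_var
END FUNCTION
"""


def extract_functions(source):
    """Return one dict per function: header, source, name, params."""
    results = []
    for match in FUNCTION_RE.finditer(source):
        header, func_source, name, raw_params = match.groups()
        # Split the parameter list on commas and drop surrounding whitespace.
        params = [p.strip() for p in raw_params.split(",") if p.strip()]
        results.append(
            {
                "header": header,
                "source": func_source,
                "name": name.strip(),
                "params": params,
            }
        )
    return results


if __name__ == "__main__":
    for func in extract_functions(SAMPLE):
        print(func["name"], func["params"], repr(func["header"]))
```

Because the header group only accepts an unbroken run of lines beginning with #, the unrelated "# Modular Vars" comment is left out, and a function without a header simply yields an empty header group. In .NET the same pattern would be passed to Regex together with RegexOptions.Multiline.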
