
I have a function that parses PHP array assignments from files. It returns a Python dictionary whose keys are the keys of the PHP array and whose values are the corresponding PHP values.

Example file:

$lang['identifier_a'] = 'Welcome message';
$lang['identifier_b'] = 'Welcome message.
You can do things a,b, and c here.

Please be patient.';
$lang['identifier_c'] = 'Welcome message2.
You can do things a,b, and c here.
Please be patient.';
$lang['identifier_d'] = 'Long General Terms and Conditions with more text';
$lang['identifier_e'] = 'General Terms and Conditions';
$lang['identifier_f'] = 'Text e';

Python function:

def fetch_lang_keys(filename):
    ''' fetches all the language keys for filename '''
    from re import search

    with open(filename) as fi:
        lines = fi.readlines()

    data = {}
    for line in lines:
        # Applied to one line at a time, so multi-line values never match.
        obj = search(r"\$lang\[[\'|\"](.{1,})[\'|\"]\] = [\'|\"](.{1,})[\'|\"];", line)
#        re.match(r'''\$lang\[[\'|\"](.{1,})[\'|\"]\] = [\'|\"](.{1,})[\'|\"];''', re.MULTILINE | re.VERBOSE);

        if obj:
            data[obj.group(1)] = obj.group(2)

    return data

The function should return a dictionary that looks like this:

data['identifier_a'] = 'Welcome message'
data['identifier_b'] = 'Welcome message.
You can do things a,b, and c here.

Please be patient.';
// and so on

The regexp used in the function works for everything except identifier_b and identifier_c, because it does not match blank lines or lines that do not end with ;. Using the wildcard operator with ; at the end did not work either, because it matched too much.

Do you have any idea of how to solve this? I looked into lookahead assertions, but failed to use them properly. Thanks.


3 Answers

2

Well, while my answer is not a solution to your regexp problem: why not use a real PHP parser instead of home-brewed regexps? It could be much more reliable, might even be faster, and would certainly be more maintainable.

A quick search turned up https://github.com/ramen/phply. I also found this related question: "Parse PHP file variables from Python script". Hope this helps.


2 Comments

Unfortunately, phply does not work because it does not parse multi-line PHP strings correctly.
Well, its unit tests include test cases for multi-line strings, but they probably don't cover your case. I'll check.
1

This regex seems to work:

\$lang\[[\'|\"](.{1,})[\'|\"]\] = [\'|\"]((?:.|\n)+?)[\'|\"];
                                          ^^^^^^^^^^

Demo here-
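A minimal sketch of how this pattern could be tried out, assuming the whole text is searched at once rather than line by line. The names SAMPLE and PATTERN are illustrative and not part of the original post:

```python
import re

# Illustrative sample text taken from the question's example file.
SAMPLE = """$lang['identifier_a'] = 'Welcome message';
$lang['identifier_b'] = 'Welcome message.
You can do things a,b, and c here.

Please be patient.';
"""

# The pattern from this answer: (?:.|\n)+? lets the value span newlines
# and stops at the first quote that is followed by a semicolon.
PATTERN = re.compile(
    r"""\$lang\[[\'|\"](.{1,})[\'|\"]\] = [\'|\"]((?:.|\n)+?)[\'|\"];"""
)

# findall returns (key, value) tuples, which dict() accepts directly.
data = dict(PATTERN.findall(SAMPLE))

for key, value in data.items():
    print(key, repr(value))
```

Because the value group is non-greedy, a value that itself contains a quote followed by a semicolon would be cut short at that point.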

2 Comments

How would I allow the character '|' in the values too? (Without quotation marks)
Another special case occurred that I am not too sure about: the value of the PHP array contained '...to Member\'s value...', which was not matched. Any idea?
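Both cases raised in these comments could be handled by tightening the pattern. Inside a character class, | is a literal pipe rather than alternation, so [\'|\"] also accepts | as a delimiter; ['"] is presumably what was intended. Escaped quotes can be allowed by matching either a backslash-escaped character or any character other than the opening quote. The following is a sketch under those assumptions, not a full PHP string parser, and the names PATTERN and parse_lang are hypothetical:

```python
import re

# Groups: 1 = key quote, 2 = key, 3 = value quote, 4 = raw value.
# The value is a run of either "backslash + any character" or
# "any non-backslash character that is not the opening quote".
PATTERN = re.compile(
    r"""\$lang\[(['"])(.+?)\1\]\s*=\s*(['"])((?:\\.|(?!\3)[^\\])*)\3\s*;""",
    re.DOTALL,
)


def parse_lang(text):
    """Return {key: value} for every $lang[...] = '...'; assignment in text."""
    data = {}
    for match in PATTERN.finditer(text):
        key, quote, raw = match.group(2), match.group(3), match.group(4)
        # Undo only the escapes valid in both PHP quoting styles:
        # an escaped backslash and an escaped delimiter quote.
        data[key] = re.sub(r"\\(\\|" + re.escape(quote) + ")", r"\1", raw)
    return data


sample = r"""$lang['a'] = '...to Member\'s value...';
$lang['b'] = 'left | right';
"""
result = parse_lang(sample)
print(result)
```

Other escape sequences of double-quoted PHP strings (such as \n or variable interpolation) are deliberately left untouched here.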
1

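It doesn't work because the dot doesn't match newlines. You must use the singleline modifier (re.DOTALL) instead of the multiline modifier, and search the whole file contents (for example, content = fi.read()) rather than one line at a time, since a single line never contains the rest of a multi-line value. Example:

obj = re.search(r'\$lang\[[\'"](.+?)[\'"]\] = [\'"](.+?)[\'"];', content, re.DOTALL)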
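A sketch of how this could fit into the question's fetch_lang_keys function, assuming the file is read as a single string so the pattern can span lines. The name LANG_RE and the UTF-8 encoding are assumptions, not something stated in the original post:

```python
import re

# With re.DOTALL the dot also matches newlines, so (.+?) can cross lines.
LANG_RE = re.compile(
    r"""\$lang\[['"](.+?)['"]\] = ['"](.+?)['"];""",
    re.DOTALL,
)


def fetch_lang_keys(filename):
    """Fetch all the language keys defined in filename."""
    # Assumption: the language file is UTF-8 encoded.
    with open(filename, encoding="utf-8") as fi:
        content = fi.read()  # whole file as one string, not a list of lines

    # finditer walks over every assignment instead of only the first one.
    return {m.group(1): m.group(2) for m in LANG_RE.finditer(content)}
```

Calling fetch_lang_keys on the example file from the question should then include the multi-line values for identifier_b and identifier_c. The non-greedy match still ends at the first quote followed by a semicolon, so values containing that sequence would need a stricter pattern.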
