
I need to get some data from PHP (WordPress) config files in my Python script. How can I parse the config data? For example, how can I get the $wp_version value? Config example:

/**
 * The WordPress version string
 *
 * @global string $wp_version
 */
$wp_version = '3.5.1';

/**
 * Holds the WordPress DB revision, increments when changes are made to the WordPress DB schema.
 *
 * @global int $wp_db_version
 */
$wp_db_version = 22441;

/**
 * Holds the TinyMCE version
 *
 * @global string $tinymce_version
 */
$tinymce_version = '358-23224';

/**
 * Holds the required PHP version
 *
 * @global string $required_php_version
 */
$required_php_version = '5.2.4';

/**
 * Holds the required MySQL version
 *
 * @global string $required_mysql_version
 */
$required_mysql_version = '5.0';

$wp_local_package = 'en_EN';
  • If you have access to PHP, it may be more robust to use PHP itself to tokenise the source file and output the structure in a more Python-friendly format, for example using token_get_all. Commented Jun 2, 2013 at 10:37
  • try github.com/ramen/phply Commented Jun 2, 2013 at 10:39

2 Answers


A simple variable assignment in PHP looks like $foo = 'bar';. Let's build a regex that ignores things like $_GET or $foo['bar']:

  1. Start with $, which has to be escaped:
    \$
  2. The first character after $ can't be a digit; to skip superglobals such as $_GET, require a letter (not an underscore):
    \$[a-z]
  3. It can be followed by any number of letters, digits, or underscores:
    \$[a-z]\w*
  4. Wrap the name in a capturing group:
    \$([a-z]\w*)
  5. Next comes the equals sign; to be more tolerant, allow optional whitespace around it:
    \$([a-z]\w*)\s*=\s*
  6. After that comes the value, which ends with a ; at the end of the line:
    \$([a-z]\w*)\s*=\s*(.*?);$
  7. Use the m modifier so that ^ and $ match at the start and end of each line.
  8. You can then use a trimming function to get rid of the single and double quotes.
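The steps above can be sketched with Python's re module as follows. This is a minimal sketch under a few assumptions: re.M and re.I stand in for the m and i modifiers, str.strip plays the role of the trimming function, and the helper name get_php_vars and the sample text are hypothetical.

```python
import re

# Regex from step 6; re.M makes ^/$ match per line (step 7), re.I is the i modifier.
PHP_VAR = re.compile(r"\$([a-z]\w*)\s*=\s*(.*?);$", re.M | re.I)


def get_php_vars(source):
    """Return {name: value} for simple `$name = value;` lines (hypothetical helper)."""
    result = {}
    for name, value in PHP_VAR.findall(source):
        # Step 8: trim surrounding single and double quotes.
        result[name] = value.strip("'\"")
    return result


config = """
$wp_version = '3.5.1';
$wp_db_version = 22441;
$_GET = 'ignored';
"""
print(get_php_vars(config)["wp_version"])  # 3.5.1
```

Because the value group is not tied to quotes, the unquoted integer $wp_db_version is captured too, while $_GET is skipped by the letter-only first character.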


Note 1: This regex fails when several assignments share one line, e.g. $fail = 'en_EN'; $fail2 = 'en_EN'; (the first match swallows the rest of the line as its value).
Note 2: Don't forget to use the i modifier to make it case insensitive.


1 Comment

Your regex works pretty well! But I added the quotes and removed the end-of-line anchor: \$([a-z]\w*)\s*=\s*\'(.*?)\';
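The commenter's variant can be tried in Python like this (a sketch; the sample line is the one from Note 1). Because the value must now be wrapped in single quotes and there is no end-of-line anchor, two assignments on one line are both found, but an unquoted integer such as $wp_db_version = 22441; no longer matches.

```python
import re

# Commenter's variant: single-quoted value, no end-of-line anchor.
QUOTED_VAR = re.compile(r"\$([a-z]\w*)\s*=\s*'(.*?)';", re.I)

line = "$fail = 'en_EN'; $fail2 = 'en_EN';"
print(QUOTED_VAR.findall(line))
# [('fail', 'en_EN'), ('fail2', 'en_EN')]

# Unquoted values are no longer matched.
print(QUOTED_VAR.findall("$wp_db_version = 22441;"))
# []
```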

I've written a small Python script that pulls the database login information from WordPress's wp-config.php file for automatic site backups.

Here is the relevant part of my code (GitHub's syntax highlighting has trouble with Python's triple quoted strings):

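#!/usr/bin/env python3
import re

# Matches define('NAME', 'value'); with single or double quotes.
define_pattern = re.compile(r"""\bdefine\(\s*('|")(.*)\1\s*,\s*('|")(.*)\3\)\s*;""")
# Matches $name = 'value'; using PHP's rules for variable names.
assign_pattern = re.compile(r"""(^|;)\s*\$([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)\s*=\s*('|")(.*)\3\s*;""")

php_vars = {}
with open("wp-config.php") as config_file:
    for line in config_file:
        # Group 2 is the constant/variable name, group 4 the quoted value.
        for match in define_pattern.finditer(line):
            php_vars[match.group(2)] = match.group(4)
        for match in assign_pattern.finditer(line):
            php_vars[match.group(2)] = match.group(4)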
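To try these patterns without a wp-config.php on disk, the same loop can be wrapped in a function that takes the file contents as a string. This is a sketch: parse_wp_config is a hypothetical name, the regexes are copied from the answer, and the sample lines are made up. Unquoted values such as $wp_db_version = 22441; are skipped, because both patterns require a quoted value.

```python
import re

DEFINE_PATTERN = re.compile(r"""\bdefine\(\s*('|")(.*)\1\s*,\s*('|")(.*)\3\)\s*;""")
ASSIGN_PATTERN = re.compile(
    r"""(^|;)\s*\$([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)\s*=\s*('|")(.*)\3\s*;"""
)


def parse_wp_config(text):
    """Collect quoted define() constants and $variable assignments (hypothetical helper)."""
    php_vars = {}
    for line in text.splitlines():
        for pattern in (DEFINE_PATTERN, ASSIGN_PATTERN):
            for match in pattern.finditer(line):
                # In both patterns, group 2 is the name and group 4 the quoted value.
                php_vars[match.group(2)] = match.group(4)
    return php_vars


sample = """
define('DB_NAME', 'wordpress');
$wp_version = '3.5.1';
$wp_db_version = 22441;
"""
print(parse_wp_config(sample))
# {'DB_NAME': 'wordpress', 'wp_version': '3.5.1'}
```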

1 Comment

That's awesome!! Thanks for sharing. One thing it doesn't parse is defines with boolean or integer values, such as define('MULTISITE', true);. I fixed the regex for this with ('|"?), but then had to strip extra spaces from the resulting value, e.g. when it is written as define( 'MULTISITE', true );
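One way to handle this in a single regex is to accept either a quoted value or a bare token, and let \s* absorb the padding so no stripping is needed afterwards. This is a sketch, not the commenter's exact pattern: the bare-token branch ([^\s)]+) is an assumption that covers simple literals such as true, false, or integers, not arbitrary PHP expressions, and parse_defines is a hypothetical name.

```python
import re

# Quoted value in group 4, or a bare literal (true, 5, ...) in group 5.
DEFINE_ANY = re.compile(
    r"""\bdefine\(\s*(['"])(.*?)\1\s*,\s*(?:(['"])(.*?)\3|([^\s)]+))\s*\)\s*;"""
)


def parse_defines(text):
    """Return {constant: value as string} for define() calls (hypothetical helper)."""
    constants = {}
    for match in DEFINE_ANY.finditer(text):
        # Group 3 (the opening quote) is set only when the quoted branch matched.
        value = match.group(4) if match.group(3) else match.group(5)
        constants[match.group(2)] = value
    return constants


sample = """
define('DB_NAME', 'wordpress');
define( 'MULTISITE', true );
define('WP_MEMORY_LIMIT', "64M");
define('WP_POST_REVISIONS', 5);
"""
print(parse_defines(sample))
# {'DB_NAME': 'wordpress', 'MULTISITE': 'true', 'WP_MEMORY_LIMIT': '64M', 'WP_POST_REVISIONS': '5'}
```

Values come back as strings; converting 'true' or '5' to Python types is left to the caller.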
