I have a .lua file where there are stored tables in this format:

["f@someFaction - someServer@guildVaults"] = {
    ["someStr1"] = {
        ["someStr2"] = 7,
        ["someStr3"] = 2
    }
    ["someStr4"] = {
        ["someStr5"] = 7,
        ["someStr6"] = 2
    }
}

Basically, there can be any number of nested tables. I know the names of the top-level tables I want to extract, but I have trouble extracting the tables nested inside them.

import re

with open("somePath", "rb") as file:
    f = file.read()

# Matches only the opening line of each top-level table
pattern = r"\[\"f@[a-zA-Z]+ - [a-zA-Z]+@guildVaults\"\] = \{[ \t\n]*"
guildVaults = re.findall(pattern, f)

for guild in guildVaults:
    print guild

Results:

["f@Alliance - Thunderhorn@guildVaults"] = {
["f@Alliance - Proudmoore@guildVaults"] = {
["f@Alliance - Kazzak@guildVaults"] = {
["f@Horde - Draenor@guildVaults"] = {

Any suggestions?

Edit: example of the .lua file here: http://www.pastefile.com/Tx2LVD

2 Answers

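You need to set the appropriate flags: re.MULTILINE so that ^ and $ match at line boundaries, and re.DOTALL so that . also matches newlines. Also, I would extract everything up to a line that contains only a single } (assuming all of your tables are similarly formatted):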

import re

# `data` is the file's contents, read as text
pattern = r"\[\"f@[a-zA-Z]+ - [a-zA-Z]+@guildVaults\"\] = ({.*?^}$)"
guildVaults = re.findall(pattern, data, re.MULTILINE | re.DOTALL)

for guild in guildVaults:
    print(guild)

For the provided input data, it prints:

{
    ["someStr1"] = {
        ["someStr2"] = 7,
        ["someStr3"] = 2
    }
    ["someStr4"] = {
        ["someStr5"] = 7,
        ["someStr6"] = 2
    }
}
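
To illustrate what the two flags change, here is a minimal, self-contained sketch. The sample string is an assumption modelled on the question's format (it is not the asker's real file), and the braces in the pattern are escaped for clarity:

```python
import re

# Assumed sample in the question's format; the names come from the question's results.
data = '''["f@Alliance - Kazzak@guildVaults"] = {
    ["someStr1"] = {
        ["someStr2"] = 7,
    },
}
'''

pattern = r'\["f@[a-zA-Z]+ - [a-zA-Z]+@guildVaults"\] = (\{.*?^\}$)'

# Without flags, "." stops at newlines and "^" only matches at the start of the string.
no_flags = re.findall(pattern, data)

# With both flags, the lazy ".*?" spans lines and stops at the first "}" alone on a line.
with_flags = re.findall(pattern, data, re.MULTILINE | re.DOTALL)

print(no_flags)
print(with_flags[0])
```

The indented inner `},` is skipped because it does not start its line, so the match only ends at the table's own closing brace.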

2 Comments

Unfortunately, that yields an empty result. Here is a screenshot of how the file is formatted: i.gyazo.com/8af990c1cb1711fc40db4c7a1adb74fe.png. Here is another screenshot with the first guild hidden: i.gyazo.com/3353d4d40e44e547be7b45a882373b04.png
I've edited the OP and put direct sample data there (a .lua file).
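
The asker later traces the empty result to opening the file with 'rb' instead of 'r'. One plausible mechanism, which is an assumption and not confirmed in the thread, is that binary mode leaves Windows "\r\n" line endings in place, so `$` no longer matches directly after the closing brace. A small sketch with made-up sample text:

```python
import re

pattern = r'\["f@[a-zA-Z]+ - [a-zA-Z]+@guildVaults"\] = (\{.*?^\}$)'
flags = re.MULTILINE | re.DOTALL

# Assumed sample text; the second copy simulates a file with Windows line endings.
unix_text = '["f@Horde - Draenor@guildVaults"] = {\n    ["someStr1"] = 7,\n}\n'
windows_text = unix_text.replace("\n", "\r\n")

unix_hits = re.findall(pattern, unix_text, flags)
windows_hits = re.findall(pattern, windows_text, flags)

# A variant that tolerates an optional "\r" before the end of the line.
tolerant = r'\["f@[a-zA-Z]+ - [a-zA-Z]+@guildVaults"\] = (\{.*?^\}\r?$)'
tolerant_hits = re.findall(tolerant, windows_text, flags)

print(len(unix_hits), len(windows_hits), len(tolerant_hits))
```

If the line endings are indeed the cause, either reading the file in text mode or allowing `\r?` before `$` should restore the match.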

Maybe you want to convert the Lua to Python, then exec the result to get native Python objects.

  1. Detect the top-level lines, such as ["f@someFaction - someServer@guildVaults"] = {, and extract all text up to the closing }.

  2. In this text, remove all square brackets, and replace every "}" with "}," and every "=" with ":".

  3. Prepend a variable name for the result, for example foo = {, and add } at the end.

You will get:

foo = {
    "someStr1" : {
        "someStr2" : 7,
        "someStr3" : 2
    },
    "someStr4" : {
        "someStr5" : 7,
        "someStr6" : 2
    }
}

Now, this can be manipulated in Python.
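
The conversion steps above can be sketched roughly as follows. This is a minimal sketch, not a general Lua parser: it assumes string keys, numeric values, and no "=", "{" or "}" inside strings. The function name lua_table_to_dict is hypothetical, it takes the whole {...} text instead of prepending foo = {, and it swaps exec for ast.literal_eval, which only evaluates literals:

```python
import ast
import re

def lua_table_to_dict(lua_text):
    """Convert a simple Lua table literal (string keys, numeric values) to a dict."""
    text = re.sub(r'\[("[^"]*")\]', r"\1", lua_text)  # ["key"] -> "key"
    text = text.replace("=", ":")                      # key = value -> key : value
    text = re.sub(r"\}\s*,?", "},", text)              # exactly one comma after each }
    text = text.strip().rstrip(",")                    # no comma after the outermost }
    return ast.literal_eval(text)

# The table body from the question, including its outer braces.
sample = '''{
    ["someStr1"] = {
        ["someStr2"] = 7,
        ["someStr3"] = 2
    }
    ["someStr4"] = {
        ["someStr5"] = 7,
        ["someStr6"] = 2
    }
}'''

vault = lua_table_to_dict(sample)
print(vault["someStr4"]["someStr6"])
```

Lua values such as true, false or nil would need extra handling before literal_eval could accept them.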

5 Comments

Indeed, that is the end goal. Problem is the 'extract all text until ending }' part.
Apply some heuristic :) Like, a } alone in position 1. Or, the hard way, count opening and closing curly braces.
Well the reason why I want to use pattern matching is because it should be possible to extract the text here: ["f@someFaction - someServer@guildVaults"] = { ... text to extract ... }. However, I was unable to make it work and even with @alecxe's solution, it still does not work - hence I provided an actual file to do some testing on.
Maybe this can be done with some advanced regexp fu. Perhaps I'd combine both things: simply loop through the file and use simple pattern matching to find the first and last line of each chunk/variable, and be done with it.
I've realised that the mistake was in using 'rb' instead of 'r' when opening the file. It works according to alecxe's suggestion now (with some minor tweaks).
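
The brace-counting idea from the comments can be sketched as below. It is a minimal sketch that assumes no braces appear inside string values; the function name extract_tables is hypothetical. Because it does not rely on line anchors, it is not affected by how line endings are read:

```python
import re

# Matches the opening line of each top-level guildVaults table.
HEADER = re.compile(r'\["f@[a-zA-Z]+ - [a-zA-Z]+@guildVaults"\] = \{')

def extract_tables(text):
    """Return the balanced {...} text that follows each matching header."""
    results = []
    for match in HEADER.finditer(text):
        start = match.end() - 1  # index of the opening brace
        depth = 0
        for i in range(start, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:  # the brace that closes the top-level table
                    results.append(text[start:i + 1])
                    break
    return results

# Assumed sample text with two top-level tables.
sample = (
    '["f@Alliance - Kazzak@guildVaults"] = {\n'
    '    ["a"] = {\n'
    '        ["b"] = 1,\n'
    '    },\n'
    '}\n'
    '["f@Horde - Draenor@guildVaults"] = {\n'
    '}\n'
)

tables = extract_tables(sample)
print(len(tables))
```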
