//Last modified: Sat, Apr 16, 2011 09:55:04 AM
//Codeset: ISO-8859-1
fileInfo "version" "20x64";
createNode newnode -n "a_SET";
    addAttr -ci true -k true -sn "connections" -ln "connections" -dt "string";
    setAttr -l on -k off ".tx";
    setAttr -l on -k off ".ty";
    setAttr -l on -k off ".sz";
    setAttr -l on -k on ".test1" -type "string" "blabla";
    setAttr -l on -k on ".test2" -type "string" "blablabla";
createNode newnode -n "b_SET";
    addAttr -ci true -k true -sn "connections" -ln "connections" -dt "string";
    setAttr -l on -k off ".tx";
    setAttr -l on -k off ".ty";
    setAttr -l on -k off ".sz";
    setAttr -l on -k on ".test1" -type "string" "hmm";
    setAttr -l on -k on ".test2" -type "string" "ehmehm";

In Python:

I need to read the newnode names, for instance "a_SET" and "b_SET", and their corresponding attribute values, giving {"a_SET": {"test1": "blabla", "test2": "blablabla"}, "b_SET": {"test1": "hmm", "test2": "ehmehm"}}. There could be an unknown number of sets: c_SET, d_SET, etc.

I've tried looping through lines and matching it there:

import re

sets = []
for line in fileopened:
    # group(2) is the part of the node name in front of the _SET suffix
    setmatch = re.match(r'^(createNode newnode -n ")(.*)(_SET)(.*)', line)
    if setmatch:
        sets.append(setmatch.group(2))

and as soon as I find a match here I would loop through next lines to get the attributes (test1, test2) for that set until I find a new set - for instance c_SET or an EOF.
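
The line-by-line idea described above can be sketched as a small state machine. This is only an illustration under assumptions: the `parse_sets` helper, the `SAMPLE` text, and both patterns are hypothetical, and the attribute pattern assumes each string value sits on the same line as its `setAttr`.

```python
import re

# Shortened sample in the same shape as the file above; real code would read the file.
SAMPLE = '''createNode newnode -n "a_SET";
    setAttr -l on -k off ".tx";
    setAttr -l on -k on ".test1" -type "string" "blabla";
    setAttr -l on -k on ".test2" -type "string" "blablabla";
createNode newnode -n "b_SET";
    setAttr -l on -k on ".test1" -type "string" "hmm";
    setAttr -l on -k on ".test2" -type "string" "ehmehm";
'''

NODE_RE = re.compile(r'^createNode newnode -n "(\w+_SET)";')
ATTR_RE = re.compile(r'^\s*setAttr .*"\.(\w+)" -type "string" "([^"]*)";')

def parse_sets(lines):
    """Collect {set_name: {attr: value}} in a single pass over the lines."""
    sets = {}
    current = None  # attribute dict of the set we are currently inside, if any
    for line in lines:
        if line.startswith('createNode'):
            # A new node starts: track it if it is a *_SET node, otherwise stop collecting.
            node = NODE_RE.match(line)
            current = sets.setdefault(node.group(1), {}) if node else None
            continue
        attr = ATTR_RE.match(line)
        if attr and current is not None:
            current[attr.group(1)] = attr.group(2)
    return sets

result = parse_sets(SAMPLE.splitlines())
print(result)
```

Lines such as `setAttr -l on -k off ".tx";` are skipped because they carry no `-type "string"` value.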

What would be the best way to grab all that info in one go with re.MULTILINE?

3 Answers

You can use a regex positive lookahead to split the text into groups:

(yourGroupSeparator)(.*?)(?=yourGroupSeparator|\Z)
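
As a minimal sketch of that general pattern, here it is on a made-up text where `HEAD ` plays the role of the group separator (the text and names are assumptions chosen for illustration). The lookahead checks for the next separator without consuming it, so it is still there to start the next match.

```python
import re

text = "HEAD one\n  a\n  b\nHEAD two\n  c\n"

# separator + name, then a lazy body, stopping just before the next
# separator or at the very end of the text (\Z).
pattern = re.compile(r'HEAD (\w+)\n(.*?)(?=HEAD |\Z)', re.DOTALL)

groups = pattern.findall(text)
for name, body in groups:
    print(name, repr(body))
```

re.DOTALL is what lets `.*?` run across line breaks; without it each body would stop at the first newline.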

In your example:

import re

with open("e:/temp/test.txt") as f:
    lines = f.read()

# DOTALL lets .*? cross newlines; the lookahead ends each block before the next createNode.
matches = re.findall(r'createNode newnode -n ("\w+_SET");(.*?)(?=createNode|\Z)', lines, re.MULTILINE | re.DOTALL)

for m in matches:
    print("%s:" % m[0], m[1])


"""
Result:
>>>
"a_SET":
    addAttr -ci true -k true -sn "connections" -ln "connections" -dt "string";
    setAttr -l on -k off ".tx";
    setAttr -l on -k off ".ty";
    setAttr -l on -k off ".sz";
    setAttr -l on -k on ".test1" -type "string" "blabla";
    setAttr -l on -k on ".test2" -type "string" "blablabla";

"b_SET":
    addAttr -ci true -k true -sn "connections" -ln "connections" -dt "string";
    setAttr -l on -k off ".tx";
    setAttr -l on -k off ".ty";
    setAttr -l on -k off ".sz";
    setAttr -l on -k on ".test1" -type "string" "hmm";
    setAttr -l on -k on ".test2" -type "string" "ehmehm";
"""

If you want the results in a dict, you can use:

result = {}
for k, v in matches:
    result[k] = v   # or maybe v.split() or v.split(";")

Run this after the findall call.
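
To get one list of statements per set, as the `v.split(";")` hint suggests, something along these lines may work. It is a sketch on a shortened, made-up sample, and it assumes no `;` ever appears inside a quoted value.

```python
import re

text = '''createNode newnode -n "a_SET";
    setAttr -l on -k off ".tx";
    setAttr -l on -k on ".test1" -type "string" "blabla";
createNode newnode -n "b_SET";
    setAttr -l on -k on ".test1" -type "string" "hmm";
'''

# Here the quotes sit outside the capture group, so the keys come out unquoted.
matches = re.findall(r'createNode newnode -n "(\w+_SET)";(.*?)(?=createNode|\Z)',
                     text, re.DOTALL)

result = {}
for name, body in matches:
    # One entry per statement: split on ';', trim whitespace, drop empty pieces.
    result[name] = [stmt.strip() for stmt in body.split(';') if stmt.strip()]

print(result)
```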


4 Comments

thanks - I like the approach. Any way of catching that into a dictionary? {'a_SET': [lines], 'b_SET':[nextlines]} etc.?
If there is neither '^' nor '$' in the RE, the re.MULTILINE flag is useless
@eyquem: "Explicit is better than implicit" ;)
When something is "implicit", that means that though invisible it is useful. Does "explicit" mean that though visible it may be useless?
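
The point about re.MULTILINE in the comments above can be checked directly. A small sketch on a made-up two-line string: the flag only changes where `^` and `$` may match, so a pattern without those anchors gives the same result either way.

```python
import re

text = "first line\nsecond line\n"

# Without MULTILINE, ^ matches only at the very start of the string.
without_flag = re.findall(r'^\w+', text)
# With MULTILINE, ^ also matches right after each newline.
with_flag = re.findall(r'^\w+', text, re.MULTILINE)
print(without_flag, with_flag)

# A pattern with no ^ or $ behaves identically with or without the flag.
no_anchor = r'\w+ line'
same = re.findall(no_anchor, text) == re.findall(no_anchor, text, re.MULTILINE)
print(same)
```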

I got this:

import re

filename = 'tr.txt'

with open(filename,'r') as f:
    ch = f.read()

# pat: one match per createNode block; pit: (attribute, value) for each setAttr line with at least two strings.
pat = re.compile(r'createNode newnode -n ("\w+?_SET");(.*?)(?=createNode|\Z)', re.DOTALL)
pit = re.compile(r'^ *setAttr.+?("[^"\n]+").+("[^"\n]+");(?:\n|\Z)', re.MULTILINE)

dic = dict((mat.group(1), dict(pit.findall(mat.group(2)))) for mat in pat.finditer(ch))
print(dic)

result

{'"a_SET"': {'".test1"': '"blabla"', '".test2"': '"blablabla"'}, '"b_SET"': {'".test1"': '"hmm"', '".test2"': '"ehmehm"'}}


Question:

What if a string itself has to contain a '"' character? How is it represented in the file?
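
The sample does not show how such a quote would be written. If the file escapes it with a backslash (`\"`), which is purely an assumption here, a quoted-string pattern that tolerates escapes could look like this sketch (the `QUOTED` name and the sample line are made up):

```python
import re

# A double-quoted string whose body is made of ordinary characters
# or backslash escapes such as \" and \\.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')

line = r'setAttr -l on -k on ".test1" -type "string" "say \"hi\" now";'

values = QUOTED.findall(line)
print(values)

# Turn the escaped quotes back into plain quotes for the last value.
unescaped = values[-1].replace('\\"', '"')
print(unescaped)
```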


EDIT

I had some difficulty finding the solution because I did not take the easy route.

Here is a new pattern that catches the FIRST string "..." and the LAST string "..." found after a " setAttr" and before the next " setAttr". So any number of "..." strings can be present, not only 3. You did not ask for this, but I thought it might turn out to be needed.

I also made it possible for newlines to appear inside the captured strings ("....\n......"), not only around them. For that, I had to come up with something that was new to me: (?:\n(?! *setAttr)|[^"\n]), which means: any character other than '"' and a newline \n is accepted, and a newline is accepted only if it is not followed by a line beginning with ' *setAttr'.

Likewise, (?:\n(?! *setAttr)|.) means: newlines not followed by a line beginning with ' *setAttr', plus any non-newline character.

Hence any other special character, such as a tab, is automatically accepted in the matches.
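
To see the effect of (?:\n(?! *setAttr)|[^"\n]) in isolation, here is a small sketch on two made-up inputs (the `INNER`, `text`, and `broken` names are assumptions for illustration). A newline inside a quoted value is accepted, but a quoted span is not allowed to run across the start of the next setAttr statement.

```python
import re

# Inside the quotes: any non-quote, non-newline character, or a newline
# that is NOT followed by the start of another setAttr statement.
INNER = r'(?:\n(?! *setAttr)|[^"\n])+'
quoted = re.compile('"(' + INNER + ')"')

# A value containing a newline is captured whole.
text = 'setAttr ".a" "two\nlines";\n    setAttr ".b" "x";'
found = quoted.findall(text)
print(found)

# An unterminated quote cannot swallow the following setAttr line.
broken = 'setAttr ".a" "unterminated;\n  setAttr ".b" "x";'
found_broken = quoted.findall(broken)
print(found_broken)
```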

ch = '''//Last modified: Sat, Apr 16, 2011 09:55:04 AM
//Codeset: ISO-8859-1
fileInfo "version" "20x64";
createNode newnode -n "a_SET";
    addAttr -ci true -k true -sn "connections" -ln "connections" -dt "string";
    setAttr -l on -k off ".tx";
    setAttr -l on -k off ".ty";
    setAttr -l on -k off ".sz";
    setAttr -l on -k on ".test1" -type "string" "blabla";
    setAttr -l on -k on ".test2" -type "string" "blablabla";
createNode newnode -n "b_SET";
    addAttr -ci true -k true -sn "connections" -ln "connections" -dt "string";
    setAttr -l on -k off ".tx";
    setAttr -l on -k off ".ty";
    setAttr -l on -k off ".sz";
    setAttr -l on -k on ".test1" -type "string" (
      "hmm bl
      abla\tbla" );
    setAttr -l on -k on ".tes\nt\t2" -type "string" "ehm\tehm";
    setAttr -l on -k on ".test3" -type "string" "too
    much" "pff" """ "feretini" "gol\nolo";
    '''

import re

# pat splits the text into createNode blocks; pot captures the first and last string of each setAttr.
pat = re.compile(r'createNode newnode -n ("\w+?_SET");(.*?)(?=createNode|\Z)', re.DOTALL)
pot = re.compile(r'^ *setAttr.+?'
                 r'"((?:\n(?! *setAttr)|[^"\n])+)"'
                 r'(?:\n(?! *setAttr)|.)+'
                 r'"((?:\n(?! *setAttr)|[^"\n])+)"'
                 r'.*;(?:\n|\Z)', re.MULTILINE)

dic = dict((mat.group(1), dict(pot.findall(mat.group(2)))) for mat in pat.finditer(ch))
for x in dic:
    print(x)
    print(dic[x])
    print()

result

"a_SET"
{'.test1': 'blabla', '.test2': 'blablabla'}

"b_SET"
{'.test1': 'hmm bl\n      abla\tbla', '.tes\nt\t2': 'ehm\tehm', '.test3': 'gol\nolo'}

1 Comment

I like the approach - it's working well. I've posted another option below - how would you handle that case (when the line is broken by a '\n' just before the start of the string)? Thanks!

Another possible option:

createNode newnode -n "b_SET";
    addAttr -ci true -k true -sn "connections" -ln "connections" -dt "string";
    setAttr -l on -k off ".tx";
    setAttr -l on -k off ".ty";
    setAttr -l on -k off ".sz";
    setAttr -l on -k on ".test1" -type "string" (
      "hmm blablabla" );
    setAttr -l on -k on ".test2" -type "string" "ehmehm";

So as you can see, the ".test1" value is now separated from its setAttr line by a \n line separator. How would you handle that using eyquem's approach?

pit = re.compile('^ *setAttr.+?("[^"\n]+").+("[^"\n]+");(?:\n|\Z)',re.MULTILINE)
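
One possible way to cope with the value moving to the next line is to stop anchoring on line starts and let `\s*` absorb the line break, any tabs, and the optional parentheses. This is only a sketch under assumptions: the `ATTR` pattern and the `block` sample are made up for illustration, only `-type "string"` attributes are handled, and values are assumed not to contain a `"`.

```python
import re

block = '''
    setAttr -l on -k off ".tx";
    setAttr -l on -k on ".test1" -type "string" (
\t\t"hmm blablabla" );
    setAttr -l on -k on ".test2" -type "string" "ehmehm";
'''

# [^;"]* keeps the match inside a single statement; \s* spans spaces, tabs
# and newlines, so the optional "(" ... ")" wrapper is skipped over.
ATTR = re.compile(
    r'setAttr[^;"]*"\.(\w+)"\s+-type\s+"string"\s*\(?\s*"([^"]*)"\s*\)?\s*;'
)

attrs = dict(ATTR.findall(block))
print(attrs)
```

Statements without a string value, such as the `.tx` line, simply do not match.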

2 Comments

@eyquem additionally there could be a \t tab separator there as well, so the value of the key "test1" would still be "hmm blablabla", but it is now separated from the line setAttr -l on -k on ".test1" -type "string" by a \n and possibly one or two \t tabs
@eyquem Thanks for the quick response! Appreciated! I've tried that and it works great on the example above but fails on the one posted below - do you know what I am still missing here? A bit lost at the moment.
