
Given a Python list, I want to split its values based on certain criteria:

    list = ['(( value(name) = literal(luke) or value(like) = literal(music) ) 
     and (value(PRICELIST) in propval(valid))',
    '(( value(sam) = literal(abc) or value(like) = literal(music) ) and 
     (value(PRICELIST) in propval(valid))'] 

Now list[0] would be

  (( value(name) = literal(luke) or value(like) = literal(music) ) 
     and (value(PRICELIST) in propval(valid))

I want to split such that upon iterating it would give me:

#expected output
value(sam) = literal(abc)
value(like) = literal(music)

And only when the parts start with value and literal. At first I thought of splitting on and/or, but that won't work because the and/or is sometimes missing.

I tried :

    for i in list:
        print(i.split())
    # output: ['((', 'value(abc)', '=', 'literal(12)', 'or' ....

I am open to regex-based suggestions as well, but I know little about regex, so I would prefer to avoid it.

4
  • You might want to look into parsing, e.g. PEG here. Don't you mess with regular expressions. Additionally, where do these strings come from? Maybe just go upriver. Commented Mar 12, 2019 at 11:49
  • I had an XML document from which I built an AST to get this string. In the end I want the given list as the output, just with a few values replaced as I mentioned. Commented Mar 12, 2019 at 12:13
  • @Jan can you help me with PEG? I am trying to use parsimonious here. Commented Mar 12, 2019 at 12:41
  • Added an example, see the answer below. Commented Mar 12, 2019 at 19:09

4 Answers

1

So to avoid so much clutter, I'm going to explain the solution in this comment. I hope that's okay.

Given your comment above which I couldn't quite understand, is this what you want? I changed the list to add in the other values you mentioned:

>>> import re
>>> list = ['''(( value(name) = literal(luke) or value(like) = literal(music) ) 
and (value(PRICELIST) in propval(valid))''',
'''(( value(sam) = literal(abc) or value(like) = literal(music) ) and 
(value(PRICELIST) in propval(valid))''',
'''(value(PICK_SKU1) = propval(._sku)''', '''propval(._amEntitled) > literal(0))''']


>>> found_list = []
>>> for item in list:
        for element in re.findall(r'([\w\.]+(?:\()[\w\.]+(?:\))[\s=<>(?:in)]+[\w\.]+(?:\()[\w\.]+(?:\)))', item):
            found_list.append(element)

>>> found_list
['value(name) = literal(luke)', 'value(like) = literal(music)', 'value(PRICELIST) in propval(valid)', 'value(sam) = literal(abc)', 'value(like) = literal(music)', 'value(PRICELIST) in propval(valid)', 'value(PICK_SKU1) = propval(._sku)', 'propval(._amEntitled) > literal(0)']

Explanation:

  • Pre-note: I changed [a-zA-Z0-9\._]+ to [\w\.]+ because they mean essentially the same thing for these strings, and the second is more concise. I explain which characters they cover in the next step
  • The pattern opens with ([\w\.]+. The opening parenthesis starts a capture group that is only closed at the very end, so everything in between is captured. [\w\.]+ matches one or more characters from a-z, A-Z, 0-9, _, or a literal period (.)
  • (?:\() is a non-capturing group around an escaped parenthesis, so it matches a literal "opening" parenthesis. It behaves the same as a plain \(
  • [\w\.]+(?:\)) says to follow that parenthesis with the word characters from the second step, and then, through (?:\)), with a literal "closing" parenthesis
  • [\s=<>(?:in)]+ is kind of reckless. Inside square brackets, (?:in) is not a group: the class simply matches any one of whitespace, =, <, >, (, ?, :, i, n or ), repeated one or more times in any order. That is enough to match " = ", " > " and " in " between the two operands, assuming your strings stay relatively consistent, but it will also match things like << <, = in > =, or stray parentheses. Making it more specific could easily result in a loss of captures though
  • [\w\.]+(?:\()[\w\.]+(?:\)) says, once again, find the word characters from the second step, followed by an "opening parenthesis", followed again by the word characters, followed by a "closing parenthesis"
  • The final ) closes the capture group that was opened at the start of the pattern, which tells the regex engine to capture the entire match outlined above
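
The operator step is the loosest part of the pattern, so a stricter variant can be sketched as follows. This is not the pattern used above: it swaps the character class for an alternation that accepts exactly one of =, <, > or the whole word in, on the assumption that those are the only operators that occur. The names PAIR and items are illustrative only.

```python
import re

# Assumption: the only operators are =, <, > and the word "in".
PAIR = re.compile(
    r"[\w.]+\([\w.]+\)"          # left operand, e.g. value(name)
    r"\s*(?:=|<|>|\bin\b)\s*"    # exactly one operator, optional whitespace
    r"[\w.]+\([\w.]+\)"          # right operand, e.g. literal(luke)
)

items = [
    "(( value(name) = literal(luke) or value(like) = literal(music) ) "
    "and (value(PRICELIST) in propval(valid))",
    "(value(PICK_SKU1) = propval(._sku)",
    "propval(._amEntitled) > literal(0))",
]

# The pattern has no capturing group, so findall returns whole matches.
found = [match for item in items for match in PAIR.findall(item)]
print(found)
```

With this variant a doubled operator such as "value(a) << literal(b)" is no longer matched, which may or may not be what you want for your real data.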

Hope this helps

1 Comment

LOL. I'm just hoping newcomers can understand it too. Glad it helped, man.
1

@Duck_dragon

The strings in the list in your opening post are formatted in a way that causes a syntax error in Python, because a single-quoted string cannot span several lines. In the example below, I edited it to use '''

>>> import re
>>> list = ['''(( value(name) = literal(luke) or value(like) = literal(music) ) 
 and (value(PRICELIST) in propval(valid))''',
'''(( value(sam) = literal(abc) or value(like) = literal(music) ) and 
 (value(PRICELIST) in propval(valid))''']


# A bare findall (result not assigned to a variable) only echoes each list of matches in the REPL; nothing is stored for later use
# You can also use the simpler regex:  r'([a-zA-Z]+\([a-zA-Z]+\)[\s=]+[a-zA-Z]+\([a-zA-Z]+\))'
>>> for item in list:
        re.findall(r'([a-zA-Z]+(?:\()[a-zA-Z]+(?:\))[\s=]+[a-zA-Z]+(?:\()[a-zA-Z]+(?:\)))', item)    

    ['value(name) = literal(luke)', 'value(like) = literal(music)']
    ['value(sam) = literal(abc)', 'value(like) = literal(music)']
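
As a quick check of the two spellings mentioned in the comment, the sketch below runs both on one sample string. A non-capturing group around a single escaped parenthesis, (?:\(), matches the same text as a plain \(, so the results should be identical. The variable names verbose, simple and sample are illustrative only.

```python
import re

# Pattern from the answer, with non-capturing groups around each parenthesis.
verbose = r'([a-zA-Z]+(?:\()[a-zA-Z]+(?:\))[\s=]+[a-zA-Z]+(?:\()[a-zA-Z]+(?:\)))'
# The shorter spelling with plain escaped parentheses.
simple = r'([a-zA-Z]+\([a-zA-Z]+\)[\s=]+[a-zA-Z]+\([a-zA-Z]+\))'

sample = ('(( value(name) = literal(luke) or value(like) = literal(music) ) '
          'and (value(PRICELIST) in propval(valid))')

verbose_hits = re.findall(verbose, sample)
simple_hits = re.findall(simple, sample)
print(simple_hits)
print(verbose_hits == simple_hits)
```

The "in" comparison is skipped by both patterns because the character class [\s=] only allows whitespace and = between the two operands.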

.

To take this a step further and give you a list you can work with:

>>> import re
>>> list = ['''(( value(name) = literal(luke) or value(like) = literal(music) ) 
 and (value(PRICELIST) in propval(valid))''',
'''(( value(sam) = literal(abc) or value(like) = literal(music) ) and 
 (value(PRICELIST) in propval(valid))''']


#Declaring blank array found_list which you can use to call the individual items
>>> found_list = []
>>> for item in list:
        for element in re.findall(r'([a-zA-Z]+(?:\()[a-zA-Z]+(?:\))[\s=]+[a-zA-Z]+(?:\()[a-zA-Z]+(?:\)))', item):
            found_list.append(element)


>>> found_list
['value(name) = literal(luke)', 'value(like) = literal(music)', 'value(sam) = literal(abc)', 'value(like) = literal(music)']

.

Given your comment below which I couldn't quite understand, is this what you want? I changed the list to add in the other values you mentioned:

>>> import re
>>> list = ['''(( value(name) = literal(luke) or value(like) = literal(music) ) 
and (value(PRICELIST) in propval(valid))''',
'''(( value(sam) = literal(abc) or value(like) = literal(music) ) and 
(value(PRICELIST) in propval(valid))''',
'''(value(PICK_SKU1) = propval(._sku)''', '''propval(._amEntitled) > literal(0))''']


>>> found_list = []
>>> for item in list:
        for element in re.findall(r'([\w\.]+(?:\()[\w\.]+(?:\))[\s=<>(?:in)]+[\w\.]+(?:\()[\w\.]+(?:\)))', item):
            found_list.append(element)

>>> found_list
['value(name) = literal(luke)', 'value(like) = literal(music)', 'value(PRICELIST) in propval(valid)', 'value(sam) = literal(abc)', 'value(like) = literal(music)', 'value(PRICELIST) in propval(valid)', 'value(PICK_SKU1) = propval(._sku)', 'propval(._amEntitled) > literal(0)']

.

Edit: Or is this what you want?

>>> import re
>>> list = ['''(( value(name) = literal(luke) or value(like) = literal(music) ) 
 and (value(PRICELIST) in propval(valid))''',
'''(( value(sam) = literal(abc) or value(like) = literal(music) ) and 
 (value(PRICELIST) in propval(valid))''']


#Declaring blank array found_list which you can use to call the individual items
>>> found_list = []
>>> for item in list:
        for element in re.findall(r'([a-zA-Z]+(?:\()[a-zA-Z]+(?:\))[\s=<>(?:in)]+[a-zA-Z]+(?:\()[a-zA-Z]+(?:\)))', item):
            found_list.append(element)


>>> found_list
['value(name) = literal(luke)', 'value(like) = literal(music)', 'value(PRICELIST) in propval(valid)', 'value(sam) = literal(abc)', 'value(like) = literal(music)', 'value(PRICELIST) in propval(valid)']

Let me know if you need an explanation.
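
If the goal is only the pairs that start with value and literal, as the question states, one option is to put those two names into the pattern itself. The following is a minimal sketch under that assumption, not one of the patterns shown above; VALUE_LITERAL and items are hypothetical names, and it assumes the operator between the two is always =.

```python
import re

# Assumption: only "value(...) = literal(...)" pairs are wanted.
VALUE_LITERAL = re.compile(r"value\([\w.]+\)\s*=\s*literal\([\w.]+\)")

items = [
    "(( value(name) = literal(luke) or value(like) = literal(music) ) "
    "and (value(PRICELIST) in propval(valid))",
    "(( value(sam) = literal(abc) or value(like) = literal(music) ) "
    "and (value(PRICELIST) in propval(valid))",
    "(value(PICK_SKU1) = propval(._sku)",
    "propval(._amEntitled) > literal(0))",
]

# One list of matches per input string; strings without a pair give [].
pairs_per_item = [VALUE_LITERAL.findall(item) for item in items]
for pairs in pairs_per_item:
    print(pairs)
```

The propval comparisons are skipped here because neither side pairing matches value(...) on the left and literal(...) on the right.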

.

@Fyodor Kutsepin

In your example take out your_list_ and replace it with OP's list to avoid confusion. Secondly, your for loop lacks a : producing syntax errors

7 Comments

Thanks @FailSafe. Can you help explain the regex? In this case it works fine and ignores (value(PRICELIST) in propval(valid)), but in my original list there are values such as (value(PICK_SKU1) = propval(._sku), propval(._amEntitled) > literal(0)), etc., and I only intend to find literal and value. Maybe @Jan's suggestion of using PEG would be helpful if regex can't solve this. But I appreciate what you did.
Okay. There are so many cases that we would need many conditions in the regex, if that's how it works. Can you change your regex to include the <, >, = and in operators? After that I will figure out what to do with it.
I made an edit. I'm not sure what exactly you need, but I hope it helps.
That's exactly what I intended to find. Can you explain it at a higher level?
I changed it again, but I will explain in a moment. It's gonna take me a bit of time to get the explanation together.
0

First, I would suggest that you avoid naming your variables after built-in functions. Second, you don't need a regex to get the output you mentioned.

for example:

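# your_list_ stands for the list from the question
# keep the part before ") and", then drop the leading "(("
first, rest = your_list_[1].split(') and')
for item in first[2:].split('or'):
    print(item)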
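
The same no-regex idea can be sketched so that it covers every string in the list and keeps only value/literal pairs. This is an assumption-laden sketch, not the code above: it assumes operators and operands are separated by whitespace, as in the question's strings, and the function name value_literal_pairs is hypothetical.

```python
def value_literal_pairs(expression):
    """Return 'value(...) = literal(...)' pairs found around '=' tokens."""
    tokens = expression.split()
    pairs = []
    for i in range(1, len(tokens) - 1):
        if tokens[i] != "=":
            continue
        left = tokens[i - 1].lstrip("(")   # drop grouping parens before the name
        right = tokens[i + 1]
        if left.startswith("value(") and right.startswith("literal("):
            # Trim surplus closing parens that belong to the outer grouping.
            while right.count(")") > right.count("("):
                right = right[:-1]
            pairs.append(f"{left} = {right}")
    return pairs


data = [
    "(( value(name) = literal(luke) or value(like) = literal(music) ) "
    "and (value(PRICELIST) in propval(valid))",
    "(( value(sam) = literal(abc) or value(like) = literal(music) ) "
    "and (value(PRICELIST) in propval(valid))",
]

results = [value_literal_pairs(expression) for expression in data]
for pairs in results:
    print(pairs)
```

Splitting on whitespace rather than on 'or' avoids cutting through names that happen to contain those letters, at the cost of requiring spaces around the = sign.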

1 Comment

You're missing a : after the for loop
0

Not saying you should, but you definitely could use a PEG parser here:

from parsimonious.grammar import Grammar
from parsimonious.nodes import NodeVisitor

data = ['(( value(name) = literal(luke) or value(like) = literal(music) ) and (value(PRICELIST) in propval(valid))',
        '(( value(sam) = literal(abc) or value(like) = literal(music) ) and (value(PRICELIST) in propval(valid))']

grammar = Grammar(
    r"""
    expr        = term (operator term)*
    term        = lpar* factor (operator needle)* rpar*
    factor      = needle operator needle

    needle      = word lpar word rpar

    operator    = ws? ("=" / "or" / "and" / "in") ws?
    word        = ~"\w+"

    lpar        = "(" ws?
    rpar        = ws? ")"
    ws          = ~r"\s*"
    """
)

class HorribleStuff(NodeVisitor):
    def generic_visit(self, node, visited_children):
        return node.text or visited_children

    def visit_factor(self, node, children):
        output, equal = [], False

        for child in node.children:
            if (child.expr.name == 'needle'):
                output.append(child.text)
            elif (child.expr.name == 'operator' and child.text.strip() == '='):
                equal = True

        if equal:
            print(output)

for d in data:
    tree = grammar.parse(d)
    hs = HorribleStuff()
    hs.visit(tree)

This yields

['value(name)', 'literal(luke)']
['value(sam)', 'literal(abc)']
