2

I have a string that includes an arbitrary number of pairs:

A=B, C=D, E=F

This is an options string, so I know "A", "C", and "E". I can query for them if I want.

I want to find malformed pairs in the string:

A=B, C, E=F  # C has no equals or value
A=, C=D, E=F # A has no value
A=B, C=D, E=F X # what is X doing there!

Of course, A, C, and E are all optional and can appear in any order.

What is the elegant way to grab all the pairs, while noticing an error condition? I am able to grab pairs now using re.findall(...), but I fail in the 3rd case above.

Here's what I have. In my exact case, the right side of the pair must be quoted but that's not important for this question.

re.findall(r'\s*(\w+)\s*=\s*(?P<Q>[\'\"])(\w*)(?P=Q)\s*,{0,1}', a_string)

If I knew that a_string was entirely consumed, I'd be a happy guy.

3 Answers

5

Split the string on ', ' and return every piece that does not match the A=B pattern.

>>> import re
>>> def malformed(s):
...     return [i for i in s.split(', ') if not re.search(r'^[A-Z]+=[A-Z]+$', i)]
...

>>> print(malformed('A=, C=D, E=F'))
['A=']
>>> print(malformed('A=B, C=D, E=F X'))
['E=F X']
>>> print(malformed('A=B, C, E=F'))
['C']
2 Comments

Interesting, hitting it from the other direction.
By finding the malformed expressions first, you allow a straightforward regexp to find the valid pairs. It is good.
2

How about splitting it into two much easier to read tests?

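import re

tests = [
    'A=B, C, E=F',
    'A=, C=D, E=F',
    'A=B, C=D, E=F X',
    'A=B, C=D',
]

for test in tests:
    print("*", test)

    # Test 1: the whole string must be a ', '-separated list of key=value pairs
    if not re.match(r"^(\w+=\w+, )*(\w+=\w+)$", test):
        print("Options are malformed")

    # Test 2: collect every well-formed pair, wherever it appears
    options = re.findall(r"\w+=\w+", test)

    print("Read: ", options)
    print()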

Example output:

* A=B, C, E=F
Options are malformed
Read:  ['A=B', 'E=F']

* A=, C=D, E=F
Options are malformed
Read:  ['C=D', 'E=F']

* A=B, C=D, E=F X
Options are malformed
Read:  ['A=B', 'C=D', 'E=F']

* A=B, C=D
Read:  ['A=B', 'C=D']
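
The question mentions that the real values are quoted. The whole-string test could be adapted to that case roughly as follows. This is only a sketch: `is_well_formed` is a hypothetical helper name, and it assumes values are word characters (possibly empty) wrapped in a matching pair of single or double quotes. The two quote styles are spelled out as alternatives rather than using a backreference, since a backreference would point at the wrong group once the pair pattern is repeated.

```python
import re

# Assumption: a value is '...' or "..." containing only word characters.
VALUE = r'''(?:'\w*'|"\w*")'''
PAIR = r'\s*\w+\s*=\s*' + VALUE + r'\s*'

# One pair, followed by any number of comma-separated pairs.
WHOLE = re.compile(PAIR + r'(?:,' + PAIR + r')*')

def is_well_formed(s):
    """True only if the entire string is a list of key='value' pairs."""
    return WHOLE.fullmatch(s) is not None

print(is_well_formed("A='B', C=\"D\", E='F'"))
print(is_well_formed("A='B', C='D', E='F' X"))
```

`re.fullmatch` succeeds only when the pattern consumes the entire string, which is the "entirely consumed" guarantee the question asks for.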

1

Another approach would be to directly match the pairs that don't fit, with a regex like:

(?<=,\s|^)(?!\s*\w+=\w+(?=,|$))([^,\n]+)
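
Note that Python's standard `re` module requires look-behind patterns to have a fixed width, so the `(?<=,\s|^)` part above is rejected there. A rough Python-compatible variant, offered as a sketch and not as an exact translation, consumes the leading comma instead of looking behind for it. The name `find_malformed` is hypothetical, and the pattern also tolerates whitespace before a comma, which the original does not.

```python
import re

# (?:^|,) consumes the start of the string or the separating comma.
# The negative look-ahead rejects positions where a valid pair follows;
# \s* is kept inside it so backtracking cannot skip the check.
MALFORMED = re.compile(r'(?:^|,)(?!\s*\w+=\w+\s*(?:,|$))\s*([^,]+)')

def find_malformed(s):
    """Return the comma-separated pieces that are not key=value pairs."""
    return MALFORMED.findall(s)

print(find_malformed('A=B, C, E=F'))
print(find_malformed('A=B, C=D, E=F X'))
```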
