3

So I am looking at parsing through a code using regular expressions and am wondering if there is an easier way to do it than what I have so far. I'll start with an example of a string I would be parsing through:

T16F161A286161990200040000\r (It's data coming through a serial device)

Now first I need to check the confirmation code, which is the first 9 characters of the string. They need to be exactly T16F161A2. If those 9 characters match exactly, I need to check the next 3 characters, which need to be either 861 or 37F.

If those 3 characters are 37F I have it do something I still need to code, so we won't worry about that result.

However, if those 3 characters are 861, I need to check the 2 characters after them and see what they are. They can be 11, 14, 60, 61, F0, F1, or F2. Each one of these does different things with the data preceding it.

Finally I need to loop through the remaining characters, pairing each 2 of them together.

For an example of how this works, here is the code I've thrown together to parse through the example string I posted above:

import re

test_string = "T16F161A286161990200040000\r"

if re.match('^T16F161A2.*', test_string):
    print("Match: ", test_string)
    test_string = re.sub('^T16F161A2', '', test_string)
    if re.match('^861.*', test_string):
        print("Found '861': ", test_string)
        test_string = re.sub('^861', '', test_string)
        if re.match('^61.*', test_string):
            print("Found '61' : ", test_string)
            test_string = re.sub('^61', '', test_string)
            for i in range(6):
                if re.match('^[0-9A-F]{2}', test_string):
                    temp = re.match('^[0-9A-F]{2}', test_string).group()
                    print("Found Code: ", temp)
                test_string = re.sub('^[0-9A-F]{2}', '', test_string)

Now as you can see in this code, after every step I am using re.sub() to remove the part of the string I had just been looking for. With that in mind my question is the following:

Is there a way to parse the string and find the data I need, while also keeping the string intact? Would it be more or less efficient than what I currently have?

3
  • Why are you even using regex for this? Since you know exactly where to look and what variants there are, just use slicing and a few if/elif statements. Commented Jul 31, 2017 at 13:18
  • @tobias_k Unless I am mistaken, Python doesn't have a case/switch statement as part of its language. Commented Jul 31, 2017 at 13:19
  • Whoops, wrong language. Anyway, just use a bunch of if/elif statements or a dict. Commented Jul 31, 2017 at 13:20
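
As a rough sketch of the dict suggestion in the comment above: the handler names (handle_61, handle_f0) and what they return are placeholders made up for illustration, since the question does not say what each code should do. Only two of the seven codes are filled in.

```python
def handle_61(pairs):
    # Hypothetical handler: the real processing is not specified in the question.
    return ("61", pairs)

def handle_f0(pairs):
    # Hypothetical handler for the F0 subtype.
    return ("F0", pairs)

# Map each 2-character subtype to the function that handles it.
HANDLERS = {
    "61": handle_61,
    "F0": handle_f0,
}

def dispatch(frame):
    # Work on a stripped copy; the caller's string is left untouched.
    frame = frame.rstrip("\r")
    if frame[:9] != "T16F161A2" or frame[9:12] != "861":
        return None
    handler = HANDLERS.get(frame[12:14])
    if handler is None:
        return None
    # Pair up everything after the 14-character prefix.
    pairs = [frame[i:i + 2] for i in range(14, len(frame), 2)]
    return handler(pairs)

result = dispatch("T16F161A286161990200040000\r")
```

The dict lookup stands in for the switch statement Python lacks; adding a new code means adding one entry rather than another elif branch.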

7 Answers

2

You don't need a regex for this task; you can use if/else blocks and a few string slices:

test_string = "T16F161A286161990200040000\r"

def process(token):
  # does a few things with 11, 14, 60, 61, F0, F1, or F2
  return

def stringToArray(tempToken):
  # split the token it is given into 2-character chunks
  return [tempToken[i:i+2] for i in range(0, len(tempToken), 2)]



if not test_string.startswith('T16F161A2'):
  print ("Does not match")
  quit()
else:
  print ("Does match")

tempToken = test_string[9:]

if tempToken.startswith('861'):
  process(tempToken) #does stuff with 11, 14, 60, 61, F0, F1, or F2
  tempToken = tempToken[5:]

  print (stringToArray(tempToken))
else:
  pass

You can see it live here


2 Comments

Okay, one small question about what you've posted here. I need to not include the \r as part of my parsed data for starters, so would I just change the for loop in stringToArray to len(tempToken - 2)?
No, change it to len(tempToken) - 1, because \r is only one char @Skitzafreak
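
An alternative to adjusting the range, as a small sketch: strip the trailing \r before pairing. The name split_pairs is made up for this example and is not part of the answer's code.

```python
def split_pairs(payload):
    # Drop a trailing carriage return (if any), then group the rest in twos.
    payload = payload.rstrip("\r")
    return [payload[i:i + 2] for i in range(0, len(payload), 2)]

pairs = split_pairs("990200040000\r")
```

This also works unchanged if the device ever sends a line without the \r.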
0

Because you know the layout of the string, I'd recommend slicing instead. First:

  • Check the first 9 characters by comparing test_string[:9] == 'T16F161A2'

I'd do the same for the second phase (test_string[9:12]). This comparison is actually much faster than a regex.

When the sizes are known, you can slice the string as I did above. This won't "ruin" your string as your current code does. If you still want a regex, you can apply it to the slice, e.g. re.search(pattern, test_string[9:12]).
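
If some regex matching is still wanted, one possible sketch is to pass a start position to a compiled pattern instead of slicing or substituting. The pattern names (kind_re, subtype_re, pair_re) are just illustrative; the pos argument of Pattern.match and Pattern.findall is standard.

```python
import re

test_string = "T16F161A286161990200040000\r"

# Compiled patterns accept a start position, so the string is never modified.
kind_re = re.compile(r"861|37F")
subtype_re = re.compile(r"11|14|60|61|F0|F1|F2")
pair_re = re.compile(r"[0-9A-F]{2}")

header_ok = test_string.startswith("T16F161A2")
kind = kind_re.match(test_string, 9)          # match starting at index 9
subtype = subtype_re.match(test_string, 12)   # match starting at index 12
# findall also takes a start position; "\r" is not a hex pair, so it is skipped.
pairs = pair_re.findall(test_string, 14)
```

Note that findall is not anchored, so it would silently skip over unexpected characters in the payload; a stricter check would need to validate the payload separately.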

Hope this helps you a bit at least. :)


0

Assuming the string is the same length every time and the data is located at the same indexes, you can just use string slicing with []. To get the first 9 characters you would use test_string[:9]. You could store the pieces in variables to make checking easier:

confirmation_code = test_string[:9]
nextThree = test_string[9:12]
#check values

Slicing is built into Python, so it's safe to say it's pretty efficient.
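
A minimal sketch of the slicing approach for the whole string, assuming the layout given in the question (9-character confirmation code, 3-character type, 2-character subtype, then pairs). Frame and parse_frame are names invented for this example. Keep in mind that a slice's end index is exclusive, so the first 9 characters are s[:9].

```python
from collections import namedtuple

# Hypothetical container for the parsed pieces.
Frame = namedtuple("Frame", "header kind subtype pairs")

def parse_frame(raw):
    # rstrip returns a new string, so the original input stays intact.
    s = raw.rstrip("\r")
    return Frame(
        header=s[:9],        # characters 0..8
        kind=s[9:12],        # characters 9..11
        subtype=s[12:14],    # characters 12..13
        pairs=[s[i:i + 2] for i in range(14, len(s), 2)],
    )

frame = parse_frame("T16F161A286161990200040000\r")
```

The caller can then compare frame.header, frame.kind and frame.subtype against the expected values without any regex.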


0

If you want to stick to regex, then this can do:

pattern = re.compile(r'^T16F161A2((861)|37F)(?(2)(11|14|60|61|F0|F1|F2)|[0-9A-F]{2})([0-9A-F]{12})$')
match_result = pattern.match(test_string.strip())  # strip the trailing \r so that $ can match

You can then check whether match_result is a valid match object (if it is None, the string did not match the pattern). The match object will contain 4 groups:

  • group 1: the 3-character type (861 or 37F)
  • group 2: useless data (ignore this)
  • group 3: the 2-character code if group 1 is 861 (None otherwise)
  • group 4: the last 12 digits

To split the last 12 digits into pairs, a one-liner:

last_12_digits = match_result.group(4)
last_digits = [last_12_digits[i:i+2] for i in range(0, len(last_12_digits), 2)]


0

You don't really need regular expressions for this. Since you know exactly what you are looking for and where it should be found in the string, you can just use slicing and a couple of if/elif/else statements. Something like this:

s = test_string.strip()
code, x, y, rest = s[:9], s[9:12], s[12:14], [s[i:i+2] for i in range(14, len(s), 2)]
# T16F161A2, 861, 61, ['99', '02', '00', '04', '00', '00']

if code == "T16F161A2":
    if x == "37F":
        ...  # handle 37F
    elif x == "861":
        if y == "11":
            ...
        elif y == "61":
            ...  # do stuff with rest
    else:
        ...  # invalid
else:
    ...  # invalid


0

Perhaps something like:

import re

regex = r'^T16F161A2(861|37F)(11|14|60|61|F0|F1|F2)(.{2})(.{2})(.{2})(.{2})(.{2})(.{2})$'
string = 'T16F161A286161990200040000'

print(re.match(regex, string).groups())

This makes use of capture groups and avoids having to create a bunch of new strings.
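
A variation on the same idea, sketched with named groups so the fields can be read by name and the payload length is not fixed at six pairs. The group names (kind, subtype, payload) are my own choice, and like the pattern above this assumes a subtype code also follows 37F, which the question does not confirm.

```python
import re

FRAME_RE = re.compile(
    r"T16F161A2"
    r"(?P<kind>861|37F)"
    r"(?P<subtype>11|14|60|61|F0|F1|F2)"
    r"(?P<payload>(?:[0-9A-F]{2})*)"   # any number of hex pairs
)

# fullmatch requires the whole string to fit the pattern.
m = FRAME_RE.fullmatch("T16F161A286161990200040000")
fields = m.groupdict() if m else None
# Split the payload into 2-character pairs.
pairs = re.findall(r"[0-9A-F]{2}", fields["payload"]) if fields else []
```

As with the pattern above, a trailing \r would need to be stripped before matching.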


0

The re module will not be as efficient as direct substring access, but it could save you from writing (and maintaining) some lines of code. If you want to use it, you should match the string as a whole:

import re

test_string = "T16F161A286161990200040000\r"

rx = re.compile(r'T16F161A2(?:(?:(37F)(.*))|(?:(861)(11|14|60|61|F0|F1|F2)(.*)))\r')
m = rx.match(test_string)      # => 5 groups, first 2 if 37F, last 3 if 861

if m is None:                  # string does not match:
    ...
elif m.group(1) is None:       # 861 type
    subtype = m.group(4)       # extract subtype
    # and group remaining characters by pairs
    elts = [ m.group(5)[i:i+2] for i in range(0, len(m.group(5)), 2) ]
    ...                        # process that
else:                          # 37F type
    ...
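
The efficiency remark at the top of this answer can be checked on your own machine with timeit. This is only a rough sketch: the two helper functions are made up for the comparison, they check just the first 12 characters, and the actual timings will vary by system and Python version.

```python
import re
import timeit

test_string = "T16F161A286161990200040000\r"
rx = re.compile(r"T16F161A2861")

def with_slicing():
    # Direct substring comparison of the two fixed fields.
    return test_string[:9] == "T16F161A2" and test_string[9:12] == "861"

def with_regex():
    # Same check done with a precompiled pattern.
    return rx.match(test_string) is not None

slice_time = timeit.timeit(with_slicing, number=100_000)
regex_time = timeit.timeit(with_regex, number=100_000)
print(f"slicing: {slice_time:.4f}s  regex: {regex_time:.4f}s")
```

For a serial device sending short strings, either approach is likely fast enough, so readability may matter more than the measured difference.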

