
I want to match a pattern against a string consisting only of digits, such as '2324235235980980', where the pattern is described below:

The pattern is '2-6-8-7-4'. It starts with 2 and transitions to 6. At 6 it can either self-loop or transition to 8. From there it can go back and forth between 6 and 8, self-loop at 8, or transition to 7. The same applies to 7; in particular, 7-8-6-8-7 can also happen. Finally, 7 can reach 4, and once it reaches 4, the pattern is complete. If at any point it reaches some other digit, it has to start again from 2 to be counted. I'm using:
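The rules read naturally as a small state machine. Here is a minimal sketch, assuming the transitions as clarified by the asker in the comments (from 8 you may go to 6, 8 or 7; from 7 you may go to 8, 7 or 4). `TRANSITIONS` and `is_valid_path` are hypothetical names, and this checks a whole candidate string rather than searching inside a longer one:

```python
# Allowed next digits for each digit (one reading of the described rules).
# 4 has no entry: once the path reaches 4 it is finished.
TRANSITIONS = {
    '2': {'6'},
    '6': {'6', '8'},
    '8': {'6', '8', '7'},
    '7': {'8', '7', '4'},
}


def is_valid_path(s: str) -> bool:
    """Return True if s starts at 2, ends at 4, and every step is allowed."""
    if len(s) < 2 or s[0] != '2' or s[-1] != '4':
        return False
    # Check every consecutive pair of digits against the transition table.
    for current, nxt in zip(s, s[1:]):
        if nxt not in TRANSITIONS.get(current, set()):
            return False
    return True


for sample in ['266868874', '26876874']:
    print(sample, is_valid_path(sample))
# 266868874 True
# 26876874 False
```

A table like this is easy to adjust if the rules change, and it is also a useful reference for testing any regex written for the same rules.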

import re

test_string = '2666686888668887'  # example input
print(re.findall(r'(2((6+8+)+)7)', test_string))

The output includes '2666686888668887', but I don't know the syntax for adding the final 4. Does anyone have an idea? Thanks a lot!
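Syntactically, the 4 can simply be appended after the 7 part. A sketch, assuming 7 may repeat before the 4; it uses non-capturing `(?:...)` groups so that `re.findall` returns whole matches instead of tuples of group contents. Note that this still does not allow going from 7 back to 8, which the comments say must be possible:

```python
import re

test_string = '2666686888668887748926874'  # hypothetical example input
# (?:...) does not capture, so findall returns the full matched text.
matches = re.findall(r'2(?:6+8+)+7+4', test_string)
print(matches)  # ['266668688866888774', '26874']
```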

  • When you say "then it could go back and forth between 6 and 8, could self-loop at 8, or could transit to 7", does that mean it could go 6-8-6-7, or does it need to go back to 8 before going to 7? Commented Nov 15, 2017 at 1:02
  • Also, when you say "the same thing for 7", what exactly are you referring to? 8's property of going back and forth between 6 and itself? It would be helpful if you provided a list of valid and invalid strings that covers as many of the edge cases as possible. Commented Nov 15, 2017 at 1:06
  • Why the [perl] tag if it's a Python question? Commented Nov 15, 2017 at 2:00
  • My bad for not clarifying the question. When you reach 8, you can move to 6, stay at 8, or move to 7. "The same thing for 7" means that when you reach 7, you can move back to 8, stay at 7, or move forward to 4. Examples of valid strings are '266868874', '268787866668887774', '268688788774'. Examples of invalid strings are '263874', '268734', '2688668778868714', etc. Commented Nov 15, 2017 at 3:29
  • Think of 2-6-8-7-4 as a tunnel: once you enter at 2, in each time period you move one step in the tunnel. You can't break out of the tunnel in the middle, but you can stay at any intermediate point for any amount of time. You can also go back and forth among the intermediate points. Once you reach the end point 4, you are out of the tunnel. Commented Nov 15, 2017 at 3:29
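Under the tunnel model, 6 and 7 are never adjacent, so between the 2 and the 4 the digits form runs of 6s and runs of 7s that are always separated by runs of 8s, starting with a 6-run and ending with a 7-run. That suggests one possible pattern without lookarounds. This is a sketch under that reading; `PATTERN` and the combined `text` input are hypothetical:

```python
import re

# 2, a run of 6s, a run of 8s, then any number of (6-run or 7-run) each
# followed by an 8-run, and finally a 7-run before the terminating 4.
PATTERN = re.compile(r'26+8+(?:(?:6+|7+)8+)*7+4')

valid = ['266868874', '268787866668887774', '268688788774']
invalid = ['263874', '268734', '2688668778868714', '26876874']

for s in valid + invalid:
    print(s, bool(PATTERN.fullmatch(s)))

# finditer locates tunnels inside a longer digit string. Every match must
# begin with a literal 2, so an interrupted path only counts if it restarts at 2.
text = '2324235235980980' + '268688788774'
found = [(m.start(), m.end(), m.group()) for m in PATTERN.finditer(text)]
print(found)  # [(16, 28, '268688788774')]
```

Because 6, 7 and 8 are different characters, each run boundary is unambiguous, so the engine does little backtracking on this pattern.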

2 Answers


I think this is easier to achieve than initially expected:

26[68]+?[687]+?4

That is: 2, followed by 6, followed by one or more of 6|8, followed by one or more of 6|8|7, followed by 4.

The only non-obvious part is making the quantifiers lazy.
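A quick way to check this pattern is to run `re.fullmatch` against the sample strings from the comments. In this sketch the sample list is taken from the thread; the last string is the one the asker reports as wrongly accepted, because `[687]` lets a 7 be followed directly by a 6:

```python
import re

# The first pattern from this answer.
lazy = re.compile(r'26[68]+?[687]+?4')

samples = ['266868874', '268787866668887774', '268688788774', '263874', '26876874']
results = {s: bool(lazy.fullmatch(s)) for s in samples}
for s, ok in results.items():
    print(s, ok)
# 266868874 True
# 268787866668887774 True
# 268688788774 True
# 263874 False
# 26876874 True
```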

Here is an even better pattern:

\b26?([^7]6|8|[^6]7)+?4\b

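That is: 2, an optional 6, then one or more of (a non-7 character followed by 6) | 8 | (a non-6 character followed by 7), then 4, with word boundaries on both ends.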


2 Comments

This is brilliant! Perfectly solve my problem! Thank you!
Wait, there is a problem. When I test the string '26876874', it matches even though it should not. You CANNOT move from 7 to 6 in one step. Only '7-8-6-8-7-4' can happen, not '7-6-8-7-4'. Do you have an idea how to fix this?
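One possible fix, sketched for Python's `re` module: keep the idea of "6 not right after 7, 7 not right after 6", but express it with zero-width lookbehind assertions, `(?<!7)` and `(?<!6)`, so that each step consumes exactly one digit, and require with `(?<=7)` that the final 4 is reached from 7. This lookbehind technique is a swapped-in alternative, not the answer's own pattern, and `tunnel` is a hypothetical name:

```python
import re

# After "26": a 6 not preceded by 7, any 8, or a 7 not preceded by 6,
# repeated; the closing 4 must directly follow a 7.
tunnel = re.compile(r'26(?:(?<!7)6|8|(?<!6)7)*(?<=7)4')

for s in ['268787866668887774', '26876874']:
    print(s, bool(tunnel.fullmatch(s)))
# 268787866668887774 True
# 26876874 False
```

The lookbehinds only inspect the previous character, so they are fixed-width and accepted by Python's `re`; `tunnel.finditer(text)` can then be used to find every tunnel in a longer string.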

I don't know if I understand what you need, but maybe this can work for you:

import re

string = "2666686888668887748926874"
index = [(m.start(0), m.end(0)) for m in re.finditer(r'2(6+8+)+7+\1?4', string)]
print(index)

Prints: [(0, 18), (20, 25)].

It is a list of (start, end) index tuples, one for every occurrence.
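To see which text each tuple refers to, the original string can be sliced with each (start, end) pair. A small usage sketch with the same string and pattern; `spans` and `matched` are illustrative names:

```python
import re

string = "2666686888668887748926874"
spans = [(m.start(0), m.end(0)) for m in re.finditer(r'2(6+8+)+7+\1?4', string)]
# Slice the original string with each (start, end) pair to get the matched text.
matched = [string[start:end] for start, end in spans]
print(matched)  # ['266668688866888774', '26874']
```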

1 Comment

This may be helpful to my further exploration. Thanks for your work!
