I have a large number of strings in the format YYYYYYYYXXXXXXXXZZZZZZZZ, where X, Y, and Z are each fixed-length, eight-digit numbers. The problem is that I need to parse out the middle sequence of digits and remove any leading zeroes. Unfortunately, the only way to determine where each of the three sequences begins and ends is to count the digits.
I am currently doing it in two steps, i.e.:
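import re

m = re.match(
    r"(?P<first_sequence>\d{8})"
    r"(?P<second_sequence>\d{8})"
    r"(?P<third_sequence>\d{8})",
    string)
second_sequence = m.group("second_sequence")
# lstrip takes a string of characters to strip, and returns a new string
second_sequence = second_sequence.lstrip("0")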
Which does work, and gives me the right results, e.g.:
112233441234567855667788 --> 12345678
112233440012345655667788 --> 123456
112233001234567855667788 --> 12345678
112233000012345655667788 --> 123456
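As a sanity check, the two-step approach can be wrapped in a function and run against the examples above. This is only a sketch: the helper name `middle_two_step` is hypothetical, and it uses `re.fullmatch` instead of `re.match` so that a string that is not exactly 24 digits returns None rather than being partially matched. The pattern is compiled once, which is a common habit when applying it to many strings.

```python
import re

# Three fixed-width, eight-digit fields; fullmatch requires exactly 24 digits.
PATTERN = re.compile(
    r"(?P<first_sequence>\d{8})"
    r"(?P<second_sequence>\d{8})"
    r"(?P<third_sequence>\d{8})"
)


def middle_two_step(s):
    """Hypothetical helper: middle field without leading zeros, or None."""
    m = PATTERN.fullmatch(s)
    if m is None:
        return None  # not exactly 24 digits
    # Step 2: strip leading zeros (lstrip expects a str, not the int 0).
    return m.group("second_sequence").lstrip("0")


examples = {
    "112233441234567855667788": "12345678",
    "112233440012345655667788": "123456",
    "112233001234567855667788": "12345678",
    "112233000012345655667788": "123456",
}

for s, expected in examples.items():
    print(s, "-->", middle_two_step(s))
    assert middle_two_step(s) == expected
```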
But is there a better method? Is it possible to write a single regex that matches the second sequence, sans the leading zeros?
I guess I am looking for a regex which does the following:
- Skips over the first eight digits.
- Skips any leading zeros.
- Captures everything after that, up to the point where there are sixteen characters behind and eight in front.
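The three steps above can be written as one pattern, `\d{8}0*(\d*)\d{8}`: `\d{8}` skips the first field, `0*` consumes leading zeros, `(\d*)` captures the rest, and the trailing `\d{8}` forces the engine to backtrack until exactly eight digits are left for the third field. The pattern alone does not fix the total length (a 16-digit string would also match), so this sketch assumes inputs should be exactly 24 digits and checks that explicitly; the name `middle_single_regex` is hypothetical.

```python
import re

# \d{8} : skip the first eight-digit field
# 0*    : consume leading zeros of the middle field
# (\d*) : capture whatever follows
# \d{8} : the third field; backtracking shrinks the capture until
#         exactly eight digits remain for this part
SINGLE = re.compile(r"\d{8}0*(\d*)\d{8}")


def middle_single_regex(s):
    """Hypothetical helper: middle field without leading zeros, or None."""
    # The regex does not pin the total length, so check it separately.
    if len(s) != 24:
        return None
    m = SINGLE.fullmatch(s)
    return m.group(1) if m else None


for s in ["112233441234567855667788", "112233440012345655667788",
          "112233001234567855667788", "112233000012345655667788"]:
    print(s, "-->", middle_single_regex(s))
```

If the middle field is all zeros, `0*` takes all eight of them (backtracking stops it from eating into the third field), so the capture is an empty string, the same result `lstrip("0")` gives.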
The above solution does work, as mentioned, so the purpose of this problem is more to improve my knowledge of regex. I appreciate any pointers.
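Suggested alternatives:
- string[8:16].lstrip('0') (no regex needed, since the fields have a fixed width)
- \d{8}0*(\d*)\d{8} (the result is in group 1; demo at regex101.com/r/1HjS5m/1)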
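Because every field has a fixed width, the slicing approach, `string[8:16].lstrip('0')`, always picks out the middle field. A small sketch comparing it with the `\d{8}0*(\d*)\d{8}` pattern; the helper names and the extra all-zero input are hypothetical, added only for illustration:

```python
import re

SINGLE = re.compile(r"\d{8}0*(\d*)\d{8}")


def middle_slice(s):
    # Characters 8..15 are the middle field; strip its leading zeros.
    return s[8:16].lstrip("0")


def middle_regex(s):
    # Group 1 holds the middle field without leading zeros.
    m = SINGLE.fullmatch(s)
    return m.group(1) if m else None


inputs = [
    "112233441234567855667788",
    "112233440012345655667788",
    "112233001234567855667788",
    "112233000012345655667788",
    "112233440000000000000001",  # hypothetical: middle field is all zeros
]

for s in inputs:
    # Both approaches agree on well-formed 24-digit input.
    assert middle_slice(s) == middle_regex(s)
    print(s, "-->", middle_slice(s))
```

Unlike the regex, the slice does no validation at all: a short or non-numeric string still returns something, so check the input separately if it might be malformed.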