Remove leading zeros in middle of string with regex

Question

I have a large number of strings in the format YYYYYYYYXXXXXXXXZZZZZZZZ, where X, Y, and Z are fixed-length numbers of eight digits each. I need to parse out the middle sequence of digits and remove any leading zeros. Unfortunately, the only way to determine where each of the three sequences begins and ends is to count the digits.

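I am currently doing it in two steps:

import re

string = "112233440012345655667788"  # one of the example inputs
m = re.match(
    r"(?P<first_sequence>\d{8})"
    r"(?P<second_sequence>\d{8})"
    r"(?P<third_sequence>\d{8})",
    string)
second_sequence = m.group("second_sequence")
second_sequence = second_sequence.lstrip("0")  # lstrip takes a string and returns a new one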

This works and gives me the right results, e.g.:

112233441234567855667788 --> 12345678
112233440012345655667788 --> 123456
112233001234567855667788 --> 12345678
112233000012345655667788 --> 123456

But is there a better method? Is it possible to write a single regular expression that matches the second sequence without its leading zeros?

I guess I am looking for a regex which does the following:

Skips over the first eight digits.
Skips any leading zeros.
Captures everything after that, up to the position with sixteen characters behind it and eight in front.

The above solution does work, as mentioned, so the purpose of this problem is more to improve my knowledge of regex. I appreciate any pointers.
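The three steps listed in the question map directly onto one pattern. A minimal sketch, assuming every input is exactly 24 digits (the pattern itself does not check the length); `re.VERBOSE` lets each step sit on its own commented line, and `MIDDLE` and `middle_without_zeros` are illustrative names, not from the question:

```python
import re

# One commented line per step from the list above. Assumes the input is exactly
# 24 digits: on its own, the pattern accepts any run of 16 or more digits.
MIDDLE = re.compile(r"""
    \d{8}    # skip over the first eight digits
    0*       # skip any leading zeros of the middle block
    (\d*)    # capture the rest of the middle block
    \d{8}    # the last eight digits (the third block)
""", re.VERBOSE)


def middle_without_zeros(s):
    """Return the middle block of a 24-digit string without its leading zeros."""
    m = MIDDLE.fullmatch(s)
    if m is None:
        raise ValueError(f"unexpected input: {s!r}")
    return m.group(1)


print(middle_without_zeros("112233440012345655667788"))  # 123456
```

Because `0*` is greedy and `fullmatch` forces the final `\d{8}` onto the last eight characters, the zeros skipped can never spill past the middle block.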

Do you need regexes here? string[8:16].lstrip('0').

– Iluvatar, Commented Dec 7, 2016 at 13:53

\d{8}0*(\d*)\d{8} regex101.com/r/1HjS5m/1

– Patrick Haugh, Commented Dec 7, 2016 at 13:57

Tomalak · Accepted Answer · 2016-12-07 13:54:43Z

4

This is a typical case of "useless use of regular expressions".

Your strings are fixed-length. Just cut them at the appropriate positions.

s = "112233440012345655667788"
int(s[8:16])
# -> 123456
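The same slicing idea extends to all three fields. A minimal sketch, assuming each input is exactly 24 characters; `split_fields` is a hypothetical helper name:

```python
def split_fields(s):
    """Split a 24-character string into its three 8-character blocks."""
    if len(s) != 24:
        raise ValueError(f"expected 24 characters, got {len(s)}")
    # Slice at offsets 0, 8 and 16; fixed-width fields need no regex.
    return [s[i:i + 8] for i in range(0, 24, 8)]


first, middle, last = split_fields("112233440012345655667788")
print(first, middle, last)  # 11223344 00123456 55667788
print(int(middle))          # 123456
```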


lucasnadalutti · 2016-12-07 13:53:48Z

3

I think it's simpler not to use regex.

result = my_str[8:16].lstrip('0')
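This answer and the `int()` answer above differ in two ways worth checking against your data: `int()` returns a number rather than a string, and an all-zero middle block becomes `0` with `int()` but an empty string with `lstrip('0')`. A small comparison (the function names are illustrative):

```python
def via_int(s):
    return int(s[8:16])          # an int; leading zeros vanish numerically


def via_lstrip(s):
    return s[8:16].lstrip("0")   # a str; only leading '0' characters are removed


for s in ["112233440012345655667788", "112233440000000055667788"]:
    print(repr(via_int(s)), repr(via_lstrip(s)))
# 123456 '123456'
# 0 ''
```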


JPEG_ · 2016-12-07 13:58:42Z

2

I agree with the other answers that regex isn't really required. If you really want to use one, \d{8}0*(\d*)\d{8} should do it; the middle digits without leading zeros end up in capture group 1.
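If the strings arrive as one newline-separated text (an assumption; the question doesn't say how they are stored), the same pattern can extract every middle block in one call by anchoring it per line with `re.MULTILINE`. `re.findall` returns capture group 1 for each match:

```python
import re

# Hypothetical input: one 24-digit record per line.
text = (
    "112233441234567855667788\n"
    "112233440012345655667788\n"
    "112233001234567855667788\n"
    "112233000012345655667788\n"
)

# Under re.MULTILINE, ^ and $ match at each line boundary, so every line must be
# one complete record; findall returns group 1 for each line that matches.
middles = re.findall(r"^\d{8}0*(\d*)\d{8}$", text, re.MULTILINE)
print(middles)  # ['12345678', '123456', '12345678', '123456']
```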


SierraOscar · 2016-12-07 14:01:35Z

1

Just to show that it is possible with regex:

https://regex101.com/r/8RSxaH/2

# CODE AUTO GENERATED BY REGEX101.COM (SEE LINK ABOVE)
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(?<=\d{8})((?:0*)(\d{,8}))(?=\d{8})"

test_str = ("112233441234567855667788\n"
    "112233440012345655667788\n"
    "112233001234567855667788\n"
    "112233000012345655667788")

matches = re.finditer(regex, test_str)

for matchNum, match in enumerate(matches):
    matchNum = matchNum + 1

    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))

    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1

        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

Although you don't really need regex to do what you're asking.
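In this pattern, group 1 still contains the leading zeros; the stripped digits are in group 2. Run over the joined multi-line `test_str`, `finditer` also reports zero-length matches at the start of each third block, so applying the pattern to one string at a time with `re.search` is simpler. A minimal sketch, assuming each string is a valid 24-digit record:

```python
import re

# Regex from the answer above: group 1 keeps the leading zeros,
# group 2 holds the digits after them.
regex = r"(?<=\d{8})((?:0*)(\d{,8}))(?=\d{8})"

samples = [
    "112233441234567855667788",
    "112233440012345655667788",
    "112233001234567855667788",
    "112233000012345655667788",
]

# For a 24-digit string the match starts at index 8, the first position with
# eight digits behind it; the lookahead then requires eight digits after it.
results = {s: re.search(regex, s).group(2) for s in samples}
for s, middle in results.items():
    print(s, "-->", middle)
```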


1 Comment

user7262455 Over a year ago

Thank you. That's exactly the kind of expression I was looking for. Excellent website, thanks for bringing it to my attention as well. Cheers!
