I'm trying to split the string:

s = "Ladegårdsvej 8B7100 Vejle"

with a regex into:

[street,zip,city] = ["Ladegårdsvej 8B", "7100", "Vejle"]

s varies a lot. The only certain part is that the zip always has exactly 4 digits followed by a whitespace. My idea is therefore to "match from the right" on 4 digits plus a whitespace, and use that match to mark where the string should be split.
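A minimal sketch of that "match from the right" idea, assuming Python 3 and assuming the rightmost occurrence of four digits followed by whitespace is always the zip: collect every match with `re.finditer`, keep the last one, and slice the string around it. The helper name `split_address` is hypothetical.

```python
import re

# Four digits followed by one whitespace character; the digits are group 1.
ZIP_PATTERN = re.compile(r"(\d{4})\s")


def split_address(s):
    """Hypothetical helper: split at the LAST '4 digits + whitespace' match."""
    matches = list(ZIP_PATTERN.finditer(s))
    if not matches:
        raise ValueError("no 4-digit zip followed by whitespace in %r" % s)
    last = matches[-1]             # rightmost match, i.e. "match from the right"
    street = s[:last.start()]      # everything before the zip
    zip_code = last.group(1)       # the four captured digits
    city = s[last.end():]          # everything after the whitespace
    return [street, zip_code, city]


print(split_address("Ladegårdsvej 8B7100 Vejle"))  # ['Ladegårdsvej 8B', '7100', 'Vejle']
print(split_address("Birkevej 8371900 Roskilde"))  # ['Birkevej 837', '1900', 'Roskilde']
```

Only one whitespace character is consumed after the zip, so a city preceded by several spaces would keep the extra ones.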

Currently I'm able to get street and city like this:

print(re.split(r"[0-9]{4}\s", s))
['Ladegårdsvej 8B', 'Vejle']

How would I go about splitting s as desired? In particular, how do I split in the middle of the string, between the house number in street and the zip?

  • Would all strings have the same overall format as that one? If so, you could just split on whitespace, since that seems to be the delimiter between the three. Commented Jul 24, 2017 at 12:21
  • @Professor_Joykill: There is no whitespace between street & zip. Commented Jul 24, 2017 at 12:22
  • @Professor_Joykill please note that OP wants to put 7100 rather than 8B7100 into zip. Commented Jul 24, 2017 at 12:22
  • See ideone.com/dmyo6b: you can match and capture the parts. Commented Jul 24, 2017 at 12:23
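The linked ideone snippet is not reproduced here, so the following is only a guess at a match-and-capture approach, assuming Python 3. A greedy `(.*)` makes the regex engine backtrack to the rightmost run of four digits that is followed by whitespace, so the zip is taken from the right even when the house number is long. The name `ADDRESS` is hypothetical.

```python
import re

# Greedy (.*) backtracks to the LAST "4 digits + whitespace" in the string.
ADDRESS = re.compile(r"^(.*)(\d{4})\s+(.*)$")

results = []
for s in ["Ladegårdsvej 8B7100 Vejle", "Birkevej 8371900 Roskilde"]:
    m = ADDRESS.match(s)
    if m:  # match() returns None when there is no zip followed by whitespace
        street, zip_code, city = m.groups()
        results.append([street, zip_code, city])

print(results)
# [['Ladegårdsvej 8B', '7100', 'Vejle'], ['Birkevej 837', '1900', 'Roskilde']]
```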

3 Answers

You can use re.split, but make the four digits a capturing group:

>>> import re
>>> s = "Ladegårdsvej 8B7100 Vejle"
>>> re.split(r"(\d{4}) ", s)
['Ladegårdsvej 8B', '7100', 'Vejle']

From the re.split documentation (emphasis mine):

Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list. If maxsplit is nonzero, at most maxsplit splits occur, and the remainder of the string is returned as the final element of the list.
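A short sketch of the documented behaviour, assuming Python 3 (`maxsplit` is passed by keyword). It contrasts a pattern without a group, the same pattern with a capturing group, and a `maxsplit` limit; the `"a1b2c"` string is just an illustrative input.

```python
import re

s = "Ladegårdsvej 8B7100 Vejle"

# Without a group, the matched delimiter (zip + space) is dropped.
without_group = re.split(r"\d{4} ", s)
print(without_group)  # ['Ladegårdsvej 8B', 'Vejle']

# With a capturing group, the text of the group is kept in the result.
with_group = re.split(r"(\d{4}) ", s)
print(with_group)     # ['Ladegårdsvej 8B', '7100', 'Vejle']

# maxsplit=1: at most one split; the remainder is returned unsplit.
limited = re.split(r"(\d)", "a1b2c", maxsplit=1)
print(limited)        # ['a', '1', 'b2c']

# The three-part result unpacks straight into the names from the question
# (zip_code is used to avoid shadowing the built-in zip).
street, zip_code, city = with_group
```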


Once you have street, getting zip is trivial:

zip = s[len(street):len(street)+4]
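A sketch combining this slice with the split from the question, assuming Python 3 and assuming the split actually finds a zip (otherwise the two-name unpacking raises `ValueError`). The zip starts exactly where the street ends because the split removes nothing before the first match.

```python
import re

s = "Ladegårdsvej 8B7100 Vejle"

# Split as in the question; maxsplit=1 yields at most two parts.
street, city = re.split(r"[0-9]{4}\s", s, maxsplit=1)

# The zip begins right after the street and is always 4 characters long.
zip_code = s[len(street):len(street) + 4]

print([street, zip_code, city])  # ['Ladegårdsvej 8B', '7100', 'Vejle']
```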


Here is a solution to your problem:

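# -*- coding: utf-8 -*-
import re
st = "Ladegårdsvej 8B7100 Vejle"
reg = r'([0-9]{4})'  # capture any four consecutive digits
rep = re.split(reg, st)
print(rep)           # ['Ladegårdsvej 8B', '7100', ' Vejle'] (the space is not consumed, so city keeps it)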

Updated solution for the other test case provided by RasmusP_963:

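# -*- coding: utf-8 -*-
import re
st = "Birkevej 8371900 Roskilde"
# The trailing space forces the match onto the last four digits (the zip).
print(re.split(r"([0-9]{4}) ", st))  # ['Birkevej 837', '1900', 'Roskilde']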

1 Comment

That won't work, because a street address can have a long house number without letters (e.g. Birkevej 8371900 Roskilde), so I need to include the whitespace afterwards to make sure it matches the last four digits (the zip).
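A small sketch of why the trailing whitespace matters, assuming Python 3: without it, `re.split` takes the first four digits it finds, which here belong to the house number.

```python
import re

s = "Birkevej 8371900 Roskilde"

# Without the trailing space, the FIRST four digits of the house number match.
no_space = re.split(r"([0-9]{4})", s)
print(no_space)    # ['Birkevej ', '8371', '900 Roskilde']

# Requiring a space after the digits forces the match onto the zip.
with_space = re.split(r"([0-9]{4}) ", s)
print(with_space)  # ['Birkevej 837', '1900', 'Roskilde']
```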
