2

I want to split a concatenated string into its text and its numbers. So if the string is:

Hans went to house number 10 92384 29349

It should split the text into:

Hans went to house number | 10 | 92384 | 29349

I am confused about how to tackle this, because split won't work: it would also split Hans | went | to | house | number.

1
  • Do you want to split it into a list or just add | characters to the string? Commented Apr 10, 2017 at 12:41

6 Answers

6

Pretty easy with regular expressions:

>>> import re
>>> s = "Hans went to house number 10 92384 29349"
>>> re.split(r'\s+(?=\d+\b)', s)
['Hans went to house number', '10', '92384', '29349']

That said, your question is confusing. If you want to add the | character to the output, simply join the result again:

>>> ' | '.join(_)
'Hans went to house number | 10 | 92384 | 29349'

If your goal is to implement a function that does the trick, you can write this:

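def split_numbers(string, join=None):
    import re  # the module itself is needed, since re.split is called below
    # Split on whitespace that is followed by a whole-number token.
    parts = re.split(r'\s+(?=\d+\b)', string)
    # Return a joined string if a separator was given, otherwise the list.
    return join.join(parts) if join else parts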

Note that I added the word boundary \b to the regex to avoid matching words that start with a digit, like the 2cups in the sentence Hans went to house number 10 92384 29349 and drank 2cups of coffee.
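
To illustrate what the word boundary changes, here is a small sketch that runs the pattern with and without \b on that sentence. The variable names are only for illustration:

```python
import re

sentence = "Hans went to house number 10 92384 29349 and drank 2cups of coffee"

# With \b: only whitespace in front of a whole-number token is a split point.
with_boundary = re.split(r'\s+(?=\d+\b)', sentence)

# Without \b: whitespace in front of any token starting with a digit is a split point.
without_boundary = re.split(r'\s+(?=\d)', sentence)

print(with_boundary)
# ['Hans went to house number', '10', '92384', '29349 and drank 2cups of coffee']
print(without_boundary)
# ['Hans went to house number', '10', '92384', '29349 and drank', '2cups of coffee']
```

Either way, text that follows the last number stays attached to it, because the pattern only splits in front of numbers, not after them.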


2 Comments

Regular expressions are probably the best way to approach this problem.
Let's say it's easy to understand how regular expressions work. And they are powerful. But it isn't that easy to define one ;)
3

If you just want to add | to the string, you can try this:

a="Hans went to house number 10 92384 29349"

print(" ".join("| "+i if i.isdigit() else i for i in a.split()))

Output:

Hans went to house number | 10 | 92384 | 29349


2

You can split your sentence into words, then try to cast each word to an integer. If the cast fails, just append the word unchanged:

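a = "Hans went to house number 10 92384 29349"
res = ""
for word in a.split():
    try:
        number = int(word)          # raises ValueError for non-numeric words
        res += " | %d" % number
    except ValueError:
        res += " %s" % word

print(res.strip())  # strip the leading space added before the first word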

Edit: I tried to give the "simplest" solution. It is longer, but I think it is easier to understand. Still, if you understand the other (one-line) solutions, go for it.
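
One thing worth knowing if you choose between this and the isdigit() answers: int() and str.isdigit() do not agree on every token. A minimal sketch of the difference, where int_accepts is a hypothetical helper written only for this comparison:

```python
def int_accepts(token):
    """Return True if int() can parse the token (hypothetical helper)."""
    try:
        int(token)
        return True
    except ValueError:
        return False

# "\u00b2" is the superscript-two character.
for token in ["10", "-5", "1_000", "\u00b2", "2cups"]:
    print(repr(token), int_accepts(token), token.isdigit())
```

On Python 3.6+, int() accepts a sign and underscore separators that isdigit() rejects, while isdigit() accepts the superscript digit that int() rejects. For plain tokens like the ones in the question, both give the same answer.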


2

Using regular expression splitting with re:

import re


txt = 'Hans went to house number 10 92384 29349'

' | '.join(re.split(r'\s(?=\d)', txt))

# 'Hans went to house number | 10 | 92384 | 29349'


0

Here is how you could do it:

a = 'Hans went to house number 10 92384 29349'

words = a.split(' ')
result = [' '.join(w for w in words if not w.isdigit())] + [int(w) for w in words if w.isdigit()]

And if you want output as you showed:

new_result = ' | '.join([str(item) for item in result])

2 Comments

The order of the words changes when you have an example like a = 'Hans 12 went to house number 10 92384 29349'.
Yes, it does. But that wasn't in the example. And that same example would break the top-voted solution as well.
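
If numbers can appear in the middle of the sentence, as in the example from the comment above, one possible way to keep the original order is to group consecutive tokens with itertools.groupby. This is only a sketch: split_runs is a hypothetical name, and it assumes every number should become its own piece while adjacent words are rejoined, which the question does not actually specify.

```python
from itertools import groupby

def split_runs(text):
    """Split text into runs of words and individual numbers, keeping order."""
    pieces = []
    # groupby yields consecutive tokens that share the same isdigit() result.
    for is_number, tokens in groupby(text.split(), key=str.isdigit):
        if is_number:
            pieces.extend(tokens)            # each number is its own piece
        else:
            pieces.append(' '.join(tokens))  # adjacent words are rejoined
    return pieces

print(' | '.join(split_runs('Hans 12 went to house number 10 92384 29349')))
# Hans | 12 | went to house number | 10 | 92384 | 29349
```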
0

You can do this:

a = "Hans went to house number 10 92384 29349"

res = []

for item in a.split():
    if item.isdigit():
        res.extend(['|', item])
    else:
        res.append(item)

print(' '.join(res))
#Hans went to house number | 10 | 92384 | 29349
