0

I have several strings with phrases or words separated by multiple spaces.

c1 = "St. Louis       12             Cardinals"
c2 = "Boston          16             Red Sox"
c3 = "New York        13             Yankees"

How do I write a function, perhaps using Python's split(" ") method, to separate each line into a list of strings? For instance, c1 would become ['St. Louis', '12', 'Cardinals'].

Calling split(" ") and then trimming the component entities won't work because some entities such as St. Louis or Red Sox have spaces in them.

However, I do know that all entities are at least 2 spaces apart and that no entity has 2 spaces within it. By the way, I actually have around 100 cities to deal with, not 3. Thanks!

3
  • Thank you, what is a regex split? Commented Feb 23, 2012 at 8:02
  • Are the values actually lined up like this? Are those really spaces in between, or tabs? Commented Feb 23, 2012 at 8:40
  • Sorry, I should have clarified. They're all spaces - no tabs. Commented Feb 23, 2012 at 18:22

5 Answers

4

Without regular expressions:

c1 = "St. Louis       12             Cardinals"
words = [w.strip() for w in c1.split('  ') if w.strip()]
# words == ['St. Louis', '12', 'Cardinals']
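
Since the question mentions around 100 lines, here is a minimal sketch of the same idea wrapped in a function. The name split_fields_plain and the sample list are only illustrative assumptions. Splitting on two spaces produces an empty string for each extra pair of spaces, and an odd-length run leaves one stray space on a piece, so each piece is stripped before the empties are dropped.

```python
def split_fields_plain(line):
    """Split on runs of 2+ spaces without regex (hypothetical helper name)."""
    # split('  ') yields '' for every extra pair of spaces and may leave a
    # single stray space on a piece, so strip first and then drop the empties.
    pieces = (piece.strip() for piece in line.split('  '))
    return [piece for piece in pieces if piece]


lines = [
    "St. Louis       12             Cardinals",
    "Boston          16             Red Sox",
    "New York        13             Yankees",
]
rows = [split_fields_plain(line) for line in lines]
# rows[0] == ['St. Louis', '12', 'Cardinals']
```

Stripping before filtering also means a line with trailing spaces does not produce an empty last field.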

1 Comment

Thanks! This does the job without loading a module.
3
import re
re.split(r' {2,}', c1)
re.split(r' {2,}', c2)
re.split(r' {2,}', c3)

2 Comments

Wow, thank you, how does this work? Why is there a comma after the 2?
This is an example of a regular expression, also known as a regex (I suggest you take a look!). The expression describes the separator we are splitting on: ` {2,}` means "two or more spaces". If we wrote ` {2,5}`, it would mean 2 to 5 spaces; leaving out the number after the comma leaves the upper bound open-ended.
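
To make the difference between the open-ended and the bounded quantifier concrete, here is a small sketch. The test string and variable names are made up for illustration. With a run of 7 spaces, ` {2,}` consumes the whole run as one separator, while ` {2,5}` takes 5 spaces and then matches the remaining 2 as a second separator, which leaves an empty field in between.

```python
import re

s = "a" + " " * 7 + "b"   # illustrative input: a run of 7 spaces

# {2,}: two or more spaces, no upper bound, so the run is one separator
open_ended = re.split(r' {2,}', s)   # ['a', 'b']

# {2,5}: at most 5 spaces per match, so the run is split into 5 + 2
bounded = re.split(r' {2,5}', s)     # ['a', '', 'b']
```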
2

You can use re.split

>>> import re
>>> re.split(r'\s{2,}', 'St. Louis       12             Cardinals')
['St. Louis', '12', 'Cardinals']
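
Note that `\s` matches any whitespace character, not just a space. The sketch below compares the two patterns on made-up inputs; the variable names are assumptions for illustration only. It also shows stripping a line before splitting, because lines read from a file end in a newline that `\s{2,}` would otherwise treat as part of a trailing separator.

```python
import re

spaces_only = re.compile(r' {2,}')      # literal spaces only
any_whitespace = re.compile(r'\s{2,}')  # spaces, tabs, newlines, ...

tabbed = "Boston\t\t16\t\tRed Sox"
not_split = spaces_only.split(tabbed)    # ['Boston\t\t16\t\tRed Sox']
split_ok = any_whitespace.split(tabbed)  # ['Boston', '16', 'Red Sox']

# Trailing whitespace would yield an empty last field, so strip first.
raw = "St. Louis       12             Cardinals  \n"
unstripped = any_whitespace.split(raw)         # [..., 'Cardinals', '']
stripped = any_whitespace.split(raw.strip())   # ['St. Louis', '12', 'Cardinals']
```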


2

You could do this with regular expressions:

import re

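# Lazy city, then the numeric rank, then the team up to the end of the line.
# The trailing \s*$ forces the last lazy group to capture the whole team name.
blahRegex = re.compile(r'(.*?)\s+(\d+)\s+(.*?)\s*$')

with open('filename') as f:
    for line in f:
        m = blahRegex.match(line)
        if m is not None:
            city = m.group(1)
            rank = m.group(2)
            team = m.group(3)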

There are many ways to do this: you could use named groups or make the regular expression tighter. But this should do it.
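
As a rough sketch of the named-groups variant mentioned above: the pattern below is one possible tightening, not the only one. It assumes fields are separated by at least two whitespace characters and that the middle field is all digits. The names line_re and parse_line are hypothetical.

```python
import re

# Assumption: at least two whitespace characters between fields,
# and the middle field consists only of digits.
line_re = re.compile(
    r'(?P<city>.+?)\s{2,}(?P<rank>\d+)\s{2,}(?P<team>.+?)\s*$'
)


def parse_line(line):
    """Return a dict with city, rank and team, or None if the line does not match."""
    m = line_re.match(line)
    if m is None:
        return None
    return m.groupdict()


record = parse_line("Boston          16             Red Sox\n")
# record == {'city': 'Boston', 'rank': '16', 'team': 'Red Sox'}
```

The rank comes back as a string; convert it with int() if you need a number.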


2

It looks like the content is fixed-width. If that is always the case, and assuming those are spaces and not tabs, you can split it using slices:

split_fields = lambda s: [s[:16].strip(), s[16:31].strip(), s[31:].strip()]

or:

def split_fields(s):
    return [s[:16].strip(), s[16:31].strip(), s[31:].strip()]

Example usage:

>>> split_fields(c1)
['St. Louis', '12', 'Cardinals']
>>> split_fields(c2)
['Boston', '16', 'Red Sox']
>>> split_fields(c3)
['New York', '13', 'Yankees']
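
If the column positions might change, the same slicing idea can be driven by a list of start offsets. This is only a sketch: split_fixed_width is a hypothetical name, and the defaults 0, 16 and 31 are the offsets used above. It assumes every value fits inside its column; a longer city name would spill into the next field.

```python
def split_fixed_width(line, starts=(0, 16, 31)):
    """Slice a line at the given column start offsets and strip each field."""
    # Each field ends where the next one starts; the last runs to end of line.
    ends = list(starts[1:]) + [None]
    return [line[a:b].strip() for a, b in zip(starts, ends)]


c1 = "St. Louis       12             Cardinals"
fields = split_fixed_width(c1)
# fields == ['St. Louis', '12', 'Cardinals']
```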

