1

The text input looks something like this: West Team 4, Eastern 3\n

-------Update--------

The input is a txt file containing team names and scores, like football results. Each line has two names and their scores, so the whole file looks something like this:

West Team 4, Eastern 5
Nott Team 2, Eastern 3
West wood 1, Eathan 2
West Team 4, Eas 5

I am using with open to read the file line by line, so there will be a \n at the end of each line.

I would like to extract each line of text into something like:

['West Team', 'Eastern']

What I currently have in mind is to use a regex:

result = re.sub("[\n^\s$\d]", "", text).split(",")

This code results in:

['WestTeam', 'Eastern']

I'm sure my regex is not correct. I want to remove the '\n' and any number, including the space in front of the number, but not the space in the middle of a name.

I'm open to any suggestion that achieves this result; it doesn't necessarily have to use regex.

2 Comments

  • You really need to define the "rules" that describe your input and output data. Your input looks as though it may be comma-delimited, where each token (split by comma) ends with a number that you want to remove. If that's the case, you really don't need RE. Commented Feb 9, 2022 at 10:35
  • Have you checked the solutions here? One of them does not require the regex usage and seems just what you want unless you want to clarify the requirements. Or do you want something like re.findall(r',?\s*(\D*[^\d\s])', text)? Commented Feb 10, 2022 at 10:39

7 Answers

1

So many ways this can be done, but looking at your data you could use rstrip() quite nicely:

s = 'West Team 4, Eastern 3\n'
lst = [x.rstrip('\n 0123456789') for x in s.split(', ')]
print(lst)

Or, to avoid typing the digits by hand, use string.digits:

from string import digits
s = 'West Team 4, Eastern 3\n'
lst = [x.rstrip(digits+'\n ') for x in s.split(', ')]
print(lst)

Both options print:

['West Team', 'Eastern']
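
Since the question reads the file line by line, the same idea can be applied to every line. A minimal sketch, assuming each line has the form "name score, name score"; io.StringIO stands in for the real file here, and team_names is a hypothetical helper name:

```python
import io
from string import digits


def team_names(line):
    # Hypothetical helper: split on the comma, trim each piece, then
    # strip trailing digits, spaces and newlines from the right.
    return [part.strip().rstrip(digits + ' \n') for part in line.split(',')]


# io.StringIO stands in for open(...) on the real file (an assumption for the demo).
sample = io.StringIO('West Team 4, Eastern 5\nNott Team 2, Eastern 3\n')
names = [team_names(line) for line in sample]
print(names)
```

With a real file, the list comprehension would sit inside the with open(...) block and iterate over the file object instead of sample.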

1

You can use a non-regex approach to keep any letters/spaces after splitting with a comma:

text = "West Team 4, Eastern 3\n"
print( ["".join(c for c in x if c.isalpha() or c.isspace()).strip() for x in text.split(',')]  )
# => ['West Team', 'Eastern']

Or a regex approach to remove any chars other than ASCII letters and spaces matched with the [^a-zA-Z\s]+ pattern:

import re
rx = re.compile(r'[^a-zA-Z\s]+')
print( [rx.sub("", x).strip() for x in text.split(',')]  )
# => ['West Team', 'Eastern']

A similar solution extracts each chunk of non-digit characters that follows an optional comma and whitespace:

print(re.findall(r',?\s*(\D*[^\d\s])', text))

In case there are consecutive non-letter chunks you can use

import re
text = "West Team 4, Eastern 3\n, test 23 99 test"
rx = re.compile(r'[^\W\d_]+')
print( [" ".join(rx.findall(x)) for x in text.split(',')]  )

This yields ['West Team', 'Eastern', 'test test']. The [^\W\d_]+ pattern matches one or more Unicode letters.
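
To illustrate the difference between the ASCII-only pattern and the Unicode-letter pattern, here is a small sketch with made-up team names containing accented letters (the names are assumptions, not taken from the question):

```python
import re

# Hypothetical sample: "Málaga 2, São Paulo 1", written with escapes
# so the accented letters are unambiguous.
text = "M\u00e1laga 2, S\u00e3o Paulo 1\n"

ascii_rx = re.compile(r'[^a-zA-Z\s]+')  # removes everything except ASCII letters and whitespace
letters_rx = re.compile(r'[^\W\d_]+')   # matches runs of Unicode letters

ascii_only = [ascii_rx.sub("", x).strip() for x in text.split(',')]
unicode_ok = [" ".join(letters_rx.findall(x)) for x in text.split(',')]

print(ascii_only)  # the accented letters are dropped along with the digits
print(unicode_ok)  # the accented letters are kept
```

If the team names can contain non-ASCII letters, the second pattern is probably the safer choice.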

1 Comment

This assumes the input is comma-separated. It splits the line into separate values and cleans each one: (a) keep only alphabetic and whitespace characters (which excludes the numbers), then (b) strip the leading and trailing whitespace.

0

Actually re.findall might work well here:

import re

inp = "West Team 4, Eastern 3\n"
matches = re.findall(r'(\w+(?: \w+)*) \d+', inp)
print(matches)  # ['West Team', 'Eastern']

The split version, using re.split:

inp = "West Team 4, Eastern 3\n"
matches = [x for x in re.split(r'\s+\d+\s*,?\s*', inp) if x != '']
print(matches)  # ['West Team', 'Eastern']

0

import re

text = 'West Team 4, Eastern 3\n'

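# Raw string avoids the invalid "\d" escape warning; ^ and $ are only literal
# characters inside a character class, so they are not needed here.
result = re.sub(r"[\n\d]", "", text).split(",")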

# REMOVE THE LEADING AND TRAILING SPACES:
result = [x.strip() for x in result]
print(result)
# result: ['West Team', 'Eastern']

0

You want to:

  • remove '\n' and
  • any number including the space in front of the number
  • but not the space in the middle of the name.

Functions to use:

  • for constant parts you could just replace using str.replace().
  • for all dynamic matches we need a regex to substitute with empty-string using re.sub().
  • for surroundings we can even use str.strip() to remove leading and trailing whitespaces like \n.

Code

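import re

text = "West Team 4, Eastern 3\n"  # not named `input`, which would shadow the built-in

cleaned = re.sub(r'\s+\d+', '', text)  # remove numbers (one or more digits) with leading spaces
cleaned = cleaned.strip()  # remove surrounding whitespace like \n
print(cleaned)

output = [name.strip() for name in cleaned.split(",")]  # split on commas, trim each name
print(output)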

Prints:

West Team, Eastern
['West Team', 'Eastern']

1 Comment

OP also wants to split on the comma.

0

You can remove the digits and replace possible double spaced gaps with a single space.

Then split on a comma, do not keep empty values and trim the output:

import re

s = "West Team 4 , Eastern 3, test 23 99 test\n,"

res = [
    m.strip() for m in re.sub(r"[^\S\n]{2,}", " ", re.sub(r"\d+", "", s)).split(",") if m
]
print(res)

Output

['West Team', 'Eastern', 'test test']

0

You haven't clearly defined the rules for getting the required output from your sample input. However, this will give what you've asked for but may not cover all eventualities:

in_string = 'West Team 4, Eastern 3\n'

result = [' '.join(t.split()[:-1]) for t in in_string.split(',')]

print(result)

Output:

['West Team', 'Eastern']
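
Note what this relies on: each comma-separated piece is split on whitespace and its last token is dropped, whatever that token is. A short sketch with hypothetical inputs (a spelled-out score and a missing score), where names_without_last_token is an illustrative wrapper around the expression above:

```python
def names_without_last_token(line):
    # Hypothetical helper: drop the last whitespace-delimited token
    # of every comma-separated piece.
    return [' '.join(t.split()[:-1]) for t in line.split(',')]


numeric = names_without_last_token('West Team 4, Eastern 3\n')
spelled = names_without_last_token('West Team four, Eastern three\n')
missing = names_without_last_token('West Team, Eastern 3\n')

print(numeric)
print(spelled)  # a non-numeric last token is removed as well
print(missing)  # with no score present, part of the name is lost
```

If some lines may lack a score, one of the digit-based approaches is likely a better fit.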

5 Comments

@JvdV because that would not produce the desired result
@JvdV No it doesn't. That produces ['West Team 4', 'Eastern 3']. Spot the difference
@JvdV No. It does not assume that. It assumes that there are strings delimited by comma and that each of those strings has an unwanted whitespace delimited token at the end of that string. You could replace '4' with 'four' and that would also be removed. Having said that, the OP hasn't fully defined the requirement which is why I've already said that this may not cover all eventualities.
@JvdV Well spotted. Fixed
Jup you got my vote back. Nice solution!
