
I'm trying to split a sentence into three different variables for later use, and I need to specify rules that split it the way I need.

EXAMPLE SENTENCE:

sentence = 'SUPER Jumper Colour BROWN-8'

From this I need three variables:

textBeforeColour = 'SUPER Jumper Colour'
Colour = 'BROWN'
Size = '8'

PS: the colour (BROWN-8) will always be in CAPS. Anything before the colour might have some words in CAPS, but not all.

I've created a script that does just that, but I know that if the text changes slightly the script will break. For example:

import re

text = 'SUPER Jumper Colour BROWN-8'
words = text.split()  # renamed from `list` to avoid shadowing the built-in
lastWord = words[-1]

# Everything except the last word is the 'before colour' text
myList = words[:-1]

if lastWord == 'SIZE':
    print('ONE SIZE')  # Used when the size is not a number but comes as ONE SIZE
else:
    splitText = re.split('-', lastWord)
    print(splitText[0])
    print(splitText[1])
    Colour = splitText[0]
    size = splitText[1]

Now all of this works. But if the string uses a two-word colour such as LIGHT BLUE, this script keeps 'LIGHT' with the sentence variable instead of with the colour variable.
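To make the failure easy to reproduce, here is a minimal sketch of the same last-word approach applied to a two-word colour. The function name split_last_word is made up for illustration and is not part of the original script.

```python
def split_last_word(text):
    """Treat only the last whitespace-separated word as 'COLOUR-SIZE'."""
    words = text.split()
    colour, size = words[-1].split('-')
    # Everything before the last word is returned as the sentence part
    return ' '.join(words[:-1]), colour, size


print(split_last_word('SUPER Jumper Colour LIGHT BLUE-8'))
# ('SUPER Jumper Colour LIGHT', 'BLUE', '8')
```

'LIGHT' ends up in the sentence part because whitespace splitting has no notion of which words belong to the colour.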

  • A small edit to the original question: I've realized that some sentences could end with a letter size, e.g. (SUPER Jumper Colour BROWN-S) or in some cases (SUPER Jumper Colour BROWN-S/M). Does anyone know how to handle this rule as well? @Patrick Commented Oct 3, 2019 at 13:52

2 Answers

import re

text = "blah Blah LIGHT BLUE-8"

if text.split()[-1] == "SIZE":
    print("ONE SIZE")
else:
    # Capital letters and spaces directly before "-<digits>" at the end;
    # strip() drops the space that separates the colour from the sentence
    colour = re.findall(r"([A-Z ]+)-[0-9]+$", text)[0].strip()
    print(colour)
    size = int(re.findall(r"[0-9]+$", text)[0])
    print(size)
    sentence = re.findall(r"(.*[^A-Z ])[A-Z ]+-[0-9]+$", text)[0]
    print(sentence)

For the colour: a sequence of one or more capital letters and spaces, followed by a hyphen, one or more digits, and the end of the string.

For the size: one or more digits at the end of the string.

For the sentence: zero or more characters, then a character that is not a capital letter or space, then the pattern for the colour.
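One thing to watch: indexing the result of re.findall with [0] raises an IndexError when the text does not fit the pattern. A minimal sketch of the same three patterns using re.search, which returns None on no match, could look like this. The function name split_product is hypothetical, and the sketch assumes sizes may have more than one digit.

```python
import re


def split_product(text):
    """Return (sentence, colour, size), or None if the text does not fit."""
    colour = re.search(r"([A-Z ]+)-[0-9]+$", text)
    size = re.search(r"[0-9]+$", text)
    sentence = re.search(r"(.*[^A-Z ])[A-Z ]+-[0-9]+$", text)
    if not (colour and size and sentence):
        return None
    # strip() removes the space captured in front of the colour
    return sentence.group(1), colour.group(1).strip(), int(size.group())


print(split_product("blah Blah LIGHT BLUE-8"))
# ('blah Blah', 'LIGHT BLUE', 8)
print(split_product("no size here"))
# None
```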


3 Comments

It's perfect!! Thank you. A simple follow-up question: is there a manual somewhere for re.findall, re.split, etc. where I can look up the parts of these commands?
Glad I could help :) There is a good list here: w3schools.com/python/python_regex.asp
Got it, I will look more into them - THANKS AGAIN!

You should be able to do this in a single regex with capturing groups:

import re

pat = re.compile(r'^([\w\s]+?)\s+([A-Z\s]+)-(\d+)$')

sentence = 'SUPER Jumper Colour LIGHT BLUE-88'

match = pat.match(sentence)
if match:
    text, color, number = match.groups()
    print(text)    # SUPER Jumper Colour
    print(color)   # LIGHT BLUE
    print(number)  # 88

Regular expressions are powerful but can get complicated. If you're unfamiliar with them, see the documentation for Python's re module.

5 Comments

This also works like a charm. Thank you for the help. I will look into the documentation, as it sure looks complex!
Hi. A small edit to the original question: I've realized that some sentences could end with a letter size, e.g. (SUPER Jumper Colour BROWN-S) or in some cases (SUPER Jumper Colour BROWN-S/M). Would you know how to incorporate these changes?
You could try just changing that final capturing group (\d+) to capture all characters: (.+).
Thank you, that did it. Last question: I've just seen that some sentences have more than one hyphen (-), e.g. 'Super-Cool Jumper BROWN-S'. The script produces no results when this occurs.
You can add hyphens to the first character class to make them acceptable there too: ^([\w\s-]+?)\s+([A-Z\s]+)-(\d+)$ (keep (.+) as the final group if you also need letter sizes such as S or S/M).
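Putting the suggestions from these comments together, a sketch might look like the following. Note one swap: the size group here is (\S+) rather than (.+), on the assumption that a size never contains spaces, so that a hyphenated all-caps word earlier in the sentence is not mistaken for the colour. The names PRODUCT_PATTERN and parse_product are made up for illustration, and the 'ONE SIZE' case is not handled.

```python
import re

# Hyphens allowed in the description, multi-word colours in capitals,
# and any run of non-space characters (8, 88, S, S/M) as the size.
PRODUCT_PATTERN = re.compile(r'^([\w\s-]+?)\s+([A-Z\s]+)-(\S+)$')


def parse_product(sentence):
    """Return (text, colour, size) as strings, or None if nothing matches."""
    match = PRODUCT_PATTERN.match(sentence)
    return match.groups() if match else None


print(parse_product('Super-Cool Jumper BROWN-S'))
# ('Super-Cool Jumper', 'BROWN', 'S')
print(parse_product('SUPER Jumper Colour BROWN-S/M'))
# ('SUPER Jumper Colour', 'BROWN', 'S/M')
print(parse_product('SUPER Jumper Colour LIGHT BLUE-88'))
# ('SUPER Jumper Colour', 'LIGHT BLUE', '88')
```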
