I have a list of strings such as

2007 ford falcon xr8 ripcurl bf mkii utility 5.4l v8 cyl 6 sp manual bionic 
2004 nissan x-trail ti 4x4 t30 4d wagon 2.5l 4 cyl 5 sp manual twilight 
2002 subaru liberty rx my03 4d sedan 2.5l 4 cyl 5 sp manual silver 

I want to truncate the string at either the engine capacity (5.4l, 2.5l) or body type (4d wagon, 4d sedan), whichever comes first. So output should be:

2007 ford falcon xr8 ripcurl bf mkii utility
2004 nissan x-trail ti 4x4 t30 
2002 subaru liberty rx my03

I figure I will create a list of words with .split(' '). However, my problem is how to stop at an x.xl or xd word, where x could be any digit. What sort of regex would pick this up?

2 Answers

One option is to use re.sub() to replace everything from the first engine-capacity word (a number followed by l) or body-type phrase (a number followed by d, then wagon or sedan) onward with an empty string:

>>> import re
>>>
>>> l = ["2007 ford falcon xr8 ripcurl bf mkii utility 5.4l v8 cyl 6 sp manual bionic ", "2004 nissan x-trail ti 4x4 t30 4d wagon 2.5l 4 cyl 5 sp manual twilight ", "2002 subaru liberty rx my03 4d sedan 2.5l 4 cyl 5 sp manual silver"]
>>> for item in l:
...     print(re.sub(r"(\b[0-9.]+l\b|\d+d (?:wagon|sedan)).*$", "", item))
... 
2007 ford falcon xr8 ripcurl bf mkii utility 
2004 nissan x-trail ti 4x4 t30 
2002 subaru liberty rx my03 

where:

  • \b[0-9.]+l\b matches a word made of one or more digits or dots and ending with l
  • \d+d (?:wagon|sedan) matches one or more digits followed by the letter d, a space, and then wagon or sedan; (?:...) is a non-capturing group
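
As a minimal sketch, the same substitution can be wrapped in a small helper that also strips the trailing space left behind. The names CUTOFF and truncate_listing are hypothetical and not part of the answer above; the pattern itself is unchanged.

```python
import re

# Same alternation as in the answer, compiled once for reuse.
# CUTOFF and truncate_listing are illustrative names only.
CUTOFF = re.compile(r"(\b[0-9.]+l\b|\d+d (?:wagon|sedan)).*$")


def truncate_listing(text):
    """Drop everything from the first engine-capacity or body-type token onward."""
    # rstrip() removes the space that preceded the matched token.
    return CUTOFF.sub("", text).rstrip()


listings = [
    "2007 ford falcon xr8 ripcurl bf mkii utility 5.4l v8 cyl 6 sp manual bionic ",
    "2004 nissan x-trail ti 4x4 t30 4d wagon 2.5l 4 cyl 5 sp manual twilight ",
    "2002 subaru liberty rx my03 4d sedan 2.5l 4 cyl 5 sp manual silver",
]

for item in listings:
    print(truncate_listing(item))
```

Strings that contain neither token are returned unchanged apart from the trailing-whitespace strip, since re.sub() leaves non-matching input alone.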

2 Comments

As a follow-on question, how would I limit \d+d to only match if it was a single digit followed by a letter d? I tried \d{0,1}+d but that gives an error
@Testy8 Sure, just use the quantifier on its own, without the trailing +: \d{1}d (which is the same as \dd). Thanks.
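
A quick check of the single-digit variant, as a sketch. Note that \d{1}d on its own can still match the tail of a longer number such as 14d; adding word boundaries (an assumption about the intent, not something stated in the comments) restricts it to exactly one digit. The name single_digit_d is illustrative.

```python
import re

# \b\dd\b: exactly one digit followed by "d", as a whole word.
# The word boundaries are an assumed tightening, not from the original comment.
single_digit_d = re.compile(r"\b\dd\b")

print(bool(single_digit_d.search("t30 4d wagon")))    # matches "4d"
print(bool(single_digit_d.search("t30 14d wagon")))   # no match: "14d" has two digits

# Without the boundary, \d{1}d still finds "4d" inside "14d".
print(bool(re.search(r"\d{1}d", "t30 14d wagon")))
```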
^.*?(?=\s*\d+d\s+(?:wagon|sedan)|\s*\d+(?:\.\d+)?l)

You can use this. See the demo:

https://regex101.com/r/aC0uK6/1

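import re

# Lazily match from the start of each line up to (but not including)
# the first body-type or engine-capacity token.
# Note: the Python 2 ur'' prefix is a syntax error in Python 3; a plain raw string works in both.
p = re.compile(r'^.*?(?=\s*\d+d\s+(?:wagon|sedan)|\s*\d+(?:\.\d+)?l)', re.MULTILINE)
test_str = "2007 ford falcon xr8 ripcurl bf mkii utility 5.4l v8 cyl 6 sp manual bionic \n2004 nissan x-trail ti 4x4 t30 4d wagon 2.5l 4 cyl 5 sp manual twilight \n2002 subaru liberty rx my03 4d sedan 2.5l 4 cyl 5 sp manual silver "

print(p.findall(test_str))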
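
When the strings are already in a list rather than one multi-line block, the same lookahead pattern can be applied per item. This is a sketch under that assumption; PREFIX and prefix_before_marker are hypothetical names, and returning the stripped input when no marker is found is a design choice, not something the answer specifies.

```python
import re

# Same lookahead pattern as above; re.MULTILINE is not needed for single lines.
PREFIX = re.compile(r"^.*?(?=\s*\d+d\s+(?:wagon|sedan)|\s*\d+(?:\.\d+)?l)")


def prefix_before_marker(line):
    """Return the text before the first body-type or engine-capacity token."""
    m = PREFIX.match(line)
    # If the lookahead never succeeds there is no match at all,
    # so fall back to the whole (stripped) line.
    return m.group(0) if m else line.strip()


cars = [
    "2007 ford falcon xr8 ripcurl bf mkii utility 5.4l v8 cyl 6 sp manual bionic ",
    "2004 nissan x-trail ti 4x4 t30 4d wagon 2.5l 4 cyl 5 sp manual twilight ",
    "2002 subaru liberty rx my03 4d sedan 2.5l 4 cyl 5 sp manual silver ",
]

print([prefix_before_marker(c) for c in cars])
```

Because the \s* sits inside the lookahead, the whitespace before the marker is excluded from the match, so no trailing space needs to be stripped afterwards.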
