Could you please let me know how to split a string on multiple delimiters in Python when one of the delimiters is a piece of text?

For example, we have the string 'this is my worksplace.Work restricted.this is a sample text.this is text 2.Work restricted'

I want to split on the dot first and then on 'Work restricted'.

I used a regular expression for the dot and was able to get the list; sample below:

import re

textforsplit = 'this is my worksplace.Work restricted.this is a sample text.this is text 2.Work restricted'
testArray = re.split(r'[.]', textforsplit)

This works for the dot, and I get the list:

['this is my worksplace', 'Work restricted', 'this is a sample text', 'this is text 2', 'Work restricted']

But I want to filter the results again to get a list that excludes the 'Work restricted' text. That is, the final list should be:

['this is my worksplace', 'this is a sample text', 'this is text 2']

Is there any way to achieve this by modifying the regular expression in Python?

Thank you.

3 Answers

1

There's no need to use a regular expression for this, since the delimiter is a fixed string. Just use the regular str.split() method. Then you can remove the 'Work restricted' entries with a list comprehension.

textforsplit='this is my worksplace.Work restricted.this is a sample text'
testArray=textforsplit.split('.')
testArray = [x for x in testArray if x != 'Work restricted']

5 Comments

Thank you for your response. I updated the question to avoid confusion: first we have to split on the dot and then on 'Work restricted'. Could you please help?
Thanks much, this works. I will test all the conditions and update.
Can we add multiple conditions to the if clause? In one scenario the string is like 'this is a simple text.Work restricted.' with a dot at the end, so the split creates an empty list element '': ['this is a simple text', 'Work restricted', '']. I need to remove the empty elements '' too. So in the if condition I can check that x is neither 'Work restricted' nor '', correct?
if x != 'Work restricted' and x != '' and ...
or if x not in {'Work restricted', '', ...}
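The two suggestions in the comments above can be sketched as one small helper. This is only an illustration: the function name split_and_filter and its parameters are hypothetical, not something from the answer, and the set of unwanted values is assumed to be just 'Work restricted' and the empty string.

```python
def split_and_filter(text, sep=".", unwanted=("Work restricted", "")):
    """Split text on sep and drop every piece that appears in unwanted.

    The name and parameters are hypothetical, chosen for this sketch.
    """
    unwanted = set(unwanted)  # set membership keeps the check cheap
    return [piece for piece in text.split(sep) if piece not in unwanted]


# A trailing dot yields an empty last piece, which is filtered out too.
sample = "this is a simple text.Work restricted."
print(split_and_filter(sample))
```

Using a set rather than a chain of != comparisons keeps the condition short when more unwanted values are added later.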
1

You can filter them with a list comprehension:

testArray = [x for x in testArray if x != 'Work restricted']

1 Comment

Thank you for your response. I updated the question to avoid confusion: first we have to split on the dot and then on 'Work restricted'. Could you please help?
1

Here is a pure regex solution using findall:

>>> import re
>>> textforsplit = 'this is my worksplace.Work restricted.this is a sample text.this is text 2.Work restricted'
>>> print(re.findall(r'(?:^|(?<=\.))(?!Work restricted)[^.]+', textforsplit))
['this is my worksplace', 'this is a sample text', 'this is text 2']

RegEx Details:

  • (?:^|(?<=\.)): Assert that we are at the start of the string or that the previous character is a dot
  • (?!Work restricted): Negative lookahead to assert that the text ahead of us is not Work restricted
  • [^.]+: Match one or more characters that are not a dot
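One caveat worth checking: the lookahead as written also rejects any segment that merely starts with Work restricted. The sketch below compares the pattern from this answer with a stricter variant that only rejects a segment that is exactly Work restricted. The stricter pattern and the names PATTERN and STRICT_PATTERN are my own assumptions for illustration, not part of the answer.

```python
import re

# Pattern from the answer: skips any segment that begins with "Work restricted".
PATTERN = re.compile(r'(?:^|(?<=\.))(?!Work restricted)[^.]+')

# Hypothetical stricter variant: skips a segment only when "Work restricted"
# is immediately followed by a dot or the end of the string.
STRICT_PATTERN = re.compile(r'(?:^|(?<=\.))(?!Work restricted(?:\.|$))[^.]+')

text = 'this is a simple text.Work restricted.Work restricted zone.'

loose = PATTERN.findall(text)
strict = STRICT_PATTERN.findall(text)

print(loose)   # the "Work restricted zone" segment is dropped as well
print(strict)  # the "Work restricted zone" segment is kept
```

Because [^.]+ needs at least one character, neither pattern produces an empty string for the trailing dot, so no extra filtering of '' is needed.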
