
I have a string, Alltext, that contains text spanning multiple lines:

aaaaa    
D0  
aaaaa

text0...........


aaaaa                                      
D1  
aaaaa  
text 1 ..........


aaaaa  
D2  
aaaaa  
text 2    

I want to keep just the text parts (text0..., text 1..., text 2...) and remove the indicators:

aaaaa
D0
aaaaa, 

aaaaa
D1
aaaaa

and so on. These indicate the start of the next text segment. I tried this regular expression:

re.sub("[a]* \sD[0-9]*\\s[a] * ", " ",Alltext)

but this removes only D0, D1, and so on, not the aaaaa lines. The output I get is:

aaaaa  
aaaaa   
text0  
aaaaa       
aaaaa  
text1 

How can I remove these aaaaa lines as well?

2 Answers


You don't need to put a single character inside a character class, and you don't need to double-escape \s when the pattern is a raw string:

a*\s*D[0-9]*\s*a*\s*


The Python code would be:

>>> import re
>>> s = """aaaaa    
D0  
aaaaa

text0...........


aaaaa                                      
D1  
aaaaa  
text 1 ..........


aaaaa  
D2  
aaaaa  
text 2  """
>>> m = re.sub(r'a*\s*D[0-9]*\s*a*\s*', r'', s)
>>> m
'text0...........\n\n\ntext 1 ..........\n\n\ntext 2  '
>>> print(m)
text0...........


text 1 ..........


text 2
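
The pattern above is permissive: every part of it except the D is optional, so it can also match a capital D inside ordinary text (for example, it turns "text0 Data" into "text0ta"). A stricter sketch follows, assuming each marker is exactly three lines (a run of a's, D plus a number, another run of a's) with optional trailing spaces, as in the question's sample. The names MARKER and strip_markers are illustrative and not part of the original answer.

```python
import re

# Assumption: a marker is three consecutive lines: a's, D<number>, a's,
# each possibly followed by trailing spaces or tabs.
MARKER = re.compile(
    r'^a+[ \t]*\n[ \t]*D[0-9]+[ \t]*\n[ \t]*a+[ \t]*\n?',
    re.MULTILINE,  # lets ^ match at the start of every line
)

def strip_markers(text):
    """Remove marker blocks; leave everything else untouched."""
    return MARKER.sub('', text)

sample = (
    "aaaaa    \nD0  \naaaaa\n\ntext0 Data\n\n\n"
    "aaaaa  \nD1  \naaaaa  \ntext 1\n"
)
print(strip_markers(sample))
```

Because the match must start at the beginning of a line and requires at least one a and one digit, a stray D in the body text is left alone.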

print(re.findall(r"^text.*$", x, re.M))

A simple findall should do this as well, where x is the input string, provided every line you want to keep starts with "text".
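
When the wanted lines do not share a common prefix, one alternative is to split on the markers instead of searching for the text. The sketch below assumes the same three-line marker layout as the question's sample; MARKER and split_segments are hypothetical names used only for illustration.

```python
import re

# Assumption: a marker is three consecutive lines: a's, D<number>, a's,
# each possibly followed by trailing spaces or tabs.
MARKER = re.compile(
    r'^a+[ \t]*\n[ \t]*D[0-9]+[ \t]*\n[ \t]*a+[ \t]*$',
    re.MULTILINE,  # ^ and $ match at every line boundary
)

def split_segments(text):
    """Return the text between markers, trimmed, skipping empty pieces."""
    parts = MARKER.split(text)
    return [p.strip() for p in parts if p.strip()]

sample = (
    "aaaaa    \nD0  \naaaaa\n\ntext0...........\n\n\n"
    "aaaaa  \nD1  \naaaaa  \ntext 1 ..........\n"
)
print(split_segments(sample))
```

This returns the segments as a list, which is handy if each one needs to be processed separately.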
