I need some help with the following pattern; I have been struggling with it for many hours now. I have a text like:
<<12/24/2015 00:00 userrrr>>
********** Text all char and symbols ************
<<12/24/2015 00:00 CET userr>>
Text all char and symbols
<<12/24/2015 00:00 GMT+1 userrrr>> Text in same line
<<12/24/2015 00:00 CET userrr>>
Text all characters and symbols
<<12/24/2015 00:00 GMT+1 userrrrrrr>> Text in same line
More Text all characters and symbols
<<12/24/2015 00:00 CET userrrrr>>
More text all characters and symbols
<<12/24/2015 00:00 CET userrrrrrrrrrr>>
More Text all characters and symbols
By using the pattern:
(\<<)(\d{2}/\d{2}/\d{4}\s\d{2}:\d{2})(.*?(?=>>))(>>)
The datetime and everything between the arrows is matched correctly. Unfortunately, I cannot find a way to extract the text between the patterns. The final groups should look like (left_arrows), (datetime), (user), (right_arrows), (text). The closest I got was by using:
(\<<)(\d{2}/\d{2}/\d{4}\s\d{2}:\d{2}\s\D{3}.*?(?=\s))\s(.*?(?=>>))(>>)((?s).*?(?=<<\d{2}/\d{2}))
But it doesn't match the first and the last entries correctly. (I checked the result on pythex.org.)
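One possible way to get all five groups, as a minimal sketch rather than a definitive fix. The first entry appears to fail because it has no time-zone token for `\s\D{3}` to consume, and the last because no further `<<` header follows it for the lookahead to find. The sketch assumes the time zone is optional and looks like three capital letters with an optional offset (CET, GMT+1), that the user name contains no `>>`, and that the text never contains `<<` followed by a date. The function name `parse_entries` is made up for illustration, and the inline `(?s)` is replaced by the `re.DOTALL` flag.

```python
import re

ENTRY = re.compile(
    r"(<<)"                                 # group 1: left arrows
    r"(\d{2}/\d{2}/\d{4}\s\d{2}:\d{2}"      # group 2: date and time ...
    r"(?:\s[A-Z]{3}(?:[+-]\d+)?)?)"         # ... plus an optional zone (assumed format)
    r"\s(.*?)"                              # group 3: user
    r"(>>)"                                 # group 4: right arrows
    r"(.*?)"                                # group 5: text after the header
    r"(?=<<\d{2}/\d{2}/\d{4}|\Z)",          # stop at the next header or at end of input
    re.DOTALL,                              # let "." span newlines in the text
)


def parse_entries(text):
    """Return a list of (datetime, user, text) tuples, one per header."""
    return [
        (m.group(2), m.group(3), m.group(5).strip())
        for m in ENTRY.finditer(text)
    ]


if __name__ == "__main__":
    sample = (
        "<<12/24/2015 00:00 userrrr>>\n"
        "********** Text all char and symbols ************\n"
        "<<12/24/2015 00:00 CET userr>>\n"
        "Text all char and symbols\n"
        "<<12/24/2015 00:00 GMT+1 userrrr>> Text in same line\n"
    )
    for entry in parse_entries(sample):
        print(entry)
```

The `|\Z` alternative in the lookahead is what lets the last entry match, and making the zone an optional non-capturing group is what lets the first one match.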
Couldn't line.startswith("<<") do most of what you want? Or BeautifulSoup?
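A rough sketch of what the startswith idea could look like, assuming every header begins at the start of a line and that the text never contains a line starting with `<<`. The name `split_entries` is hypothetical, and the header is left unsplit (date, optional zone, and user together).

```python
def split_entries(text):
    """Group lines into (header, body) pairs without using a regex."""
    entries = []
    for line in text.splitlines():
        if line.startswith("<<") and ">>" in line:
            # Header line: keep what is between << and >>,
            # and treat anything after >> as the start of the body.
            header, _, rest = line[2:].partition(">>")
            rest = rest.strip()
            entries.append((header, [rest] if rest else []))
        elif entries:
            # Any other line belongs to the most recent header.
            entries[-1][1].append(line)
    return [(header, "\n".join(body)) for header, body in entries]
```

This handles text on the same line as the header as well as text on the following lines, but splitting the header into datetime and user would still need a further step.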