I want to split a string containing numbers, letters and specific whitespace into its components.
Consider the string
ATLANTYS2_I - 3103 aRNH_profile - 121 2.7e-35 118.7 0.0 1 1 2.7e-37 5.6e-35 117.7 0.0 2 120 1342 1458 1341 1459 0.98 Gypsy Arabidopsis thaliana_+1
Let this string be content[3]. I ran the following command:
import re
result = re.split(r'\s{2,}', content[3])
which gave me
['ATLANTYS2_I',
'-',
'3103 aRNH_profile',
'-',
'121',
'2.7e-35',
'118.7',
'0.0',
'1',
'1',
'2.7e-37',
'5.6e-35',
'117.7',
'0.0',
'2',
'120',
'1342',
'1458',
'1341',
'1459 0.98 Gypsy\tArabidopsis thaliana_+1']
I split the string on runs of two or more whitespace characters, but the last entry, 1459 0.98 Gypsy\tArabidopsis thaliana_+1, is still grouped as one.
I thought of splitting the last entry on single spaces, removing it from the result and appending the pieces in its place. However, this seems rather clunky to me.
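For concreteness, the workaround I have in mind looks roughly like the sketch below. The short list is only a stand-in for the tail of the real result, and the maxsplit of 2 assumes the last entry always starts with exactly two space-separated numbers, which may not hold for every line.

```python
# Shortened stand-in for the end of the list returned by re.split
result = ['1342', '1458', '1341', '1459 0.98 Gypsy\tArabidopsis thaliana_+1']

# Remove the last entry and split it on single spaces, at most twice,
# so that 'Gypsy\tArabidopsis thaliana_+1' stays in one piece
tail = result.pop().split(' ', 2)

# Append the three pieces in place of the original last entry
result.extend(tail)
print(result)
```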
Is there a way to split this elegantly, so that I get the following for the last entry?
'1459', '0.98', 'Gypsy\tArabidopsis thaliana_+1'