How to split a string with numbers, letters and whitespace in Python?

Question

I want to split a string containing numbers, letters and specific whitespace into specific components.

Consider the string

ATLANTYS2_I          -           3103 aRNH_profile         -            121   2.7e-35  118.7   0.0   1   1   2.7e-37   5.6e-35  117.7   0.0     2   120  1342  1458  1341  1459 0.98 Gypsy      Arabidopsis thaliana_+1

Let this string be content[3]. I ran

import re
result = re.split(r'\s{2,}', content[3])

which gave me

['ATLANTYS2_I',
 '-',
 '3103 aRNH_profile',
 '-',
 '121',
 '2.7e-35',
 '118.7',
 '0.0',
 '1',
 '1',
 '2.7e-37',
 '5.6e-35',
 '117.7',
 '0.0',
 '2',
 '120',
 '1342',
 '1458',
 '1341',
 '1459 0.98 Gypsy\tArabidopsis thaliana_+1']

I split the string on runs of two or more whitespace characters, but the last entry, 1459 0.98 Gypsy\tArabidopsis thaliana_+1, is still grouped as one. I thought of splitting that last entry on single spaces, deleting it from the result, and appending the pieces, but this seems rather clunky to me.

Is there a way to split this elegantly so that I get the following result for the last entry: '1459', '0.98', 'Gypsy\tArabidopsis thaliana_+1'?

I think you need to split the last entry separately, even if it means more code. Better to write explicit code than an "elegant" one-liner that you won't understand in a month.
– Mel, Commented Jan 15, 2018 at 10:14
I agree. How would you split the last entry so that I get the desired result?
– A.Dumas, Commented Jan 15, 2018 at 10:20
That's a string you have defined, and you cannot access it like a list. Refer to this link for a better understanding: docs.python.org/2/library/string.html
– Rachit kapadia, Commented Jan 15, 2018 at 10:27

Jan · Accepted Answer · 2018-01-15 13:04:00Z

1

You could use an alternation:

\s{2,}|\t+
# either two or more whitespace characters
# or at least one tab character

In Python:

import re

string = "ATLANTYS2_I          -           3103 aRNH_profile         -            121   2.7e-35  118.7   0.0   1   1   2.7e-37   5.6e-35  117.7   0.0     2   120  1342  1458  1341  1459 0.98 Gypsy\tArabidopsis thaliana_+1"

rx = re.compile(r'\s{2,}|\t+')
print(rx.split(string))

This yields:

['ATLANTYS2_I', '-', '3103 aRNH_profile', '-', '121', '2.7e-35', '118.7', '0.0', '1', '1', '2.7e-37', '5.6e-35', '117.7', '0.0', '2', '120', '1342', '1458', '1341', '1459 0.98 Gypsy', 'Arabidopsis thaliana_+1']
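
To see why the `\t+` branch is needed, here is a minimal sketch using a shortened, made-up sample (the variable names are illustrative only). A single tab is just one whitespace character, so `\s{2,}` on its own does not match it:

```python
import re

# Shortened sample: two fields separated by exactly one tab
sample = "Gypsy\tArabidopsis thaliana_+1"

# One tab is a single whitespace character, so \s{2,} finds nothing to split on
without_tab = re.split(r'\s{2,}', sample)
print(without_tab)  # ['Gypsy\tArabidopsis thaliana_+1']

# Adding the \t+ alternative splits on the lone tab as well
with_tab = re.split(r'\s{2,}|\t+', sample)
print(with_tab)  # ['Gypsy', 'Arabidopsis thaliana_+1']
```

The single space inside 'Arabidopsis thaliana_+1' is left alone by both patterns.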

1 Comment

Toto Over a year ago

'1459 0.98 Gypsy' must be '1459', '0.98', 'Gypsy'
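
One way to address the comment above is to post-process the second-to-last field after the regex split. This is only a sketch: `split_record` is a hypothetical helper, and it assumes every line has the same layout as the sample, with the space-separated values sitting in the field just before the tab.

```python
import re

# Sample line from the question; the tab before the species name is written as \t
line = "ATLANTYS2_I          -           3103 aRNH_profile         -            121   2.7e-35  118.7   0.0   1   1   2.7e-37   5.6e-35  117.7   0.0     2   120  1342  1458  1341  1459 0.98 Gypsy\tArabidopsis thaliana_+1"

def split_record(text):
    # Hypothetical helper, not taken from the answers.
    # Step 1: split on two or more whitespace characters, or on tabs.
    fields = re.split(r'\s{2,}|\t+', text.strip())
    if len(fields) < 2:
        return fields
    # Step 2: the second-to-last field ('1459 0.98 Gypsy') still holds
    # single-space-separated values, so split it on whitespace.
    return fields[:-2] + fields[-2].split() + fields[-1:]

fields = split_record(line)
print(fields[-4:])  # ['1459', '0.98', 'Gypsy', 'Arabidopsis thaliana_+1']
```

Note that '3103 aRNH_profile' stays together as one field, as in the question's own output, because only the second-to-last field is split further.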

Mel · Accepted Answer · 2018-01-15 10:29:33Z

0

You can process the last element separately:

last_element = result.pop()  # remove last element from list
numbers, plant = last_element.split('\t')  # split on tab
result += numbers.split()  # split the first part on spaces and add it back
result.append(plant)  # add the second part back

Or you could probably use a regex to split that last element correctly.
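
As a rough sketch of the regex route, assuming the goal is exactly the result requested in the question (the two numbers separated out, the tab-joined remainder kept intact), `re.split` with `maxsplit=2` would do it. The short `result` list here is a stand-in for the full list from the question.

```python
import re

# Stand-in for the list produced by re.split(r'\s{2,}', content[3]);
# only the last two entries are shown here
result = ['1341', '1459 0.98 Gypsy\tArabidopsis thaliana_+1']

last_element = result.pop()  # remove last element from list

# Split on whitespace at most twice, so everything after the second
# split point (including the tab) stays in one piece
parts = re.split(r'\s+', last_element, maxsplit=2)
print(parts)  # ['1459', '0.98', 'Gypsy\tArabidopsis thaliana_+1']

result += parts  # add the pieces back
```

If the name and species should also be separated, `parts[-1].split('\t')` could be applied as a further step.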
