Getting specific characters from complex strings in Python

Question

I have strings like these:

ye<V><vn:inf><N><0><V><cpl:pres><3s>
çok<Postp:adv:ablC><0><N><0><V><cpl:pres><3s>
yağ<N><li><Adv><0><N><0><V><cpl:evid><3s>

And I want to extract:

ye, V, 3s
çok, Postp:adv:ablC, 3s
yağ, N, 3s

I have hundreds of millions of such strings. What is the most efficient way to do this? Can you show an example?

Thanks,

Where is your code?

– infotoni91, Commented Dec 1, 2016 at 12:10

ettanany · Accepted Answer · 2016-12-01 12:17:19Z

5

Try this:

l = s.split('<')
'{}, {}, {}'.format(l[0], l[1][:-1], l[-1][:-1])

Example of output:

>>> s = 'ye<V><vn:inf><N><0><V><cpl:pres><3s>'
>>> l = s.split('<')
>>> '{}, {}, {}'.format(l[0], l[1][:-1], l[-1][:-1])
'ye, V, 3s'
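
For a large input, the same split could be wrapped in a function and applied lazily, one line at a time. This is only a sketch: the names extract_fields and format_lines are hypothetical, and it assumes every non-empty line contains at least one <...> tag and ends with '>'.

```python
def extract_fields(s):
    """Return (stem, first_tag, last_tag) using the split('<') approach."""
    parts = s.split('<')
    # parts[0] is the text before the first '<';
    # parts[1] and parts[-1] still carry a trailing '>', removed by [:-1].
    return parts[0], parts[1][:-1], parts[-1][:-1]


def format_lines(lines):
    """Lazily yield 'stem, first_tag, last_tag' for each usable line."""
    for line in lines:
        line = line.strip()      # drop the newline and surrounding spaces
        if '<' not in line:      # skip blank or untagged lines (assumption)
            continue
        yield '{}, {}, {}'.format(*extract_fields(line))


samples = [
    'ye<V><vn:inf><N><0><V><cpl:pres><3s>',
    'çok<Postp:adv:ablC><0><N><0><V><cpl:pres><3s>',
    'yağ<N><li><Adv><0><N><0><V><cpl:evid><3s>',
]
for result in format_lines(samples):
    print(result)
```

Because format_lines is a generator, it can also be fed an open file object, so the whole input never has to be held in memory at once.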

edited Dec 1, 2016 at 12:17

answered Dec 1, 2016 at 12:12

ettanany


1 Comment

jadsq Over a year ago

I would suggest using format instead of string concatenation with +

aquil.abdullah · Accepted Answer · 2016-12-01 12:15:41Z

2

You could try using re.findall. For example:

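import re
# Two named groups: g1 matches the literal '3s', g2 matches the literal 'ye'
regex = re.compile(r'(?P<g1>3s)|(?P<g2>ye)')
# test_string holds the text to search
regex.findall(test_string)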

This will return a list of tuples for the matches like the following:

# Output
# [('3s', ''), ('', 'ye'), ('3s', ''), ('', 'ye')]

The regular expression that I compiled does not have all of the named groups that you desire, but you can add those easily enough.
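
A more general pattern might capture the three parts directly instead of listing literal values. The sketch below is one possible way to do it, not the pattern from this answer: the names FIELDS and extract_with_regex are hypothetical, and it assumes each string has at least two <...> tags and ends with the last one.

```python
import re

# Group 1: text before the first '<'
# Group 2: contents of the first tag
# Group 3: contents of the last tag (the greedy '.*' skips the tags in between)
FIELDS = re.compile(r'^([^<]*)<([^>]*)>.*<([^>]*)>$')


def extract_with_regex(s):
    """Return (stem, first_tag, last_tag), or None if s does not fit the pattern."""
    m = FIELDS.match(s)
    return m.groups() if m else None


print(extract_with_regex('ye<V><vn:inf><N><0><V><cpl:pres><3s>'))
```

Compiling the pattern once and reusing it avoids repeating the pattern lookup on every call, which may matter with a very large number of strings.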

answered Dec 1, 2016 at 12:15

aquil.abdullah


Dmitry Erohin · Accepted Answer · 2016-12-01 12:20:08Z

1

s1 = 'ye<V><vn:inf><N><0><V><cpl:pres><3s>'
s2 = 'çok<Postp:adv:ablC><0><N><0><V><cpl:pres><3s>'
s3 = 'yağ<N><li><Adv><0><N><0><V><cpl:evid><3s>'

if __name__ == '__main__':
    for s in (s1, s2, s3):
        parts = s.split('<')  # split once and reuse the pieces
        # stem, first tag, last tag (each tag cut at its closing '>')
        print('{0}, {1}, {2}'.format(parts[0], parts[1].split('>')[0], parts[-1].split('>')[0]))
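
Since only the first and last tags are needed, a variant using str.partition and str.rpartition could avoid building the full list of pieces. This is a sketch under the same assumptions as above (at least one tag, string ends with '>'); the function name is hypothetical, and whether it is actually faster on real data would need to be measured, for example with the standard timeit module.

```python
def extract_fields_partition(s):
    """Return (stem, first_tag, last_tag) without splitting the whole string."""
    stem, _, rest = s.partition('<')      # text before the first '<'
    first_tag = rest.partition('>')[0]    # contents of the first tag
    last_tag = s.rpartition('<')[2][:-1]  # contents of the last tag, minus '>'
    return stem, first_tag, last_tag


for s in ('ye<V><vn:inf><N><0><V><cpl:pres><3s>',
          'çok<Postp:adv:ablC><0><N><0><V><cpl:pres><3s>',
          'yağ<N><li><Adv><0><N><0><V><cpl:evid><3s>'):
    print('{0}, {1}, {2}'.format(*extract_fields_partition(s)))
```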

answered Dec 1, 2016 at 12:20

Dmitry Erohin
