3

I'm parsing this line:

0386          ; Greek # L&       GREEK CAPITAL LETTER ALPHA WITH TONOS

Basically, I need:

point = 0386
script = Greek

And I'm doing it like this:

point = line.split(";")[0].replace(" ","")
script = line.split("#")[0].split(";")[1].replace(" ","")

I'm not convinced this is the most Pythonic way to do it. Is there a more elegant approach, maybe a regex one-liner?

4 Answers

3

If you want a regex one-liner:

point, script = re.search(r"^(\d+)\s*;\s*(\S+)\s*.*$", s).groups()

where s is your string. You need to import re first.
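A slightly more defensive variant, as a sketch: code points in this kind of listing are hexadecimal, so \d+ would miss a value such as 03AC. The names LINE_RE and parse_with_regex are made up for illustration, and the pattern assumes each data line starts with a single code point.

```python
import re

# Hypothetical pattern name. Hex digits are used instead of \d so that
# code points containing A-F (for example 03AC) also match.
LINE_RE = re.compile(r"^(?P<point>[0-9A-Fa-f]+)\s*;\s*(?P<script>\w+)")


def parse_with_regex(line):
    """Return (point, script), or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return match.group("point"), match.group("script")


line = "0386          ; Greek # L&       GREEK CAPITAL LETTER ALPHA WITH TONOS"
print(parse_with_regex(line))  # ('0386', 'Greek')
```

Returning None on a failed match avoids the AttributeError that re.search(...).groups() raises when nothing matches.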


2 Comments

re.search(r"^(.*)\s+;\s+(.*)\s+#.*$", s).groups() worked for me. The above didn't.
@ComputerFellow, your regex captures the number together with the whitespace after it. But if it works for you, I'm glad! Anyway, the point here is to show how to do it with a regex in one line.

3
>>> code, desc = line[:line.rfind('#')].split(';')
>>> code.strip()
'0386'
>>> desc.strip()
'Greek'
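One caveat with the slice above: if a line has no '#', rfind returns -1 and the slice silently drops the last character. A minimal sketch of the same idea with str.partition, which avoids that; the function name parse_with_partition is hypothetical. Note that partition splits at the first '#', while rfind locates the last one.

```python
def parse_with_partition(line):
    """Drop the trailing comment, then split the rest on the first ';'."""
    # partition returns the whole string as the head when '#' is absent,
    # so nothing is cut off on lines without a comment.
    head, _, _ = line.partition("#")
    code, _, desc = head.partition(";")
    return code.strip(), desc.strip()


print(parse_with_partition("0386          ; Greek # L&   GREEK CAPITAL LETTER ALPHA WITH TONOS"))
print(parse_with_partition("0386 ; Greek"))
# Both calls print ('0386', 'Greek')
```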


2

Using map with the unbound method str.strip:

>>> line = '0386      ; Greek # L&   GREEK CAPITAL LETTER ALPHA WITH TONOS'
>>> point, script = map(str.strip, line.split('#')[0].split(';'))
>>> point
'0386'
>>> script
'Greek'

Using list comprehension:

>>> point, script = [word.strip() for word in line.split('#')[0].split(';')]
>>> point
'0386'
>>> script
'Greek'
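If the goal is to process a whole file rather than one line, the list-comprehension version can be wrapped in a generator. This is only a sketch: iter_points is a hypothetical name, and it assumes the input may also contain blank lines and lines that hold nothing but a '#' comment.

```python
def iter_points(lines):
    """Yield (point, script) for each data line in an iterable of strings."""
    for line in lines:
        data = line.split("#")[0]
        if ";" not in data:
            continue  # blank line or comment-only line (assumed possible)
        point, script = [word.strip() for word in data.split(";")[:2]]
        yield point, script


sample = [
    "# hypothetical comment line",
    "",
    "0386          ; Greek # L&       GREEK CAPITAL LETTER ALPHA WITH TONOS",
]
print(list(iter_points(sample)))  # [('0386', 'Greek')]
```

Since an open file is itself an iterable of lines, it can be passed to iter_points directly.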

9 Comments

This looks very concise, although I prefer not using map.
@GamesBrainiac, I added a list comprehension version.
@GamesBrainiac Why not map? How would it affect performance?
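@ComputerFellow List comprehensions are often faster than map when map would need a lambda; with a built-in such as str.strip the difference is usually negligible.
This kind of speed difference should never matter. List comprehensions are usually preferred to map because they are easier to read.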
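For anyone who wants to check the speed question on their own machine, a rough timeit sketch follows. The helper names with_map and with_comprehension are made up for illustration, and the timings will vary by interpreter and hardware.

```python
import timeit

line = "0386      ; Greek # L&   GREEK CAPITAL LETTER ALPHA WITH TONOS"


def with_map():
    return tuple(map(str.strip, line.split("#")[0].split(";")))


def with_comprehension():
    return tuple([word.strip() for word in line.split("#")[0].split(";")])


# Both variants must agree before their timings mean anything.
assert with_map() == with_comprehension() == ("0386", "Greek")

for func in (with_map, with_comprehension):
    seconds = timeit.timeit(func, number=100_000)
    print(f"{func.__name__}: {seconds:.3f} s for 100,000 calls")
```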
0

This is how I would've done it:

>>> s = "0386          ; Greek # L&       GREEK CAPITAL LETTER ALPHA WITH TONOS"
>>> point = s.split(';')[0].strip()
>>> point
'0386'
>>> script = s.split(';')[1].split('#')[0].strip()
>>> script
'Greek'

Note that you can reuse the result of s.split(';'), so saving it to a variable is a good idea:

>>> var = s.split(';')
>>> point = var[0].strip()  # strip() removes leading and trailing whitespace
>>> point
'0386'
>>> script = var[1].split('#')[0].strip()
>>> script
'Greek'

