Extract fields and values from string in Python

Question

I'm trying to extract the field names and their values from a string containing fields and values like the following one:

/location=(7966, 8580, 1) /station=NY /comment=Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV

Each string can contain a different number of fields
The field names will always be enclosed between '/' and '='
The values can contain '/' and whitespace but not '='

The expected result is something like:

['location','(7966, 8580, 1)','station','NY','comment','Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV']

So far I've been able to extract the field names using:

>>> re.findall(r"\/([a-z]*?)\=", string)
['location', 'station', 'comment']

I've also tried using a negative lookahead (?!...), without success.
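A lookahead can work here if it is a positive one. The sketch below is an alternative that does not come from the answers: it uses re.findall() with a positive lookahead (?=...) so that each value stops just before the next " /name=" boundary (or at the end of the string) without consuming it. Like the question's own regex, it assumes field names are lowercase ASCII letters.

```python
import re

s = "/location=(7966, 8580, 1) /station=NY /comment=Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV"

# The value (.*?) grows lazily until the lookahead sees optional whitespace,
# then "/name=", or the end of the string. The lookahead does not consume
# that boundary, so the next match can still start at "/name=".
pairs = re.findall(r"/([a-z]+)=(.*?)(?=\s*/[a-z]+=|$)", s)

# Flatten [(name, value), ...] into [name, value, name, value, ...]
result = [part for pair in pairs for part in pair]
print(result)
```

Note that "/ ATCC" is not treated as a boundary because the "/" is not followed by lowercase letters and "=".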

Thanks in advance!

Can you change the "/" separator in "(SB / ATCC)" to something like "|"? Then we could split("/") and work with the data.
– dot.Py, Commented Jul 19, 2016 at 14:51
If you make the change @dot.Py mentioned, something like \=(.*?)\/ might do the trick.
– depperm, Commented Jul 19, 2016 at 14:52
I can't. At first I tried a simple split("/"), until I discovered field values like (XX / YY / ZZ); that's when I thought regex could do the trick.
– ppflrs, Commented Jul 19, 2016 at 14:54
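To illustrate why a plain split("/") is not enough, here is what it does with the example string from the question. The comment value gets cut in two at "(SB / ATCC)", and the pieces keep stray whitespace:

```python
s = "/location=(7966, 8580, 1) /station=NY /comment=Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV"

# Splitting on every "/" also splits inside the comment value.
parts = s.split("/")
print(parts)
# ['', 'location=(7966, 8580, 1) ', 'station=NY ', 'comment=Protein RadB n=1 Tax=M (SB ', ' ATCC) RepID=A6USB2_METV']
```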

alecxe · Accepted Answer · 2016-07-19 14:55:28Z

3

You can use re.split() to first split the string into "key=value" pairs, then a regular str.split() to split each pair on the first occurrence of "=" only:

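>>> import re
>>> s = '/location=(7966, 8580, 1) /station=NY /comment=Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV'
>>> dict(item.split("=", 1) for item in re.split(r"\s*/(?=[a-z]*?\=)", s)[1:])
{'location': '(7966, 8580, 1)', 'station': 'NY', 'comment': 'Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV'}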
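The answer returns a dict, while the question asks for a flat list. Below is a minimal sketch of a variation that keeps the same two-step idea but returns the list. parse_fields is a hypothetical helper name; it assumes the string starts with "/" and that field names are lowercase ASCII letters. It also uses [a-z]+ instead of [a-z]*? in the lookahead, so an empty name such as "/=" is not treated as a field.

```python
import re


def parse_fields(s):
    """Hypothetical helper: return [name, value, name, value, ...].

    Assumes s starts with "/" and field names are lowercase ASCII letters.
    """
    # Split only at a "/" that is followed by "name=", eating any whitespace
    # before it. [1:] drops the empty string produced by the leading "/".
    chunks = re.split(r"\s*/(?=[a-z]+=)", s)[1:]
    result = []
    for chunk in chunks:
        # maxsplit=1 keeps "=" signs inside the value (e.g. "n=1") intact.
        name, value = chunk.split("=", 1)
        result.extend([name, value])
    return result


s = "/location=(7966, 8580, 1) /station=NY /comment=Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV"
print(parse_fields(s))
```

Because the separator is a lookahead, the field names stay in the chunks and are recovered by the second split.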

answered Jul 19, 2016 at 14:55


Lee HoYo · Accepted Answer · 2016-07-19 15:28:54Z

1

Just use re.split():

>>> import re
>>> string = '/location=(7966, 8580, 1) /station=NY /comment=Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV'
>>> pattern = re.compile(r'\s*/([a-z]+)=')
>>> pattern.split(string)[1:]
['location', '(7966, 8580, 1)', 'station', 'NY', 'comment', 'Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV']

re.split(pattern, string, maxsplit=0, flags=0)

Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.
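To see what the capturing group changes, here is an illustrative sketch that runs the same split with and without it on the question's example string, then pairs the flat result back into a dict using zip() and slicing:

```python
import re

s = "/location=(7966, 8580, 1) /station=NY /comment=Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV"

# Without a group, the field names are part of the separator and are lost.
without_group = re.split(r"\s*/[a-z]+=", s)
# With a capturing group, each captured name is kept between the pieces.
with_group = re.split(r"\s*/([a-z]+)=", s)

print(without_group)
print(with_group)

# Index 0 is the empty string before the leading "/": names sit at odd
# indices and values at even indices from 2 on.
fields = dict(zip(with_group[1::2], with_group[2::2]))
print(fields)
```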

edited Jul 19, 2016 at 15:28

answered Jul 19, 2016 at 15:21
