2

I'm trying to extract the field names and values from a string containing fields and values like the following one:

/location=(7966, 8580, 1) /station=NY /comment=Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV
  • Each string can contain a different number of fields

  • The field names will always be enclosed between '/' and '='

  • The values can contain '/' and whitespace but not '='

The expected result is something like:

['location','(7966, 8580, 1)','station','NY','comment','Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV']

So far I've been able to extract the field names using:

>>> re.findall(r"\/([a-z]*?)\=",string)
['location', 'station', 'comment']

I've also tried using a negative lookahead (?!...) to capture the values, without success.

Thanks in advance!

  • Can you change the "/" separator at "(SB / ATCC)" to something like "|"? Then we could split("/") and work with the data. Commented Jul 19, 2016 at 14:51
  • If you make the change @dot.Py mentioned, something like \=(.*?)\/ might do the trick. Commented Jul 19, 2016 at 14:52
  • I can't. At first I tried with simple split("/") until I discovered some field values like the one (XX / YY / ZZ) that's when I thought regex could do the trick. Commented Jul 19, 2016 at 14:54
  • try s.replace(" / ", " | ") and s.split("/") Commented Jul 19, 2016 at 14:54
  • why not just .split('//') Commented Jul 19, 2016 at 14:55

2 Answers

3

You can use re.split() to separate the "key=value" pairs first, then use str.split("=", 1) to split each pair at the first occurrence of =:

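>>> import re
>>> s = '/location=(7966, 8580, 1) /station=NY /comment=Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV'
>>> dict(item.split("=", 1) for item in re.split(r"\s*/(?=[a-z]*?\=)", s)[1:])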
{
  'comment': 'Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV', 
  'station': 'NY', 
  'location': '(7966, 8580, 1)' 
}
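
For reuse, the same two-step idea can be wrapped in a small function. This is only a sketch: the names parse_fields and FIELD_START are made up for illustration, and the pattern uses [a-z]+ rather than [a-z]*? on the assumption that field names are never empty. The lookahead (?=...) checks that the "/" is followed by a field name and "=" without consuming them, which is why the " / " inside "(SB / ATCC)" is left alone.

```python
import re

# Split at each "/" that begins a field, i.e. one followed by lowercase
# letters and "=". The lookahead does not consume the field name, so the
# name stays attached to its value after the split.
# Assumption: field names are non-empty and lowercase ASCII only.
FIELD_START = re.compile(r"\s*/(?=[a-z]+=)")


def parse_fields(s):
    """Return a dict of field name -> value (hypothetical helper)."""
    # The first element is the (empty) text before the first field; drop it.
    parts = FIELD_START.split(s)[1:]
    # Split each "name=value" chunk at the first "=" only, so that any "="
    # inside the value (e.g. "n=1") is preserved.
    return dict(part.split("=", 1) for part in parts)


s = ("/location=(7966, 8580, 1) /station=NY "
     "/comment=Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV")
fields = parse_fields(s)
print(fields["station"])
print(fields["comment"])
```

Note that the maxsplit of 1 in split("=", 1) matters here, because the comment value itself contains "=".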

1

Just use re.split() with a capturing group around the field name:

>>> string
'/location=(7966, 8580, 1) /station=NY /comment=Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV'
>>> import re
>>> pattern = re.compile(r'\s*/([a-z]+)=')
>>> pattern.split(string)[1:]
['location', '(7966, 8580, 1)', 'station', 'NY', 'comment', 'Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV']

re.split(pattern, string, maxsplit=0, flags=0)

Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.
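
To illustrate the effect of the capturing parentheses described above, here is a small sketch comparing the same pattern with and without the group, and then pairing up the flat result. The variable names (with_group, without_group, flat, pairs) are only for illustration.

```python
import re

s = ("/location=(7966, 8580, 1) /station=NY "
     "/comment=Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV")

# With a capturing group, the captured field names are kept in the result.
with_group = re.compile(r"\s*/([a-z]+)=")
# Without the group, the whole separator (including the name) is discarded.
without_group = re.compile(r"\s*/[a-z]+=")

flat = with_group.split(s)[1:]            # name, value, name, value, ...
values_only = without_group.split(s)[1:]  # values only

# Pair every even-indexed element (name) with the following one (value).
pairs = list(zip(flat[0::2], flat[1::2]))
as_dict = dict(pairs)

print(values_only)
print(pairs)
```

The [1:] slice drops the empty string that re.split() returns before the first match, since the input starts with a separator.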
