Extracting items using a regular expression in Python

Question

I have a file that contains the following:

new=['{"TES1":"=TES0"}}', '{"""TES1:IDD""": """=0x3C""", """TES1:VCC""": """=0x00"""}']

I am trying to extract the first item, TES1:=TES0, from the list using a regular expression. This is what I tried, but I am not able to grab the second part, TES0.

import re
TES=re.compile('(TES[\d].)+')
for item in new:
    result = TES.search(item)
    print result.groups()

The result of the print was ('TES1:',). I have tried various ways to extract it but always get the same result. Any suggestions or help would be appreciated. Thanks!

Daniel · Accepted Answer · 2014-05-30 07:14:53Z

I think you are looking for findall:

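import re

# Raw string so the backslash in \d reaches the regex engine unchanged.
TES = re.compile(r'TES\d.')
for item in new:
    # findall returns every non-overlapping match, not just the first one.
    result = TES.findall(item)
    print(result)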
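
To see why search with a repeated group returns a single value while findall returns all of them, here is a minimal sketch on the first string from the question. The pattern TES\d in the second call is a tightened variant and an assumption on my part, not the pattern above: it drops the trailing "." so the closing quote is not picked up.

```python
import re

# First string from the question's list.
item = '{"TES1":"=TES0"}}'

# search() stops at the first match, and a repeated group keeps only
# its last repetition, so at most one value comes back.
first = re.search(r'(TES\d.)+', item)
print(first.groups())  # ('TES1"',)

# findall() scans the whole string and returns every match.
# Dropping the trailing "." (an assumption, not the original pattern)
# avoids capturing the closing quote.
all_matches = re.findall(r'TES\d', item)
print(all_matches)  # ['TES1', 'TES0']
```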

zx81 · Accepted Answer · 2014-05-30 09:32:57Z

First Option (with quotes)

To match "TES1":"=TES0", you can use this regex:

"TES\d+":"=TES\d+"

like this:

import re

match = re.search(r'"TES\d+":"=TES\d+"', subject)
if match:
    result = match.group()  # the whole matched text, quotes included

Second Option (without quotes)

If you want to get rid of the quotes, as in TES1:=TES0, you can use this search and replace:

Search: "(TES\d+)":"(=TES\d+)"

Replace: \1:\2

like this:

result = re.sub(r'"(TES\d+)":"(=TES\d+)"', r"\1:\2", subject)

How does it work?

"(TES\d+)":"(=TES\d+)"

Match the character “"” literally "
Match the regex below and capture its match into backreference number 1 (TES\d+)
- Match the character string “TES” literally (case sensitive) TES
- Match a single character that is a “digit” (0–9 in any Unicode script) \d+
  - Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
Match the character string “":"” literally ":"
Match the regex below and capture its match into backreference number 2 (=TES\d+)
- Match the character string “=TES” literally (case sensitive) =TES
- Match a single character that is a “digit” (0–9 in any Unicode script) \d+
  - Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
Match the character “"” literally "

\1:\2
Insert the text that was last matched by capturing group number 1 \1
Insert the character “:” literally :
Insert the text that was last matched by capturing group number 2 \2
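
As a rough sketch of how the two capture groups behave, the following runs the second option on the first string from the question (the name subject follows the answer; its sample value is taken from the question as an assumption). Note that re.sub only rewrites the matched part, so the surrounding braces stay in the result. Joining the groups of a match object gives just the item.

```python
import re

# First string from the question's list, used here as the subject.
subject = '{"TES1":"=TES0"}}'
pattern = re.compile(r'"(TES\d+)":"(=TES\d+)"')

# Option 2 as written: only the matched text is replaced,
# so the braces around it are kept.
replaced = pattern.sub(r'\1:\2', subject)
print(replaced)  # {TES1:=TES0}}

# Alternative: build the item from the capture groups of a match.
match = pattern.search(subject)
if match:
    extracted = '{}:{}'.format(match.group(1), match.group(2))
else:
    extracted = None
print(extracted)  # TES1:=TES0
```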

Casimir et Hippolyte · Accepted Answer · 2014-05-30 09:30:26Z

You can use a single replacement, for example:

import re

# Python writes group references as \1 and \2 (not $1 and $2).
result = re.sub(r'\{"(TES\d)":"(=TES\d)"\}\}', r'\1:\2', yourstr, count=1)
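
A minimal sketch of this replacement on the first string from the question. Python's re.sub writes group references as \1 or \g<1>; the $1 form used by some other regex flavours is inserted literally. Here yourstr is given the question's sample value, which is an assumption about the intended input.

```python
import re

# Sample value taken from the question's list (assumed input).
yourstr = '{"TES1":"=TES0"}}'

# The braces are part of the pattern, so the whole string is replaced.
# count=1 limits the substitution to the first match.
result = re.sub(r'\{"(TES\d)":"(=TES\d)"\}\}', r'\1:\2', yourstr, count=1)
print(result)  # TES1:=TES0

# For comparison: "$1:$2" is not a group reference in Python.
literal = re.sub(r'\{"(TES\d)":"(=TES\d)"\}\}', '$1:$2', yourstr, count=1)
print(literal)  # $1:$2
```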
