How to match regex in python?

Question

describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do

I'm trying to extract sg-ezsrzerzer from it (that is, everything from sg- up to the closing double quote). I'm using Python.

I currently have:

import re
a = 'describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do'
test = re.findall(r'\bsg-.*\b', a)
print(test)

output is

['sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do']

How do I only get ['sg-ezsrzerzer']?

Use r'\bsg-[^"]+'

– anubhava, Commented Jun 4, 2021 at 14:44
Or like (?<=")sg-\w+(?=")

– The fourth bird, Commented Jun 4, 2021 at 14:44
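The two patterns suggested in the comments can be compared side by side. This is only a minimal sketch against the string from the question; the variable names are chosen for illustration.

```python
import re

s = 'describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do'

# Negated character class: after "sg-", consume every character that is not a double quote.
negated = re.findall(r'\bsg-[^"]+', s)

# Lookarounds: require a double quote on both sides without including it in the match.
lookaround = re.findall(r'(?<=")sg-\w+(?=")', s)

print(negated)
print(lookaround)
```

Both calls should return the same single-element list for this input. The difference is that the lookaround version only matches when the value is directly enclosed in double quotes.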

JPI93 · Accepted Answer · 2021-06-04 16:11:49Z

The pattern (?<=group_id=\>").+?(?=\") would work nicely if the goal is to extract the group_id value within a given string formatted as in your example.

(?<=group_id=\>") Looks behind for the sub-string group_id=>" before the string to be matched.

.+? Matches one or more of any character lazily.

(?=\") Looks ahead for the character " following the match (effectively making the lazy expression .+? stop at the first closing ").

If you only want to extract sub-strings where the group_id starts with sg- then you can simply add this to the matching part of the pattern as follows (?<=group_id=\>")sg\-.+?(?=\")
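As a rough illustration of the sg- restricted variant, the sketch below runs it against the original string and against a second, hypothetical line whose group_id does not start with sg-. The backslashes before >, - and " are optional in Python's re syntax, so they are left out here, and the pattern is written as a raw string.

```python
import re

# Same pattern as above, minus the optional backslashes.
pattern = r'(?<=group_id=>")sg-.+?(?=")'

lines = [
    'describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do',
    # Hypothetical input, made up for this example: the value does not start with "sg-".
    'describe aws_security_group({:group_id=>"other-123", :vpc_id=>"vpc-zfds54zef4s"}) do',
]

matches = [re.findall(pattern, line) for line in lines]
print(matches)
```

The first line yields the group id and the second yields an empty list, because the lookbehind succeeds but the literal sg- that must follow it is missing.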

import re

s = 'describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do'

# Raw string, so the backslashes reach the regex engine without invalid-escape warnings
results = re.findall(r'(?<=group_id=\>").+?(?=\")', s)

print(results)

Output

['sg-ezsrzerzer']

Of course you could alternatively use re.search instead of re.findall to find the first instance of a sub-string matching the above pattern in a given string - depends on your use case I suppose.

import re

s = 'describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do'

# Raw string, so the backslashes reach the regex engine without invalid-escape warnings
result = re.search(r'(?<=group_id=\>").+?(?=\")', s)

if result:
    result = result.group()

print(result)

Output

sg-ezsrzerzer

If you decide to use re.search, note that it returns None when there is no match in the input string and a re.Match object when there is. Hence the if statement and the call to result.group() in the example above, which extracts the matched string when one is present.
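One way to package the None check is a small helper. The function name extract_group_id and the constant GROUP_ID are hypothetical, introduced only for this sketch.

```python
import re
from typing import Optional

# Compiled once; equivalent to the lookaround pattern used above.
GROUP_ID = re.compile(r'(?<=group_id=>").+?(?=")')


def extract_group_id(text: str) -> Optional[str]:
    """Return the first group_id value in text, or None if there is none."""
    match = GROUP_ID.search(text)
    # search() returns None when nothing matches, so guard before calling group().
    return match.group() if match else None


s = 'describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do'
found = extract_group_id(s)
missing = extract_group_id('no group id in this line')
print(found, missing)
```

Callers then only need to check the return value for None instead of handling a match object.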

The fourth bird · Accepted Answer · 2021-06-04 20:23:12Z

The pattern \bsg-.*\b matches too much because .* is greedy: it first consumes everything up to the end of the string and then backtracks only as far as needed to find a word boundary, which it finds immediately between the final o and the end of the string.
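The greedy behaviour can be checked directly. The sketch below also tries the lazy quantifier .*? as a comparison; that variant is not part of the original answer and is shown only to illustrate why switching to lazy matching does not help on its own.

```python
import re

a = 'describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do'

# Greedy: .* runs to the end of the string, where \b holds after the final "o".
greedy = re.search(r'\bsg-.*\b', a)
reaches_end = greedy.end() == len(a)

# Lazy: .*? matches as little as possible, and there is already a word
# boundary between "-" and "e", so the match stops right after "sg-".
lazy = re.search(r'\bsg-.*?\b', a)

print(reaches_end)
print(lazy.group())
```

The greedy match runs to the end of the input and the lazy match is just sg-, so neither \b variant isolates the id. A delimiter-based pattern such as a negated character class avoids both problems.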

If you are using re.findall you can also use a capture group instead of lookarounds and the group value will be in the result.

:group_id=>"(sg-[^"\r\n]+)"

The pattern matches:

:group_id=>" Match literally
(sg-[^"\r\n]+) Capture group 1 match sg- and 1+ times any char except " or a newline
" Match the double quote

See a regex demo or a Python demo

For example

import re

pattern = r':group_id=>"(sg-[^"\r\n]+)"'
s = "describe aws_security_group({:group_id=>\"sg-ezsrzerzer\", :vpc_id=>\"vpc-zfds54zef4s\"}) do"

print(re.findall(pattern, s))

Output

['sg-ezsrzerzer']
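When the input contains several such lines, the same capture-group pattern can be applied to the whole text. The second line and its id below are hypothetical, made up for this sketch.

```python
import re

pattern = re.compile(r':group_id=>"(sg-[^"\r\n]+)"')

text = (
    'describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do\n'
    # Hypothetical second block, not from the question.
    'describe aws_security_group({:group_id=>"sg-second", :vpc_id=>"vpc-zfds54zef4s"}) do\n'
)

# With exactly one capture group, findall returns the group values, not the full matches.
ids = pattern.findall(text)

# finditer gives access to both the full match and the captured group.
full_matches = [m.group(0) for m in pattern.finditer(text)]

print(ids)
print(full_matches)
```

Here findall yields only the ids, while group(0) from finditer still includes the surrounding :group_id=>"..." text.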

Ryszard Czech · Accepted Answer · 2021-06-04 20:49:48Z

Match until the first word boundary with \w+:

import re
a = 'describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do'
test = re.findall(r'\bsg-\w+', a)
print(test[0])

See Python proof.

EXPLANATION

--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  sg-                      'sg-'
--------------------------------------------------------------------------------
  \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                           more times (matching the most amount
                           possible))

Results: sg-ezsrzerzer
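One caveat worth keeping in mind: \bsg-\w+ is not tied to the group_id key, so it also matches any other word that happens to start with sg-. The sketch below uses a hypothetical input with an extra :name field to show the difference from a pattern anchored on :group_id=>.

```python
import re

# Hypothetical input: the :name value contains "sg-" after a hyphen.
a = 'describe aws_security_group({:name=>"my-sg-rule", :group_id=>"sg-ezsrzerzer"}) do'

# Unanchored: \b also holds between "-" and "s" inside "my-sg-rule".
loose = re.findall(r'\bsg-\w+', a)

# Anchored on the key, using a capture group for the value.
anchored = re.findall(r':group_id=>"(sg-[^"\r\n]+)"', a)

print(loose)
print(anchored)
```

For the string in the question both approaches give the same result, so the shorter pattern is fine when the input is known to contain only one sg- token.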
