Python Regex - Parse string and extract key=value pairs

Question

I have some text that I would like to extract key=value pairs from (see below). I've attempted to use a regex; however, the formatting of the key=value pairs is not consistent. For example, many values are enclosed in quotes, but some are not.

This is the regex which nearly worked, but there are a couple of outliers.

(\w*)=([\w,\",:,\-,(,\.,\+,\)]*)

Message meets Alert condition date=2020-08-20 time=00:33:57 devname=FGT3HD3999906624 devid=FGT3HD3999906624 logid="0100032003" type="event" subtype="system" level="information" vd="root" eventtime=1597847637407862934 tz="+1000" logdesc="Admin logout successful" sn="159999794" user="admin" ui="https(10.198.199.105)" method="https" srcip=10.198.199.105 dstip=192.168.23.254 action="logout" status="success" duration=4843 reason="timeout" msg="Administrator admin timed out on https(10.198.199.105)" Administrator IT Administrator Ph:

It doesn't look like you need regex for this. What makes you think you do?
– MisterMiyagi, Commented Aug 23, 2020 at 7:56
Does this answer your question? Splitting a semicolon-separated string to a dictionary, in Python
– MisterMiyagi, Commented Aug 23, 2020 at 8:00
My post did not show it, but the text is buried in the body of an email message, which includes the "Message meets Alert condition" and "Administrator IT..." text. Also, the fields are dynamic, hence the need for a regex.
– John Greenfield, Commented Aug 23, 2020 at 9:03
Does this answer your question? Splitting a semicolon-separated string to a dictionary, in Python
– Ryszard Czech, Commented Aug 23, 2020 at 18:39

jdaz · Accepted Answer · 2020-08-24 23:06:58Z

You have a few ways to do this. First, since you said your key-value pairs are embedded in a larger email, you need to extract them. You can do that with this regex, which checks for a line starting with a word and an equals sign:

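import re

text = " ... Full email text ... "
# re.MULTILINE lets ^ and $ match at each line boundary, so this grabs
# the first line that starts with a key followed by an equals sign.
dataPoints = re.search(r"^\w+=.*$", text, re.MULTILINE).group(0)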
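As a rough sketch of that extraction step, here is how it might look on a made-up email body. The `sample_email` text and the `extract_data_line` helper are hypothetical names used only for illustration, and the real email layout may differ. The helper also returns `None` when nothing matches, rather than failing on `.group(0)`.

```python
import re

# Hypothetical email body; only the middle line carries key=value pairs.
sample_email = (
    "Message meets Alert condition\n"
    'date=2020-08-20 time=00:33:57 logdesc="Admin logout successful" duration=4843\n'
    "Administrator IT Administrator Ph:\n"
)


def extract_data_line(text):
    """Return the first line that starts with key=, or None if there is none."""
    match = re.search(r"^\w+=.*$", text, re.MULTILINE)
    return match.group(0) if match else None


data_line = extract_data_line(sample_email)
print(data_line)
```

Note that this only works if the key=value pairs start on their own line. If the email puts other text in front of them on the same line, the pattern finds nothing.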

Then you need to create your dictionary.

Option 1: Simplest

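Use re.findall with the following regex, which captures each key and then either a quoted value or a run of non-space characters: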

result = dict(re.findall(r'(\w*)=(\".*?\"|\S*)', dataPoints))
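Quoted values come back with their quotes still attached. If you don't want them, one possible follow-up is to strip them in a dict comprehension. This is only a sketch: `data_points` is a shortened sample built from the question's text, and I've used `\w+` instead of `\w*` so that a stray equals sign can't produce an empty key.

```python
import re

# Shortened sample line built from the question's log text.
data_points = 'date=2020-08-20 tz="+1000" logdesc="Admin logout successful" srcip=10.198.199.105'

# The quoted alternative is tried first, so values with spaces stay whole.
pairs = re.findall(r'(\w+)=(".*?"|\S*)', data_points)

# Quoted values keep their surrounding quotes, so strip them afterwards.
result = {key: value.strip('"') for key, value in pairs}
print(result)
```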

Option 2: Typical split

Follow the typical method for this problem: split the various key-value combinations into a list, and then split each combination into separate keys and values. However, since your key-value pairs are separated by spaces rather than semicolons, ampersands, or something similar, and some of your values have spaces in them, we can't simply split by spaces. That means we need to use a regex lookahead for this to work properly:

# Split on whitespace that is followed by key=, then split each pair on the first "=" only.
regexSplit = dict([i.split("=", 1) for i in re.split(r"\s(?=\w+=)", dataPoints)])
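To make the lookahead easier to follow, here is a small sketch that prints the intermediate chunks before building the dictionary. The `data_points` line is a shortened sample based on the question, and the variable names are mine.

```python
import re

data_points = 'user="admin" ui="https(10.198.199.105)" duration=4843 msg="Administrator admin timed out"'

# Split only at whitespace that is immediately followed by "word=".
# The lookahead (?=...) checks for the key without consuming it.
chunks = re.split(r"\s(?=\w+=)", data_points)

# Splitting on the first "=" only keeps any "=" inside a value intact.
regex_split = dict(chunk.split("=", 1) for chunk in chunks)

print(chunks)
print(regex_split)
```

One caveat: the lookahead can't tell whether it is inside quotes, so a quoted value that itself contains something like ` id=5` would be split at that point.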

Option 3: No regex

If you want to avoid using regex altogether for whatever reason, you can use the following, which splits on equals signs and then recombines the keys and values into the proper arrangement for creating a dictionary:

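# Splitting on "=" gives the first key, then "value nextKey" chunks, then the last value.
allSplits = dataPoints.split("=")
# Split each middle chunk at its last space to separate the value from the next key.
splitList = [allSplits[0]] + [i for a in allSplits[1:-1]
    for i in a.rsplit(" ", 1)] + [allSplits[-1]]

# Even positions are keys, odd positions are values.
splitDict = dict(zip(splitList[::2], splitList[1::2]))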

The code above assumes that the line contains at least one key=value pair and that no value contains an equals sign.
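If you want to check that the three options agree on your own data, a quick comparison along these lines may help. It is a sketch on a shortened sample line; the variable names are mine, and Option 3 is written as a plain loop for readability.

```python
import re

data_points = 'date=2020-08-20 level="information" logdesc="Admin logout successful" duration=4843'

# Option 1: findall with a quoted-or-unquoted alternative.
option1 = dict(re.findall(r'(\w+)=(".*?"|\S*)', data_points))

# Option 2: split before each key, then split each pair on the first "=".
option2 = dict(p.split("=", 1) for p in re.split(r"\s(?=\w+=)", data_points))

# Option 3: no regex; split on "=" and regroup the "value nextKey" chunks.
all_splits = data_points.split("=")
split_list = [all_splits[0]]
for chunk in all_splits[1:-1]:
    split_list.extend(chunk.rsplit(" ", 1))
split_list.append(all_splits[-1])
option3 = dict(zip(split_list[::2], split_list[1::2]))

print(option1 == option2 == option3)
```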

Thanks, this works perfectly and also caters for the dictionary creation.

Stefan · Accepted Answer · 2020-08-24 05:13:43Z

What about adding an OR (|) to your regex? For example,

(\w*)=(\"[\w\s\+()\.]*\"|[\w\-\:\.]*)

matches the string you gave.

Note:

\"[\w\s\+()\.]*\" matches all the values enclosed in double quotes
[\w\-\:\.]* matches the ones without quotes
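To see what the alternation changes, here is a short sketch comparing the pattern from the question with the one above on a shortened sample line. The variable names are mine.

```python
import re

line = 'date=2020-08-20 time=00:33:57 tz="+1000" logdesc="Admin logout successful" ui="https(10.198.199.105)"'

# Pattern from the question: its character class has no whitespace,
# so a quoted value is cut off at its first space.
original = r'(\w*)=([\w,\",:,\-,(,\.,\+,\)]*)'

# Pattern with the alternation: the quoted branch allows whitespace.
with_or = r'(\w*)=(\"[\w\s\+()\.]*\"|[\w\-\:\.]*)'

before = dict(re.findall(original, line))
after = dict(re.findall(with_or, line))

print(before["logdesc"])
print(after["logdesc"])
```

Keep in mind that the quoted branch only accepts the characters listed in its class. A quoted value containing anything else, such as a comma or a hyphen, would not match that branch and would come back empty.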

edited Aug 24, 2020 at 5:13

answered Aug 23, 2020 at 7:46

John Greenfield Over a year ago

Thanks, the addition of pipe symbol catered for the outliers :)

jdaz Over a year ago

(\w*)=(\".*?\"|\S*) is much simpler: regex101.com/r/m4o3LO/1

Toto Over a year ago

\d is already included in \w, it doesn't make sense to put both in a character class.

Stefan Over a year ago

@Toto You are right, of course \w matches all alphanumeric characters. I updated the answer.

Stefan Over a year ago

@jdaz Yes, it also looks way cleaner.
