The pattern is regular apart from the properties appearing in any order, so this is certainly doable. I've done it in two steps: one regex grabs the colour at the beginning and extracts the properties string, and a second regex extracts the individual properties.
import re
inputs = [
'(Names RED (property (x 123) (y 456) (type MT) (label ONE) (code XYZ)))',
'(Names GREEN (property (type MX) (label TWO) (x 789) (y 101)))'
]
# Get the initial part, and chop off the property inner string
# (raw string, so the backslashes reach the regex engine unchanged)
initial_re = re.compile(r'^\(Names\s([^\s]*)\s\(property\s(.*)\)\)')
# Get all groups from (x 123) (y 456) (type MT) (label ONE) (code XYZ)
prop_re = re.compile(r'\(([^\s]*)\s([^\s]*)\)')
for s in inputs:
    parts = initial_re.match(s)
    color = parts.group(1)
    props = parts.group(2)
    # e.g. (x 123) (y 456) (type MT) (label ONE) (code XYZ)
    properties = prop_re.findall(props)
    # [('x', '123'), ('y', '456'), ('type', 'MT'), ('label', 'ONE'), ('code', 'XYZ')]
    print("%s: %s" % (color, properties))
The output is:
RED: [('x', '123'), ('y', '456'), ('type', 'MT'), ('label', 'ONE'), ('code', 'XYZ')]
GREEN: [('type', 'MX'), ('label', 'TWO'), ('x', '789'), ('y', '101')]
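The loop above assumes every line matches the first pattern. If one does not, `initial_re.match` returns `None` and `parts.group(1)` raises an `AttributeError`. A minimal sketch of a guard, wrapping the two steps in a helper (the name `parse_line` is my own and purely illustrative):

```python
import re

# Same two patterns as above, written as raw strings.
initial_re = re.compile(r'^\(Names\s(\S*)\s\(property\s(.*)\)\)')
prop_re = re.compile(r'\((\S*)\s(\S*)\)')


def parse_line(line):
    """Return (color, {property: value}), or None if the line does not match.

    parse_line is a hypothetical helper name used for illustration.
    """
    parts = initial_re.match(line)
    if parts is None:
        return None
    color, props = parts.groups()
    return color, dict(prop_re.findall(props))


result = parse_line('(Names GREEN (property (type MX) (label TWO) (x 789) (y 101)))')
skipped = parse_line('not a Names record')
print(result)
print(skipped)
```

Returning `None` for unmatched lines lets the caller decide whether to skip them or report them, rather than failing in the middle of the loop.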
To get this into pandas, you can accumulate the properties in a dictionary of lists (I've done this below using defaultdict). Every column must end up the same length, so you need to store something for missing values; here I just store None. Finally, use pd.DataFrame.from_dict to build the DataFrame.
import re
import pandas as pd
from collections import defaultdict
inputs = [
'(Names RED (property (x 123) (y 456) (type MT) (label ONE) (code XYZ)))',
'(Names GREEN (property (type MX) (label TWO) (x 789) (y 101)))'
]
# Get the initial part, and chop off the property inner string
# (raw string, so the backslashes reach the regex engine unchanged)
initial_re = re.compile(r'^\(Names\s([^\s]*)\s\(property\s(.*)\)\)')
# Get all groups from (x 123) (y 456) (type MT) (label ONE) (code XYZ)
prop_re = re.compile(r'\(([^\s]*)\s([^\s]*)\)')
columns = ['color', 'x', 'y', 'type', 'label', 'code']
data_dict = defaultdict(list)
for s in inputs:
    parts = initial_re.match(s)
    color = parts.group(1)
    props = parts.group(2)
    # e.g. (x 123) (y 456) (type MT) (label ONE) (code XYZ)
    properties = dict(prop_re.findall(props))
    properties['color'] = color
    for k in columns:
        v = properties.get(k)  # None if missing
        data_dict[k].append(v)
df = pd.DataFrame.from_dict(data_dict)
print(df)
The final output is
   color    x    y type label  code
0    RED  123  456   MT   ONE   XYZ
1  GREEN  789  101   MX   TWO  None
In short: a dict of lists is used to build the DataFrame.
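As an alternative sketch, pandas can also be given one dict per row. Keys missing from a row are then filled in as missing values (NaN) by pandas itself, so the per-column bookkeeping is not needed. The regex returns every value as a string, so the example also converts `x` and `y` with `pd.to_numeric`; treating those two columns as numeric is my assumption about the data.

```python
import re
import pandas as pd

inputs = [
    '(Names RED (property (x 123) (y 456) (type MT) (label ONE) (code XYZ)))',
    '(Names GREEN (property (type MX) (label TWO) (x 789) (y 101)))',
]

initial_re = re.compile(r'^\(Names\s(\S*)\s\(property\s(.*)\)\)')
prop_re = re.compile(r'\((\S*)\s(\S*)\)')

rows = []
for s in inputs:
    parts = initial_re.match(s)
    row = {'color': parts.group(1)}
    # findall returns (key, value) pairs, which dict.update accepts directly
    row.update(prop_re.findall(parts.group(2)))
    rows.append(row)

# One dict per row; `columns` fixes the column order
df = pd.DataFrame(rows, columns=['color', 'x', 'y', 'type', 'label', 'code'])

# Assumption: x and y hold numbers, so convert them from strings
df[['x', 'y']] = df[['x', 'y']].apply(pd.to_numeric)
print(df)
```

The trade-off is that a missing `code` shows up as NaN instead of None, which is usually what pandas functions such as `isna` and `dropna` expect anyway.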