Parse YAML file with nested parameters as a Python class object

Question

I would like to use a YAML file to store parameters used by computational models developed in Python. An example of such a file is below:

params.yaml

reactor:
  diameter_inner: 2.89 cm
  temperature: 773 kelvin
  gas_mass_flow: 1.89 kg/s

biomass:
  diameter: 2.5 mm                # mean Sauter diameter (1)
  density: 540 kg/m^3             # source unknown
  sphericity: 0.89 unitless       # assumed value
  thermal_conductivity: 1.4 W/mK  # based on value for pine (2)

catalyst:
  density: 1200 kg/m^3                            # from MSDS sheet
  sphericity: 0.65 unitless                       # assumed value
  diameters: [[86.1, 124, 159.03, 201], microns]  # sieve screen diameters
  surface_areas:
    values:
      - 12.9
      - 15
      - 18
      - 24.01
      - 31.8
      - 38.51
      - 42.6
    units: square micron

Parameters for the Python model are organized based on the type of computations they apply to. For example, parameters used by the reactor model are listed in the reactor section. Units are important for the calculations, so the YAML file needs to convey that information too.

I'm using the PyYAML package to read the YAML file into a Python dictionary. To allow easier access to the nested parameters, I use intermediate Python classes to parse the dictionary values into class attributes. The class attributes are then used to obtain the values associated with the parameters. Below is an example of how I envision using the approach for a much larger project:

params.py

import yaml


class Reactor:

    def __init__(self, rdict):
        self.diameter_inner = float(rdict['diameter_inner'].split()[0])
        self.temperature = float(rdict['temperature'].split()[0])
        self.gas_mass_flow = float(rdict['gas_mass_flow'].split()[0])


class Biomass:

    def __init__(self, bdict):
        self.diameter = float(bdict['diameter'].split()[0])
        self.density = float(bdict['density'].split()[0])
        self.sphericity = float(bdict['sphericity'].split()[0])


class Catalyst:

    def __init__(self, cdict):
        self.diameters = cdict['diameters'][0]
        self.density = float(cdict['density'].split()[0])
        self.sphericity = float(cdict['sphericity'].split()[0])
        self.surface_areas = cdict['surface_areas']['values']


class Parameters:

    def __init__(self, file):

        with open(file, 'r') as f:
            params = yaml.safe_load(f)

        # reactor parameters
        rdict = params['reactor']
        self.reactor = Reactor(rdict)

        # biomass parameters
        bdict = params['biomass']
        self.biomass = Biomass(bdict)

        # catalyst parameters
        cdict = params['catalyst']
        self.catalyst = Catalyst(cdict)

example.py

from params import Parameters

pm = Parameters('params.yaml')

# reactor
d_inner = pm.reactor.diameter_inner
temp = pm.reactor.temperature
mf_gas = pm.reactor.gas_mass_flow

# biomass
d_bio = pm.biomass.diameter
rho_bio = pm.biomass.density

# catalyst
rho_cat = pm.catalyst.density
sp_cat = pm.catalyst.sphericity
d_cat = pm.catalyst.diameters
sa_cat = pm.catalyst.surface_areas

print('\n--- Reactor Parameters ---')
print(f'd_inner = {d_inner}')
print(f'temp = {temp}')
print(f'mf_gas = {mf_gas}')

print('\n--- Biomass Parameters ---')
print(f'd_bio = {d_bio}')
print(f'rho_bio = {rho_bio}')

print('\n--- Catalyst Parameters ---')
print(f'rho_cat = {rho_cat}')
print(f'sp_cat = {sp_cat}')
print(f'd_cat = {d_cat}')
print(f'sa_cat = {sa_cat}')

This approach works fine, but whenever more parameters are added to the YAML file, additional code has to be added to the classes. I could just use the dictionary returned by the YAML package, but I find it easier and cleaner to get the parameter values through a class interface.

So, is there a better approach I should use to parse the YAML file? Or should I organize the YAML file with a different structure so it is easier to parse?

Maarten Fabré · Accepted Answer · 2018-04-18 14:05:21Z

You could use a nested parser that uses pint to do the unit parsing:

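import yaml
from pint import UnitRegistry, UndefinedUnitError

UNITS = UnitRegistry()


def nested_parser(params: dict):
    """Yield (key, value) pairs, turning unit strings into pint Quantities."""
    for key, value in params.items():
        if isinstance(value, str):
            # e.g. '2.89 cm' -> Quantity(2.89, 'centimeter'); keep the string if pint can't parse it
            try:
                value = UNITS.Quantity(value)
            except UndefinedUnitError:
                pass
            yield key, value
        elif isinstance(value, dict):
            if value.keys() == {'values', 'units'}:
                # {'values': [...], 'units': '...'} -> list of Quantities
                yield key, [i * UNITS(value['units']) for i in value['values']]
            else:
                # an ordinary section such as 'reactor': recurse into it
                yield key, dict(nested_parser(value))
        elif isinstance(value, list):
            # [[magnitudes...], unit] -> list of Quantities
            values, unit = value
            yield key, [i * UNITS(unit) for i in values]
        else:
            # bare numbers, booleans, None: pass through unchanged
            yield key, value


# params is the YAML source: an open file object or a YAML string
dict(nested_parser(yaml.safe_load(params)))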

{'reactor': {'diameter_inner': <Quantity(2.89, 'centimeter')>,
  'temperature': <Quantity(773, 'kelvin')>,
  'gas_mass_flow': <Quantity(1.89, 'kilogram / second')>},
 'biomass': {'diameter': <Quantity(2.5, 'millimeter')>,
  'density': <Quantity(540.0, 'kilogram / meter ** 3')>,
  'sphericity': <Quantity(0.89, 'dimensionless')>,
  'thermal_conductivity': <Quantity(1.4, 'watt / millikelvin')>},
 'catalyst': {'density': <Quantity(1200.0, 'kilogram / meter ** 3')>,
  'sphericity': <Quantity(0.65, 'dimensionless')>,
  'diameters': [<Quantity(86.1, 'micrometer')>,
   <Quantity(124, 'micrometer')>,
   <Quantity(159.03, 'micrometer')>,
   <Quantity(201, 'micrometer')>],
  'surface_areas': [<Quantity(12.9, 'micrometer ** 2')>,
   <Quantity(15, 'micrometer ** 2')>,
   <Quantity(18, 'micrometer ** 2')>,
   <Quantity(24.01, 'micrometer ** 2')>,
   <Quantity(31.8, 'micrometer ** 2')>,
   <Quantity(38.51, 'micrometer ** 2')>,
   <Quantity(42.6, 'micrometer ** 2')>]}}

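You might need to make your units understandable to pint; for me that just meant changing microns to µm, square micron to µm², and unitless to dimensionless. Also note that W/mK is read as watt per millikelvin, as the thermal_conductivity entry above shows; write W/(m*K) if you mean watts per metre-kelvin.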
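If you would rather keep the original spellings in params.yaml, one option is to rewrite them in the raw text before calling yaml.safe_load. This is a minimal sketch, assuming a hypothetical normalize_units helper and UNIT_ALIASES table; the µm, µm² and dimensionless spellings are the ones reported to work above, and the W/(m*K) rule is an extra assumption meant to avoid the watt / millikelvin reading visible in the output.

```python
import re

# Hypothetical table of (pattern, replacement) pairs; \b keeps the rules from
# firing inside longer words, and 'square micron(s)' is handled before 'micron(s)'.
UNIT_ALIASES = [
    (r'\bsquare microns?\b', 'µm²'),
    (r'\bmicrons?\b', 'µm'),
    (r'\bunitless\b', 'dimensionless'),
    (r'\bW/mK\b', 'W/(m*K)'),  # assumption: thermal conductivity meant per metre-kelvin
]


def normalize_units(yaml_text: str) -> str:
    """Rewrite unit spellings in raw YAML text before it is parsed."""
    for pattern, replacement in UNIT_ALIASES:
        yaml_text = re.sub(pattern, replacement, yaml_text)
    return yaml_text


raw = """\
biomass:
  sphericity: 0.89 unitless
  thermal_conductivity: 1.4 W/mK
catalyst:
  diameters: [[86.1, 124], microns]
  surface_areas:
    values: [12.9, 15]
    units: square micron
"""
out = normalize_units(raw)
print(out)
```

In the parser above you would then call yaml.safe_load(normalize_units(f.read())) and hand the result to nested_parser.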

Using this

Statically

configuration = dict(nested_parser(yaml.safe_load(params)))

# reactor
reactor_config = configuration['reactor']
d_inner = reactor_config['diameter_inner']
temp = reactor_config['temperature']
mf_gas = reactor_config['gas_mass_flow']

print('\n--- Reactor Parameters ---')
print(f'd_inner = {d_inner}')
print(f'temp = {temp}')
print(f'mf_gas = {mf_gas}')

Dynamically

for part, parameters in nested_parser(yaml.safe_load(params)):
    print(f'--- {part} Parameters ---')
    for parameter, value in parameters.items():
        print(f'{parameter} = {value}')
    print('\n')

You can check out the pint documentation on string formatting to format the units the way you want.

My next step is to incorporate Pint, so thank you for the example. Can you also comment on how to use your approach in a Python script? In my example I use the classes in params.py to read the YAML dictionary and assign the values to attributes, then refer to those classes in the example.py script. Would this approach work with pint, or is there a different approach I should use?
– wigging, Commented Apr 17, 2018 at 16:29
pint works with this approach. The values of the attributes are now instances of pint.Quantity, so string handling and the like will change, but fundamentally these quantities are no different from floats and ints. You can rework your classes to accept the dict of parameters and use setattr to set the attributes dynamically. Note that using a class only to hold the parameter values is a bit overkill; a dict will suffice for that purpose.
– Maarten Fabré, Commented Apr 18, 2018 at 7:17
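A minimal sketch of the setattr idea, assuming a single hypothetical Section class that replaces Reactor, Biomass and Catalyst. The dict literal stands in for the output of nested_parser, with plain numbers in place of pint quantities so the example is self-contained.

```python
class Section:
    """Hold one YAML section's parameters as attributes (hypothetical helper)."""

    def __init__(self, params: dict):
        for name, value in params.items():
            # Nested dicts become Section objects too, so pm.reactor.temperature works.
            if isinstance(value, dict):
                value = Section(value)
            setattr(self, name, value)

    def __repr__(self):
        return f'Section({vars(self)!r})'


# Stand-in for dict(nested_parser(...)); new YAML keys need no new code.
config = {
    'reactor': {'diameter_inner': 2.89, 'temperature': 773},
    'biomass': {'density': 540},
}
pm = Section(config)
print(pm.reactor.temperature)
```

Dot access only works for keys that are valid Python identifiers; anything else is still reachable with getattr(pm.reactor, 'some-key').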
I agree that using a class is overkill. Can you provide an example of how to get the values from the dictionary? I'm thinking that a function like get_value('density') would work, but how would I define which density?
– wigging, Commented Apr 18, 2018 at 12:14
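One way to answer "which density?" is to make the section part of the name you look up. This is a sketch, assuming a hypothetical get_value that takes a dotted path; the dict literal stands in for the parsed configuration, with plain numbers instead of pint quantities.

```python
import operator
from functools import reduce


def get_value(config: dict, path: str):
    """Look up a dotted path such as 'catalyst.density' in a nested dict.

    Hypothetical helper: the section name in the path says which density you mean.
    """
    return reduce(operator.getitem, path.split('.'), config)


# Stand-in for the parsed configuration.
config = {
    'biomass': {'density': 540, 'sphericity': 0.89},
    'catalyst': {'density': 1200, 'sphericity': 0.65},
}

rho_bio = get_value(config, 'biomass.density')
rho_cat = get_value(config, 'catalyst.density')
print(rho_bio, rho_cat)
```

This is equivalent to config['catalyst']['density']; a misspelled path raises KeyError just like plain indexing.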
One more question. In yaml.safe_load(params), what is params? Is it a string representing the path to the YAML file?
– wigging, Commented Apr 19, 2018 at 0:40
nested_parser takes any dict structured like the one in the YAML file, so params can be an open file object or a YAML string (not a path).
– Maarten Fabré, Commented Apr 19, 2018 at 19:50
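To make that concrete: yaml.safe_load accepts YAML text or an open stream, but it never opens a path for you. In this small sketch io.StringIO stands in for open('params.yaml') so the example runs on its own.

```python
import io

import yaml

text = 'reactor:\n  temperature: 773 kelvin\n'

# A YAML string is parsed directly.
from_string = yaml.safe_load(text)

# An open file (any text stream) works the same way.
from_stream = yaml.safe_load(io.StringIO(text))

# A path is NOT opened: it is parsed as a one-word YAML document.
from_path = yaml.safe_load('params.yaml')

print(from_string)  # {'reactor': {'temperature': '773 kelvin'}}
print(from_path)    # params.yaml
```

With a real file that becomes: with open('params.yaml') as f: configuration = dict(nested_parser(yaml.safe_load(f))).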

l0b0 · Accepted Answer · 2018-04-17 01:57:49Z


If you split the configuration fields into magnitude and unit (as you've already done for surface_areas), you won't have to split and parse them in code.
If you then convert your configuration to JSON, you won't need to convert strings to numbers. JSON strings must be quoted and numbers must be unquoted, so the json module will simply do those conversions for you.
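As a sketch of the first point, assuming a hypothetical value/units layout for every parameter (mirroring the values/units layout surface_areas already uses): PyYAML's safe_load also resolves unquoted numbers to int or float, so once magnitude and unit are separate fields no split() or float() call is needed, whether the file is YAML or JSON.

```python
import yaml

# Hypothetical layout: each parameter stores its magnitude and unit separately.
text = """\
reactor:
  diameter_inner: {value: 2.89, units: cm}
  temperature: {value: 773, units: kelvin}
"""

reactor = yaml.safe_load(text)['reactor']

# Unquoted numbers are already float/int after safe_load.
d_inner = reactor['diameter_inner']['value']
d_units = reactor['diameter_inner']['units']
print(d_inner, d_units)  # 2.89 cm
```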

Other than that:

Configuration handling should be separate from building other objects; that way it's easy to use your code whether the configuration comes from a file or from command-line parameters.
Accessing properties two levels deep (such as pm.biomass.diameter) violates the Law of Demeter. You could, for example, write an as_parameter_list method for each class that returns a representation like f'rho_cat = {rho_cat}'.
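Both points together might look like the sketch below, assuming a hypothetical catalyst_from_config factory and a dataclass version of Catalyst. as_parameter_list is the method name suggested above; its return format here is an assumption. Catalyst takes plain values, so it does not care whether they came from a file or the command line.

```python
from dataclasses import dataclass


@dataclass
class Catalyst:
    """Built from plain values, independent of where the configuration came from."""
    density: float
    sphericity: float

    def as_parameter_list(self) -> list[str]:
        # The object formats its own parameters, so callers don't reach into it.
        return [f'rho_cat = {self.density}', f'sp_cat = {self.sphericity}']


def catalyst_from_config(config: dict) -> Catalyst:
    """Configuration handling lives here, not in Catalyst.__init__ (hypothetical)."""
    section = config['catalyst']
    return Catalyst(density=section['density'], sphericity=section['sphericity'])


config = {'catalyst': {'density': 1200.0, 'sphericity': 0.65}}
catalyst = catalyst_from_config(config)
print('\n'.join(catalyst.as_parameter_list()))
```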

edited Apr 17, 2018 at 1:57

answered Apr 17, 2018 at 1:48


I'm not interested in using JSON for the parameters file because it does not support comments. I plan to use comments to add more information about certain parameters. I also feel that the YAML format is more readable than JSON. Can you provide an example of the configuration handling you mentioned?
– wigging, Commented Apr 17, 2018 at 2:13
Nothing open source comes to mind, but I bet any large project that allows configuration via either files or command-line arguments does this.
– l0b0, Commented Apr 17, 2018 at 2:16
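Since no concrete project was named, here is a minimal sketch of the idea, assuming a hypothetical load_config helper with only two overridable reactor settings: the file and the command line both feed one plain dict, and objects are built from that dict afterwards.

```python
import argparse


def load_config(file_config: dict, argv=None) -> dict:
    """Merge file-based settings with command-line overrides (hypothetical helper).

    Values given on the command line win; everything else comes from the file.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument('--temperature', type=float, help='reactor temperature in kelvin')
    parser.add_argument('--gas-mass-flow', type=float, help='gas mass flow in kg/s')
    args = parser.parse_args(argv)

    reactor = dict(file_config['reactor'])  # copy so the caller's dict is untouched
    if args.temperature is not None:
        reactor['temperature'] = args.temperature
    if args.gas_mass_flow is not None:
        reactor['gas_mass_flow'] = args.gas_mass_flow
    return {**file_config, 'reactor': reactor}


# Stand-in for the dict loaded from params.yaml.
file_config = {'reactor': {'temperature': 773.0, 'gas_mass_flow': 1.89}}
config = load_config(file_config, ['--temperature', '800'])
print(config['reactor'])  # {'temperature': 800.0, 'gas_mass_flow': 1.89}
```

Passing argv=None makes argparse read sys.argv, which is what a real script would do.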
