2

I have a generic smoothing function, and based on a configuration file (yaml which will be loaded as a dictionary), different implementation (boxcar or gaussian) will be called. These implementations have different number of arguments, for example boxcar requires winsize, while gaussian require winsize and variance.

Here is my current implementation:

def smoothing(dataDf, selected_columns, kwargs):

    method = kwargs['method']

    if method == 'boxcar':
        boxcar(dataDf, selected_columns, kwargs['arguments'])
    elif method == 'gaussian':
        gaussian(dataDf, selected_columns, kwargs['arguments'])
    else:
        raise NotImplementedError

Would there be a better way to implement this?

8

4 Answers 4

3

I would consider two options

  1. Use a dictionary of functions:

    methods = {
        'boxcar': function1,
        'gaussian': function2
    }
    
    try:
        method = methods[kwargs['method']]
        ...
    except KeyError:
        raise NotImplementedError
    

    You can make it a bit more user-friendly

    def smoothing(dataDf, selected_columns, method, *args, **kwargs):
        try:
            return methods[kwargs['method']](
                dataDf, selected_columns, *args, **kwargs
            )
        except KeyError:
            raise NotImplementedError('{} is not a valid method'.format(method))
    
  2. Use multiple dispatch. It lets you dispatch on function signature and types

    In [1]: from multipledispatch import dispatch
    
    In [2]: @dispatch(int, int)
       ...: def f(x, y):
       ...:     return x, y
       ...: 
    
    In [3]: @dispatch(int)
       ...: def f(x):
       ...:     return x
       ...: 
    
    In [5]: f(1)
    Out[5]: 1
    
    In [6]: f(1, 2)
    Out[6]: (1, 2)
    

    In your case

    @dispatch(...list some/all argument types here...)
    def smoothing(...signature for boxcar...):
        pass
    
    @dispatch(...list some/all argument types here...)
    def smoothing(...signature for gaussian...)
        pass
    
Sign up to request clarification or add additional context in comments.

Comments

1

I'd say that this question is primary opinion based, but I'm kinda... bored so here's my take:

In these cases I always tend to give priority to readability by other users. We all know that code should be properly commented, explained, with lots of documentation around, right? But I also should go to the gym and yet, here I am, writing this from the coziness of my sofa (which I'm not planning to abandon until... mid-April, more or less, when the weather gets better).

To me, if other people are going to read your code, I think it's very important to take advantage of the fact that Python, if properly written can be very, very clear (it's almost like pseudo-code that runs, right?)

So in your case, I wouldn't even create this kind-of-wrapper function. I would have a module smoothing.py containing all your smoothing functions. Not only that, I'd import the module (import smoothing) as opposed to from smoothing import boxcar, gaussian) so I can be very explicit on my calls:

if method == 'boxcar':
   smoothing.boxcar(whatever whatever...)  
   # Someone reading this will be able to figure out that is an smoothing
   # function from a module called `"smoothing"`. Also, no magic done with
   # things like my_func = getattr(smoothing, method)... none of that: be
   # clear and explicit. 
   # For instance, many IDEs allow you to navigate to the function's
   # definition, but for that to work properly, the code needs to be explicit 
elif method == 'gaussian':
   smoothing.gaussian(whatever whatever...)
else:
   raise ValueError(
          'Unknown smoothing method "%s".'
          ' Please see available method in "%s" or add'
          ' a new smoothing entry into %s' % (
                 method, 
                 os.path.abspath(smoothing.__file__), 
                 os.path.abspath(__file__)
          )
   )

Something like that. Something that if someone receives an error, can quickly understand where and why is it happening.

Otherwise, if you still want to keep your structure, I'd say that, since you're always going to need your 'method', don't put that into your kwargs. Make it positional:

def smoothing(method, dataDf, selected_columns, kwargs):
    if method == 'boxcar':
        boxcar(dataDf, selected_columns, kwargs['arguments'])
    elif method == 'gaussian':
        gaussian(dataDf, selected_columns, kwargs['arguments'])
    else:
        raise NotImplementedError

Another thing you could do, is instead of having the possibility of having bad arguments in the kwargs dict, force it to include the proper arguments (what if someone passes in kwarg['arguments'] the argument method=boxcar but gives you kwarg['arguments'] for gaussian? Don't let that even be possible (make it crash as soon as possible):

def smoothing(method, dataDf, selected_columns, **kwargs):
    if method == 'boxcar':
        assert 'variance' not in kwargs  # If `boxcar` shouldn't have a "variance"
        boxcar(dataDf, selected_columns, kwargs['windsize'])
    elif method == 'gaussian':
        gaussian(dataDf, selected_columns, kwargs['windsize'], kwargs['variance'])
    else:
        raise NotImplementedError

And always provide a proper message in your exceptions (give a proper explanation on your NotImplementedError)

There's a lot of "magic" you can do with Python. That doesn't mean you have to do it. For instance, you could get a fairly similar behavior to what your implementation of the smoothing function is doing by writing something like this:

def smoothing(dataDf, selected_columns, kwargs):
    return globals().get(
        kwargs.pop('method') or 'NOT_IMPLEMENTED_FOR_SURE!!', 
        lambda *_: (_ for _ in ()).throw(NotImplementedError())
    )(dataDf, selected_columns, kwargs['arguments'])

But if someone else reads that... well... good luck to that person :-P

Comments

1
methods = {
    'boxcar': boxcar,
    'gaussian': gaussian,
}

MESSAGES = {
    'MISSING_METHOD': 'No method named {} was found.'
}

def smooth(dataDf, selected_columns, **kwargs):
    """Smooth dataframe columns."""
    # Here, we are providing a default if the
    # user doesn't provide a keyword argument
    # for `method`. You're accessing it with
    # brackets and if it's not provided it will
    # raise a KeyError. If it's mandatory, put
    # it as such. But you can do better by
    # providing a default. Like below

    method_name = kwargs.get('method', 'gaussian')
    method = methods.get(method_name)

    if method is None:
        msg = MESSAGES['MISSING_METHOD'].format(method_name)
        raise NotImplementedError(msg)
    return method(dataDf, selected_columns, **kwargs)

If method is a fairly important part of the call and the user is aware of the argument, you can write it as follows:

def smooth(dataDf, selected_columns, method='gaussian', **kwargs):
    """Smooth dataframe columns."""
    # Note that kwargs no longer contains `method`;
    # our call to the 'real' method is not 'polluted'.

    _method = methods.get(method)

    if _method is None:
        msg = MESSAGES['MISSING_METHOD'].format(method)
        raise NotImplementedError(msg)
    return _method(dataDf, selected_columns, **kwargs)

However, there's a problem in the methods dictionary:

  • Its keys are function names and values are function objects, which requires us to have access to the functions.

You want to get a dictionary where keys and values are strings (such as one you would get from a YAML file).

The assumption I will make is that the functions exist in the current context, whether you have defined them or imported them.

from third.party.library import really_ugly_name_you_didnt_choose_but_imported

def gaussian(dataDf, selected_columns, **kwargs):
    pass

_methods = globals()

# This is a dictionary where keys and values
# only contain strings, like from a YAML file.

methods = {
    'boxcar': 'boxcar',
    'gaussian': 'gaussian',
    'roundcar': 'really_ugly_name_you_didnt_choose_but_imported',
}

MESSAGES = {
    'MISSING_METHOD': 'No method named {} was found.'
}

def smooth(dataDf, selected_columns, method='gaussian', **kwargs):
    """Smooth dataframe columns."""

    # Lets check if it is in authorized methods
    # We're using globals() and we don't want
    # the user to accidentally use a function
    # that has no relation to smoothing.

    # So we're looking at the dictionary from
    # the YAML file.

    # Let's get the "real name" of the function
    # so if `method` were (str) 'roundcar', `method_name`
    # would be (str) 'really_ugly_name_you_didnt_choose_but_imported'

    method_name = methods.get(method)

    # Now that we have the real name, let's look for
    # the function object, in _methods.

    _method = _methods.get(method_name)

    if None in (_method, method_name):
        msg = MESSAGES['MISSING_METHOD'].format(method)
        # Note that we raise the exception for the function
        # name the user required, i.e: roundcar, not
        # the real function name the user might be unaware
        # of, 'really_ugly_name_you_didnt_choose_but_imported'.
        raise NotImplementedError(msg)
    return _method(dataDf, selected_columns, **kwargs)

Comments

0

Your algorithm functions are Strategies. You can store them in a dictionary for easy lookup.

Use a defaultdict with a "not implemented" Strategy for missing algorithms:

def algorithm_not_implemented(*args, **kwargs):
    raise NotImplementedError

algorithms = defaultdict(algorithm_not_implemented)

This means if you try and access a non-existent algorithm, it will return algorithm_not_implemented and when you call it, it will raise NotImplementedError:

>>> algorithms['pete'](1, 2, 3)
Traceback (most recent call last):
NotImplementedError

You can add your algorithms:

algorithms['boxcar'] = boxcar
algorithms['gaussian'] = gaussian

And you can call them:

def smoothing(dataDf, selected_columns, kwargs):
    method = kwargs['method']
    arguments = kwargs['arguments']

    algorithms[method](dataDf, selected_columns, arguments)

2 Comments

You were preaching about strategy pattern, yet your answer has little to do with it, though it solves the same problem. And how is it different from several dict-based answers, that have already been posted?
@EliKorvigo not sure I was preaching, sorry if it came across that way. I think my solution is unique in that it uses a defaultdict

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.