
I really like Python generators. In particular, I find that they are just the right tool for connecting to REST endpoints - my client code only has to iterate on the generator that is connected to the endpoint. However, I am finding one area where Python's generators are not as expressive as I would like. Typically, I need to filter the data I get out of the endpoint. In my current code, I pass a predicate function to the generator; it applies the predicate to the data it is handling and only yields data for which the predicate returns True.

I would like to move toward composition of generators - like data_filter(datasource()). Here is some demonstration code that shows what I have tried. It is pretty clear why it does not work; what I am trying to figure out is the most expressive way of arriving at the solution:

# Mock of REST endpoint: in actual code, the generator is
# connected to a REST endpoint which returns dictionaries (from JSON).
def mock_datasource():
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula", "short-circuit", "generate", "comedy"]
    for d in mock_data:
        yield d

# Mock of a filter: simplification, in reality I am filtering on some
# aspect of the data, like data['type'] == "external" 
def data_filter(d):
    if len(d) < 8:
        yield d

# First Try:
# for w in data_filter(mock_datasource()):
#     print(w)
# >> TypeError: object of type 'generator' has no len()

# Second Try 
# for w in (data_filter(d) for d in mock_datasource()):
#     print(w)
# I don't get words out, 
# rather <generator object data_filter at 0x101106a40>

# Using a predicate to filter works, but is not the expressive 
# composition I am after
for w in (d for d in mock_datasource() if len(d) < 8):
    print(w)
  • How do you feel about the built-in filter()? Commented Jan 12, 2018 at 19:15
  • Good suggestion - if I use a predicate function I can write filter(data_predicate, mock_datasource()). However, I prefer the approach where I can write the generator composition like f(g(x)). Commented Jan 12, 2018 at 19:40
  • @Kevin in that case filter calls for a lambda, and now you have a clunky expression. filter is good when the filtering function already exists (like str.isdigit, or None to test truth values). Commented Jan 12, 2018 at 19:51
  • @Jean-FrançoisFabre, agreed, filter is a "sometimes" solution. Which is why I didn't go to the effort to build a full-fledged answer around it :-P Commented Jan 12, 2018 at 20:11
  • filter was very useful on strings in Python 2 because it saved the need for str.join. Now the joy is gone :) Commented Jan 12, 2018 at 20:12

5 Answers


data_filter should apply len to the elements of d, not to d itself, like this:

def data_filter(d):
    for x in d:
        if len(x) < 8:
            yield x

now your code:

for w in data_filter(mock_datasource()):
    print(w)

prints

liberty
seminar
formula
comedy
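Because each stage both accepts and returns an iterator, stages written this way chain freely. A small sketch of that, adding a second, purely illustrative stage (min_length_filter is hypothetical, not from the question):

```python
def mock_datasource():
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula", "short-circuit", "generate", "comedy"]
    for d in mock_data:
        yield d

def data_filter(d):
    # keep only words shorter than 8 characters
    for x in d:
        if len(x) < 8:
            yield x

def min_length_filter(d):
    # hypothetical second stage: keep words of at least 7 characters
    for x in d:
        if len(x) >= 7:
            yield x

# stages compose like ordinary functions: f(g(source))
for w in min_length_filter(data_filter(mock_datasource())):
    print(w)  # liberty, seminar, formula
```

Each stage stays lazy: nothing is pulled from the endpoint until the outermost loop asks for a value.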

4 Comments

Thanks, this gets me the closest to what I asked for. That being said, I wonder if composing generators entails a performance cost that I did not consider.
That's true: the more you chain function/generator calls, the slower your application will be. Calling a function in Python is more expensive than in compiled languages, partly because compiled languages have the ability to inline some calls.
So far, in tests comparing the execution time of filtering with predicates vs. filtering with composed generators (i.e. based on your answer), I am not seeing a huge performance penalty with the composition approach. As is often the case, I need to run more tests. "The first principle is that you must not fool yourself - and you are the easiest person to fool." Richard Feynman
That's true. You should benchmark your various approaches with a relevant data size (size & contents).
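One rough way to run such a comparison is with timeit (a sketch; absolute numbers depend on machine, Python version, and data, so treat them as relative only):

```python
import timeit

words = ["sanctuary", "movement", "liberty", "seminar",
         "formula", "short-circuit", "generate", "comedy"] * 1000

def data_filter(d):
    # composed-generator stage: keep words shorter than 8 characters
    for x in d:
        if len(x) < 8:
            yield x

# predicate-style generator expression
t_genexp = timeit.timeit(
    lambda: list(w for w in words if len(w) < 8), number=100)

# composed-generator style
t_composed = timeit.timeit(
    lambda: list(data_filter(words)), number=100)

print(f"genexp:   {t_genexp:.4f}s")
print(f"composed: {t_composed:.4f}s")
```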

More concisely, you can do this with a generator expression directly:

def length_filter(d, minlen=0, maxlen=8):
    return (x for x in d if minlen <= len(x) < maxlen)

Apply the filter to your generator just like a regular function:

for element in length_filter(endpoint_data()):
    ...

If your predicate is really simple, the built-in function filter may also meet your needs.
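For instance, with the question's length check as the predicate (a minimal sketch; filter returns a lazy iterator, so it slots into the same f(g(x)) shape):

```python
def mock_datasource():
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula", "short-circuit", "generate", "comedy"]
    for d in mock_data:
        yield d

# filter(predicate, iterable) yields only items where predicate is true
for w in filter(lambda d: len(d) < 8, mock_datasource()):
    print(w)  # liberty, seminar, formula, comedy
```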



You could pass a filter function that you apply for each item:

def mock_datasource(filter_function):
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula", "short-circuit", "generate", "comedy"]

    for d in mock_data:
        if filter_function(d):
            yield d

def filter_function(d):
    # filter on some aspect of the data
    return len(d) < 8

1 Comment

Right - the approach you suggest is similar to the code I am using that is working. I am trying to put the filter at the output end of the datasource. I would like to lift the filter completely out of the generator's code. The closest I have come to that is the use of a predicate in the final example I gave. In any case thanks for the advice!

What I would do is define filter(data_filter) to receive a generator as input and return a generator whose values are filtered by the data_filter predicate (a regular predicate, not aware of the generator interface).

The code is:

def filter(pred):
    """Filter, for composition with generators that take coll as an argument."""
    def generator(coll):
        for x in coll:
            if pred(x):
                yield x
    return generator

def mock_datasource():
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula","short-circuit", "generate", "comedy"]
    for d in mock_data:
        yield d

def data_filter(d):
    return len(d) < 8


gen1 = mock_datasource()
filtering = filter(data_filter)
gen2 = filtering(gen1) # or filter(data_filter)(mock_datasource())

print(list(gen2)) 

If you want to go further, you may use compose, which I think was the whole intent:

from functools import reduce

def compose(*fns):
    """Compose functions left to right - allows generators to compose with same
    order as Clojure style transducers in first argument to transduce."""
    return reduce(lambda f,g: lambda *x, **kw: g(f(*x, **kw)), fns)

gen_factory = compose(mock_datasource, 
                      filter(data_filter))
gen = gen_factory()

print(list(gen))

PS: I used some code found here, where the Clojure guys expressed composition of generators, inspired by the way they compose generically with transducers.

PS2: filter may be written in a more Pythonic way:

def filter(pred):
    """Filter, for composition with generators that take coll as an argument."""
    return lambda coll: (x for x in coll if pred(x))



Here is a function I have been using to compose generators together.

from functools import reduce

def compose(*funcs):
    """ Compose generators together to make a pipeline.
    e.g.
        pipe = compose(func1, func2, func3)
        result = pipe(range(0, 5))
    """
    return lambda x: reduce(lambda f, g: g(f), funcs, x)

Where funcs is a sequence of generator functions, each taking an iterable and returning an iterator. So, using the version of data_filter that iterates over its input, your example would look like

pipe = compose(data_filter)
print(list(pipe(mock_datasource())))

This is not original

