2

I need to compare two tables with similar schemas, and I have two generator objects, one per table. How do I compare these two generators row by row in Python? I need to implement file-comparison logic along these lines:

if generator-object-1 == generator-object-2:
        then read-next-row-generator-object-1, read-next-row-generator-object-2
elif generator-object-1 > generator-object-2:
        then read-next-row-generator-object-2
elif generator-object-1 < generator-object-2:
        then read-next-row-generator-object-1

Is there a better way to do this in Python?

1
  • This feels like a merge instead; finding the next lowest value in two sorted tables. How do you plan to use the generator? Commented Apr 2, 2013 at 15:53

2 Answers

3

I used this in the past:

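import operator

def mergeiter(*iterables, **kwargs):
    """Given a set of sorted iterables, yield the next value in merged order"""
    # Map each input's index to [current value, index, iterator].
    # Inputs that are empty from the start are skipped.
    heads = {}
    for i, it in enumerate(map(iter, iterables)):
        try:
            heads[i] = [next(it), i, it]
        except StopIteration:
            pass
    if 'key' not in kwargs:
        key = operator.itemgetter(0)
    else:
        key = lambda item, key=kwargs['key']: key(item[0])

    while heads:
        # Pick the input whose current value sorts lowest.
        value, i, it = min(heads.values(), key=key)
        yield value
        try:
            heads[i][0] = next(it)
        except StopIteration:
            # This input is exhausted. Drop it rather than re-raising:
            # under PEP 479 (Python 3.7+) a StopIteration escaping a
            # generator becomes a RuntimeError.
            del heads[i]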

This would list items from the given iterables in sorted order, provided the input iterables are themselves already sorted.

The above generator would iterate over your two generators in the same order as your pseudo-code would.
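For comparison, the standard library's heapq.merge() also merges already-sorted inputs lazily, and since Python 3.5 it accepts a key argument. A minimal sketch, assuming hypothetical (id, name) rows that are sorted by id in each source:

```python
import heapq
from operator import itemgetter

# Hypothetical rows: (id, name) tuples, each source already sorted by id.
table_a = iter([(1, "ann"), (3, "cat"), (5, "eve")])
table_b = iter([(2, "bob"), (3, "cat"), (4, "dan")])

# heapq.merge pulls one row at a time from whichever input has the
# smallest key, so neither input is loaded into memory as a whole.
merged = list(heapq.merge(table_a, table_b, key=itemgetter(0)))
print(merged)
```

Rows with equal keys come out next to each other, which is what a row-by-row comparison needs.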


5 Comments

This is neat actually. There are a lot of gymnastics going on here, but it seems pretty solid. Although, sorted(chain(g1,g2,g3,...)) is still easier :)
@mgilson: sorted() consumes the whole iterator. This can be used to merge data already sorted, one item at a time. Great for external sorts (sort a large file by splitting it into chunks, merging of separate database queries, etc).
Yeah, I get that :) (+1) -- And I was about to say that you might want to add a "key", but you seem to have already done it.
@mgilson: Yeah, I wrote this generator for a large-file-external-sort solution, then reused it elsewhere that needed a key. I found the first version first. :-)
@JonClements: Mine has a key argument, heapq.merge() doesn't.
0

There isn't really too much of a better way...

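try:
    # Fetch the first rows inside the try block so that an empty
    # generator is handled the same way as an exhausted one.
    go1 = next(generator1)
    go2 = next(generator2)
    while True:
        if go1 == go2:
            go1 = next(generator1)
            go2 = next(generator2)
        elif go1 > go2:
            go2 = next(generator2)
        elif go1 < go2:
            go1 = next(generator1)
except StopIteration:
    pass  # Done: one of the generators is exhausted.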

Of course, what you're describing here is really the merge stage of a merge sort (or at least that's how it seems), although you don't yield the rest of the objects after one generator is exhausted. CPython's built-in sort is very merge-like (Timsort is a hybrid of insertion sort and merge sort). So, in this case, if you don't mind having a list at the end, you could just do:

import itertools as it
sorted(it.chain(generator1,generator2))

and Bob's your uncle.
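If the goal is a comparison report rather than a merged list, the explicit loop can be wrapped in a generator that says what it found at each step. This is only a sketch under assumptions: compare_sorted, its status labels, and the sample rows are hypothetical names, and both inputs are assumed to be sorted by the same key. Unlike the loop above, it also reports the rows left over after one side runs out.

```python
def compare_sorted(left, right, key=lambda row: row):
    """Yield (status, left_row, right_row) for two iterables sorted by `key`.

    status is 'same', 'changed', 'left_only' or 'right_only'.
    """
    missing = object()  # sentinel marking an exhausted input
    left, right = iter(left), iter(right)
    a, b = next(left, missing), next(right, missing)
    while a is not missing or b is not missing:
        if b is missing or (a is not missing and key(a) < key(b)):
            # Row exists only on the left side.
            yield ("left_only", a, None)
            a = next(left, missing)
        elif a is missing or key(b) < key(a):
            # Row exists only on the right side.
            yield ("right_only", None, b)
            b = next(right, missing)
        else:
            # Keys match: compare the full rows.
            yield ("same" if a == b else "changed", a, b)
            a, b = next(left, missing), next(right, missing)


# Hypothetical sample data: (id, name) rows keyed on the id column.
old = [(1, "ann"), (2, "bob"), (4, "dan")]
new = [(1, "ann"), (2, "rob"), (3, "cat")]
result = list(compare_sorted(old, new, key=lambda row: row[0]))
for status, left_row, right_row in result:
    print(status, left_row, right_row)
```

Passing a key lets rows be matched on an identifying column while the equality check still covers every field.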

1 Comment

I'm basically trying to write a field-by-field compare tool for two iterator objects. The iterator objects contain a number of fields. Is there an existing Python tool that already does this?
