
I am trying to write generic code that performs a set operation on any number of input files.

Normally, for a set operation where the number of input files is fixed, I use something like this:

my_set1 = set(map(str.strip, open('filename1.txt')))
my_set2 = set(map(str.strip, open('filename2.txt')))
common = my_set1.intersection(my_set2)

Each file has only one column.

What I am aiming for now is to support all the set operations in one script, something like:

python set.py -i file1,file2,file3,file4 -o inter

These inputs are taken from the user.

The user chooses both the number of input files and the operation to apply.

If anyone can show me how to do this for one operation, I can write the others (union and difference) myself.

1 Answer

The set.intersection() and set.intersection_update() methods take any iterable, not just sets.
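
As a small illustration of that point, the sketch below passes a map object and a generator expression rather than sets. The sample data is made up and simply stands in for lines read from a file.

```python
# Hypothetical sample data standing in for the lines of two one-column files.
first = {"a", "b", "c"}
second_lines = ["b\n", "c\n", "d\n"]  # a plain list of raw lines, not a set

# intersection() accepts any iterable and returns a new set;
# `first` itself is left unchanged.
common = first.intersection(map(str.strip, second_lines))

# intersection_update() also accepts any iterable, but modifies
# the set in place and returns None.
in_place = set(first)
in_place.intersection_update(line.strip() for line in second_lines)
```

Both calls leave only the elements shared with the second input; the difference is whether a new set is built or the existing one is narrowed.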

Since you are only interested in the end-product (the intersection between the files) you'd best use set.intersection_update() here.

Start with one set, then keep updating it with the rest of the files:

with open(files[0]) as infh:
    myset = set(map(str.strip, infh))

for filename in files[1:]:
    with open(filename) as infh:
        myset.intersection_update(map(str.strip, infh))

You can make the method used dynamic based on the command-line switch:

ops = {'inter': set.intersection_update,
       'union': set.update,
       'diff': set.difference_update}

with open(files[0]) as infh:
    myset = set(map(str.strip, infh))

for filename in files[1:]:
    with open(filename) as infh:
        ops[operation](myset, map(str.strip, infh))
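
One way the pieces might be wired to the command line from the question is sketched below with argparse. The -i and -o switches and the inter/union/diff names come from the question and the dictionary above; the function names (read_lines, combine, parse_args, main) and the choice to print the sorted result are assumptions made for illustration only.

```python
import argparse

# Map each -o value to the in-place set method that implements it.
OPS = {
    "inter": set.intersection_update,
    "union": set.update,
    "diff": set.difference_update,
}


def read_lines(filename):
    """Return the stripped lines of a one-column file as a set."""
    with open(filename) as infh:
        return set(map(str.strip, infh))


def combine(files, operation):
    """Start from the first file, then apply the operation with each later file."""
    result = read_lines(files[0])
    for filename in files[1:]:
        with open(filename) as infh:
            OPS[operation](result, map(str.strip, infh))
    return result


def parse_args(argv=None):
    """Parse -i file1,file2,... and -o inter|union|diff."""
    parser = argparse.ArgumentParser(
        description="Set operations on one-column files")
    parser.add_argument("-i", "--input", required=True,
                        type=lambda value: value.split(","),
                        help="comma-separated input files")
    parser.add_argument("-o", "--operation", required=True,
                        choices=sorted(OPS),
                        help="set operation to apply")
    return parser.parse_args(argv)


def main(argv=None):
    """Print the resulting elements, one per line (output format is an assumption)."""
    args = parse_args(argv)
    for item in sorted(combine(args.input, args.operation)):
        print(item)

# To use this as a script, call main() under the usual
# `if __name__ == "__main__":` guard.
```

With that guard added, the invocation from the question (python set.py -i file1,file2,file3,file4 -o inter) would print the lines common to all four files. Note that 'diff' here removes from the first file everything found in any later file, so the order of the inputs matters for that operation.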

13 Comments

could you please explain why you read the first set alone? shouldn't it work with just the list of sets?
@SamyArous: the intersection between an empty set and anything else is an empty set; you have to start with one of the files if you want intersections to work.
@MartijnPieters So if I understand correctly, I can just put all the sections together in one code file and execute it, right? Don't I need a parser or something?
set.intersection({1, 2, 3, 4}, {4, 5, 6}, {2, 4}) seems to be giving the right result. am I missing something?
@Angelo: I gave you two different variants; the first block uses myset.intersection_update() directly. The second block improves on the first by making the operation dynamic.