Basically, I want a fancy one-liner that processes all of the files I'm looking at without reading them all into memory, and saves a nice sample of the results.

The one-liner I would like to write is:

def foo(findex):
    return [bar(line) for line in findex] # but skip every nth term

But I would like to skip saving every nth line. That is, I still want bar to run on that line (for byte-position purposes), but I don't want to save the resulting image, because I don't have enough memory for that.

So, if the output of bar(line) is 1,2,3,4,5,6,... I would like it to still run on 1,2,3,4,5,6,... but I would like the return value to be [1,3,5,7,9,...] or something of the sort.

2 Answers

Use enumerate to get the index, and a modulo filter to keep every other line:

return [bar(line) for i,line in enumerate(findex) if i%2]

Generalize that with i%n: whenever the index is divisible by n, i%n is 0 (falsy), so bar(line) is not evaluated and nothing is added to the list comprehension.

enumerate works with any iterable (file handle, generator, ...), so it's far better than using range(len(findex)), which needs an object that has a length.
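As a minimal sketch of that point, the same comprehension can be fed a file-like object or a generator. Here bar is a hypothetical stand-in for the real per-line processing, and io.StringIO stands in for an open file:

```python
import io

def bar(line):
    # Hypothetical stand-in for the real per-line processing.
    return line.strip().upper()

def foo(findex, n=2):
    # Keep the result unless the index is a multiple of n.
    # bar is NOT called for the skipped indices.
    return [bar(line) for i, line in enumerate(findex) if i % n]

# io.StringIO iterates line by line like an open text file.
handle = io.StringIO("a\nb\nc\nd\ne\nf\n")
from_file = foo(handle, n=3)

# A generator has no len(), but enumerate still works on it.
from_gen = foo((s for s in ["a", "b", "c", "d", "e", "f"]), n=3)

print(from_file)  # ['B', 'C', 'E', 'F']
print(from_gen)   # ['B', 'C', 'E', 'F']
```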

Now, the above is incorrect if you need bar to be called on all the values (because you depend on a side effect of bar), since the filter prevents the call. So you have to do it in two steps: use map to apply your function to every item of findex, which guarantees that all of the lines are processed, then keep only the results you're interested in by applying the same modulo filter after the call:

l = [x for i,x in enumerate(map(bar,findex)) if i%n]
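A small sketch to check this behaviour, assuming Python 3: bar is a hypothetical stand-in that records each call as its side effect, so the log can be compared with what is actually stored.

```python
calls = []  # records every line that bar() receives

def bar(line):
    # Hypothetical stand-in: the side effect happens on every call.
    calls.append(line)
    return int(line)

def foo(findex, n):
    # map() is lazy in Python 3, but enumerate pulls every item through it,
    # so bar runs on each line; the filter only decides what gets stored.
    return [x for i, x in enumerate(map(bar, findex)) if i % n]

lines = iter(["1", "2", "3", "4", "5", "6"])
kept = foo(lines, 3)

print(calls)  # ['1', '2', '3', '4', '5', '6']
print(kept)   # [2, 3, 5, 6]
```

Only the kept results stay in the list; each skipped result is dropped as soon as the filter rejects it.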

6 Comments

Would it be if not(i%n) for the 1/n case? Is 0 true in Python?
Yes, but OP wants to skip one line from time to time. Of course when n==2 that's ambiguous :)
Oh, yes, the question does say that. Thanks!
Actually, as I'm trying this, it seems to fail to run the in-between values. I have a print(count) in bar(), and it's skipping lines. So it's only running bar(line) # output 1, and bar(line) # output 5
line for line in findex is doing the same thing as findex, albeit slower :) try it.

If findex is subscriptable (accepts the [] operator with indices), you can try it this way:

def foo(findex):
    return [bar(findex[i]) for i in range(0, len(findex), 2)]
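
For an input that is not subscriptable, such as a file handle, a comparable sketch could use itertools.islice to take every second item. Here bar is a hypothetical stand-in, and as with the indexed version, bar is not called on the skipped items:

```python
import io
from itertools import islice

def bar(line):
    # Hypothetical stand-in for the real per-line processing.
    return line.rstrip("\n")

def foo(findex):
    # islice(iterable, start, stop, step): step through the iterable
    # two items at a time without needing len() or indexing.
    return [bar(line) for line in islice(findex, 0, None, 2)]

sample = foo(io.StringIO("a\nb\nc\nd\ne\n"))
print(sample)  # ['a', 'c', 'e']
```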

1 Comment

except that findex must be subscriptable for that to work. Pass it a file handle and it doesn't work.
