I have a tab delimited file as such:

this is a sentence.	abb
what is this foo bar.	bev
hello foo bar blah black sheep.	abb

I could use cut -f1 and cut -f2 in a Unix terminal to split it into two files:

this is a sentence.
what is this foo bar.
hello foo bar blah black sheep.

and:

abb
bev
abb

But is it possible to do the same in Python? Would it be faster?

I've been doing it as such:

[i.split('\t')[0] for i in open('in.txt', 'r')]

1 Answer

But is it possible to do the same in Python?

Yes, you can:

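l1, l2 = [], []

with open('in.txt', 'r') as f:
    for i in f:
        # strip the trailing newline so it does not end up in the second column;
        # the unpacking fails loudly unless the line has exactly two columns
        left, right = i.rstrip('\n').split('\t')
        l1.append(left)
        l2.append(right)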

print("\n".join(l1))
print("\n".join(l2))

Would it be faster?

It's not likely: cut is a C program optimized for exactly this kind of processing, whereas Python is a general-purpose language that offers great flexibility but is not necessarily fast.

That said, the one advantage of an algorithm like the one I wrote is that it reads the file only once, whereas with cut you read it twice (once per column). That could make the difference.

We'd need to run a benchmark to be sure, though.
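To make the "read once" point concrete, here is a minimal sketch that writes both columns to separate files in a single pass, which is roughly what the two cut invocations produce between them. The function name split_columns and the output paths are hypothetical, and it assumes every line has exactly two tab-separated columns.

```python
def split_columns(in_path, left_path, right_path):
    """Write column 1 and column 2 of a two-column, tab-delimited file
    to two separate files while reading the input only once.

    Assumption: every line holds exactly two tab-separated fields;
    any other line raises ValueError on unpacking.
    """
    with open(in_path, "r") as src, \
         open(left_path, "w") as left_out, \
         open(right_path, "w") as right_out:
        for line in src:
            # Drop the trailing newline before splitting, then add it back
            # explicitly when writing each column.
            left, right = line.rstrip("\n").split("\t")
            left_out.write(left + "\n")
            right_out.write(right + "\n")
```

Calling split_columns('in.txt', 'col1.txt', 'col2.txt') would then stand in for running cut -f1 and cut -f2 separately, without holding the whole file in memory.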

Here's a small benchmark, on my laptop, for what it's worth:

>>> timeit.timeit(stmt=lambda: t("file_of_606251_lines"), number=1)
1.393364901014138

vs

% time cut -d' ' -f1 file_of_606251_lines > /dev/null
cut -d' ' -f1 file_of_606251_lines > /dev/null  0.74s user 0.02s system 98% cpu 0.775 total
% time cut -d' ' -f2 file_of_606251_lines > /dev/null
cut -d' ' -f2 file_of_606251_lines > /dev/null  1.18s user 0.02s system 99% cpu 1.215 total

which adds up to 1.990 seconds for the two cut runs.

So the Python version is indeed faster here (about 1.39 s versus 1.99 s), as expected ;-)
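The function t timed above is not shown in the answer. As a rough sketch of how such a measurement could be reproduced, the following wraps the single-pass split in a function and times it with timeit. The names split_in_memory and time_split are hypothetical stand-ins, and the sketch assumes a tab-delimited input, whereas the cut commands above use a space delimiter.

```python
import timeit


def split_in_memory(path):
    """Read a two-column, tab-delimited file once and return both columns as lists.

    Hypothetical stand-in for the untimed function `t` in the answer.
    """
    left_col, right_col = [], []
    with open(path, "r") as f:
        for line in f:
            left, right = line.rstrip("\n").split("\t")
            left_col.append(left)
            right_col.append(right)
    return left_col, right_col


def time_split(path, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` single runs."""
    timings = timeit.repeat(lambda: split_in_memory(path), number=1, repeat=repeats)
    return min(timings)
```

Taking the minimum over a few repeats reduces noise from disk caching and other processes, but the numbers will still vary from machine to machine.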
