I can understand the following sed regular expression.

 sed 's/.*\(SNAP=[^|]*\) |.*/\1/' | sort | uniq -c > $log.snaps

I have the task of converting this bash line to Python code. What is the best way to do this? Should I simply invoke os.system(cmd) with the above as cmd, or use the Python re module? Any pseudocode is most welcome. :)

  • Depends what you mean by "best" and what exactly your goal is. Of course you could just throw it into os.system(), but then why are you bothering with Python at all? If you want to reimplement it in native Python, then yes, use the re module and the sorted() builtin; there's no uniq equivalent but it's fairly trivial to implement. Commented Nov 18, 2014 at 3:49
  • @lak Could you provide an example of what the above code would do? Commented Nov 18, 2014 at 3:52
  • It basically excludes the content after it encounters |. Check the hello example here: stackoverflow.com/questions/26965276/… @AdamRosenfield, I'm looking for performance. I have 10-15 sed statements like this to run on a large file, and I'm wondering which is the proper way to do it. Commented Nov 18, 2014 at 4:03

1 Answer

You asked for the best way; I'm only giving you a simple one, which you could surely optimize. Still, it is worth benchmarking under your own constraints, since invoking a shell takes some time.
Note that shell pipes can themselves give you faster code, since sed can start working without waiting for cat to finish. sort can also begin its work early, but it obviously can only produce output once sed is done. Pipes are therefore a good way to keep the CPU busy during I/O, and should be considered a low-effort, good-performance solution.
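As a rough illustration of that streaming behaviour in pure Python, generators can play the role of pipe stages: each line is substituted as soon as it is read, while the sorting stage has to consume everything before it can emit anything. This is only a sketch; the function names (read_lines, substitute, sort_unique) and the in-memory sample input are made up for the illustration.

```python
import io
import re

def read_lines(stream):
    """Yield lines one at a time, like cat feeding a pipe."""
    for line in stream:
        yield line.rstrip('\n')

def substitute(lines, pattern, repl):
    """Apply a sed-like s/pattern/repl/ to each line as it arrives."""
    regex = re.compile(pattern)
    for line in lines:
        # count=1 mirrors sed without the g flag: first match only.
        yield regex.sub(repl, line, count=1)

def sort_unique(lines):
    """Like sort | uniq: needs every line before it can output the first."""
    return sorted(set(lines))

if __name__ == '__main__':
    # Hypothetical input; a real script would pass an open file instead.
    stream = io.StringIO('love\nlol\nloki\nloki\nki\n')
    stages = substitute(read_lines(stream), r'lo(.*)$', r'\1')
    for line in sort_unique(stages):
        print(line)
```

Nothing is read from the stream until sort_unique starts pulling lines, so the whole file never has to be held in memory before the substitution runs. Only the set of distinct results is kept.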
I've tried it with a simple example, which should give you the idea:

In test :

love
lol
loki
loki
ki
loutre
poutre

Simple bash command, looking like yours :

cat test | sed 's/lo\(.*\)$/\1/' | sort | uniq

Outputs :

ki
l
poutre
utre
ve

Now let's try to do the same in Python:

#!/usr/bin/env python3

import re

s = """love
lol
loki
loki
ki
loutre
poutre"""

arr = s.split('\n')                                             # sed iterates on each line
arr = map((lambda line: re.sub(r'lo(.*)$', r'\1', line)), arr)  # sed
arr = set(arr)                                                  # uniq (drops duplicates, no counts)
arr = sorted(arr)                                               # sort

print('\n'.join(arr))                                           # output it

This could also be written as one ugly line of code:

print('\n'.join(sorted(set(map(lambda line: re.sub(r'lo(.*)$', r'\1', line), s.split('\n'))))))
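For the command in the question itself, which uses uniq -c, a set is not enough because the counts are needed. Here is a minimal sketch using collections.Counter. The regex is my translation of the sed basic regular expression (\( \) become plain groups and the literal | is escaped). The helper names, the sample lines and the 7-character count padding (meant to resemble GNU uniq -c) are assumptions. Note that sorted() orders by code point, which may differ from sort under some locales.

```python
import re
from collections import Counter

# Assumed Python equivalent of: s/.*\(SNAP=[^|]*\) |.*/\1/
SNAP_RE = re.compile(r'.*(SNAP=[^|]*) \|.*')

def count_snaps(lines):
    """Apply the substitution to each line and count identical results."""
    counts = Counter()
    for line in lines:
        line = line.rstrip('\n')
        # Like sed, lines that do not match pass through unchanged.
        counts[SNAP_RE.sub(r'\1', line, count=1)] += 1
    return counts

def format_counts(counts):
    """Render 'count text' lines sorted by text, as sort | uniq -c would."""
    return [f'{n:7d} {text}' for text, n in sorted(counts.items())]

if __name__ == '__main__':
    # Hypothetical log lines; the real input format is not shown in the question.
    sample = [
        'a | SNAP=one | x\n',
        'b | SNAP=two | y\n',
        'c | SNAP=one | z\n',
    ]
    print('\n'.join(format_counts(count_snaps(sample))))
```

Passing an open file object to count_snaps processes the file line by line, so only the distinct results and their counts are kept in memory.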

3 Comments

I don't think I would come up with a solution using map and lambda :) Thanks a lot @Jerska
Well, although it works, you should have a look at my edit before starting to use such a solution.
Yes, I just read your recent edit and will consider it before using this.
