Converting a sed regular expression to python code

Question

I can understand the following sed regular expression.

 sed 's/.*\(SNAP=[^|]*\) |.*/\1/' | sort | uniq -c > $log.snaps

I have the task of converting this bash line to Python code. What is the best way to do this? Should I simply invoke os.system(cmd) with the above as cmd, or use the Python re module? Any pseudocode is most welcome. :)

Depends what you mean by "best" and what exactly your goal is. Of course you could just throw it into os.system(), but then why are you bothering with Python at all? If you want to reimplement it in native Python, then yes, use the re module and the sorted() builtin; there's no uniq equivalent but it's fairly trivial to implement.
– Adam Rosenfield, Commented Nov 18, 2014 at 3:49
@lak could you provide an example of what the above code would do?
– Avinash Raj, Commented Nov 18, 2014 at 3:52
It basically excludes content after it encounters |; check the hello example here: stackoverflow.com/questions/26965276/… @AdamRosenfield, I'm looking for performance. I have 10-15 sed statements like this on a large file. I'm wondering which is the proper way to do it.
– webminal.org, Commented Nov 18, 2014 at 4:03
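
As a rough illustration of the remark above that a uniq equivalent is easy to write, here is a minimal sketch using itertools.groupby. The function name uniq_c and the sample list are hypothetical; like uniq, it only merges adjacent duplicates, so the input is assumed to be sorted first.

```python
from itertools import groupby

def uniq_c(sorted_lines):
    # Collapse each run of identical adjacent lines into a (count, line) pair,
    # roughly what `uniq -c` reports. Assumes the input is already sorted.
    return [(sum(1 for _ in group), line) for line, group in groupby(sorted_lines)]

# Hypothetical sample data, only for illustration.
pairs = uniq_c(sorted(["b", "a", "b", "c", "a", "b"]))
for count, line in pairs:
    print('%d %s' % (count, line))
```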

Jerska · Accepted Answer · 2014-11-18 04:16:24Z

You asked for the best way; I'm just giving you a simple one, which you could surely optimize. Still, it is worth benchmarking against your constraints, since invoking a shell takes some time.
Note that shell pipes can be a good way to get faster code, since sed can start working without waiting for cat to finish. sort can also begin its work, but obviously it can only produce output once sed is done. Pipes therefore let you use the CPU while waiting on I/O, and should be considered a low-effort, good-performance solution.
I've tried it with a simple example, but you will get the idea:

Contents of the file test:

love
lol
loki
loki
ki
loutre
poutre

A simple bash command, similar to yours:

cat test | sed 's/lo\(.*\)$/\1/' | sort | uniq

Output:

ki
l
poutre
utre
ve

Now let's try to do the same in Python:

#!/usr/bin/python

import re

s = """love
lol
loki
loki
ki
loutre
poutre"""

arr = s.split('\n')                                             # sed iterates on each line
arr = map((lambda line: re.sub(r'lo(.*)$', r'\1', line)), arr)  # sed
arr = set(arr)                                                  # uniq
arr = sorted(list(arr))                                         # sort

print('\n'.join(arr))                                           # output it (works in Python 2 and 3)

This could also be written as one ugly line of code:

print('\n'.join(sorted(list(set(map((lambda line: re.sub(r'lo(.*)$', r'\1', line)), s.split('\n')))))))
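
The example above removes duplicates with set, whereas the command in the question uses uniq -c, which counts them. A minimal sketch of that variant, assuming collections.Counter is acceptable for the counting step: the names SNAP_RE and count_snaps and the sample log lines are hypothetical, and writing the result to $log.snaps is left out.

```python
import re
from collections import Counter

# Python spelling of the sed pattern from the question:
# '|' is literal in sed's basic regex syntax, so it is escaped here.
SNAP_RE = re.compile(r'.*(SNAP=[^|]*) \|.*')

def count_snaps(lines):
    # Rough equivalent of the sed | sort | uniq -c pipeline.
    # `lines` can be any iterable, including an open file object,
    # so the input does not have to be held in memory all at once.
    counts = Counter()
    for line in lines:
        line = line.rstrip('\n')
        # Like sed, a line that does not match passes through unchanged.
        counts[SNAP_RE.sub(r'\1', line, count=1)] += 1
    # Sort by the extracted text, as `sort` does before `uniq -c`.
    return sorted(counts.items())

# Hypothetical log lines, only for illustration.
sample = [
    "t1 SNAP=alpha | ok",
    "t2 SNAP=beta | ok",
    "t3 SNAP=alpha | failed",
    "no snapshot here",
]

for text, count in count_snaps(sample):
    print('%d %s' % (count, text))
```

Python's sorted() orders strings by code point, so the order may differ from the shell's locale-dependent sort. With 10-15 such expressions, compiling each pattern once and applying them all in a single pass over the file is probably worth benchmarking against the shell pipeline.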

edited Nov 18, 2014 at 4:16

answered Nov 18, 2014 at 4:08

Jerska


3 Comments

webminal.org Over a year ago

I don't think I would come up with a solution using map and lambda :) Thanks a lot @Jerska

Jerska Over a year ago

Well, although it works, you should have a look at my edit before you start using such a solution.

webminal.org Over a year ago

Yes, I just read your recent edit; I will consider it before using this.
