1

I'd like to implement a sed-like search-and-replace in Python.

Now obviously, Python has the re module:

import re
re.sub("([A-Z]+)", r"\1-\1", "123 ABC 456")

However, I would like to specify the search/replace operation in a single string, like in sed (leaving aside any escaping issues for now):

s/([A-Z]+)/\1-\1/g

I prefer this syntax because the actual search-and-replace specification is supplied by the user, and I think it is simpler for the user to specify a single search/replace string than both a pattern and a template.

Update

I'm only interested in sed's s (search/replace) command, for single lines (no special extensions). The use-case is really to allow users to provide a string-transformation (with groups) for hostnames.

Any ideas?

2
  • What about other sed commands? There are quite a lot. What about the g option, or its absence? This is too broad right now (meaning a lot of code would have to be written to convert a sed expression into a Python search-and-replace expression). How far do you want to go: rewrite sed.py, or something simpler? Commented Sep 7, 2017 at 15:14
  • @Jean-FrançoisFabre I updated/simplified the question Commented Sep 7, 2017 at 15:24

2 Answers

3

My first thought was just to split it on / and pass the parts as arguments to re.sub.

It turns out this is rather complicated, and I'm pretty sure it's not bulletproof, so I give you this as a starting point.

The thing is, what if we want to deal with slashes themselves, as in replacing backslashes with slashes? Then the sed expression will be

's/\\/\//g'

I have to split on a slash that is not preceded by a backslash:

_, pattern, repl, options = re.split(r'(?<!\\)/', sed)

To make it more complicated, the slash can be preceded by two backslashes (an escaped backslash followed by a real delimiter), so:

_, pattern, repl, options = re.split(r'(?<![^\\]\\)/', sed)
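To see why the second lookbehind is needed, here is a small comparison of the two split patterns on the expression above. It is only an illustration; the variable names naive and better are made up for this example.

```python
import re

sed = r's/\\/\//g'  # replace every backslash with a forward slash

# One-character lookbehind: every slash that follows a backslash is kept,
# including the real delimiter right after the escaped backslash.
naive = re.split(r'(?<!\\)/', sed)

# Two-character lookbehind: a slash is kept only when the backslash before
# it is not itself preceded by another backslash.
better = re.split(r'(?<![^\\]\\)/', sed)

print(len(naive))   # 3, so unpacking into four names raises ValueError
print(len(better))  # 4: 's', pattern, replacement, options
```

Note that the second pattern is still fooled by three backslashes before a slash (an escaped backslash followed by an escaped slash), which is one reason this is only a starting point.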

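And re.sub will look like this (count=True, i.e. 1, replaces only the first match; count=False, i.e. 0, replaces every match):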

re.sub(pattern, repl, s, count='g' not in options)

Oops, no: in Python a slash doesn't have to be escaped, so the escaped slash in the replacement has to be unescaped first:

re.sub(pattern, re.sub(r'\\/', '/', repl), s, count='g' not in options)

>>> import re
>>> s = r'\some\windows\path'
>>> sed = r's/\\/\//g'
>>> _, pattern, repl, options = re.split(r'(?<![^\\]\\)/', sed)
>>> re.sub(pattern, re.sub(r'\\/', '/', repl), s, count='g' not in options)
'/some/windows/path'
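If the lookbehind splits turn out to be too fragile, one alternative is to scan the command character by character instead of using re.split. The following is a minimal sketch under a few assumptions: the names parse_sed_sub and sed_sub are hypothetical, the pattern is written in Python's regex syntax (not sed's), and the only flag that is honoured is g.

```python
import re

def parse_sed_sub(command):
    """Split 's<d>pattern<d>replacement<d>flags' into its three fields."""
    if len(command) < 2 or command[0] != 's':
        raise ValueError("not an s command")
    delim = command[1]  # assumption: whatever follows 's' is the delimiter
    parts, current = [], []
    i = 2
    while i < len(command):
        ch = command[i]
        if ch == '\\' and i + 1 < len(command):
            nxt = command[i + 1]
            # An escaped delimiter becomes a literal one; any other escape
            # (\1, \\, \d, ...) is passed through to re untouched.
            current.append(nxt if nxt == delim else ch + nxt)
            i += 2
        elif ch == delim:
            parts.append(''.join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append(''.join(current))
    if len(parts) != 3:
        raise ValueError("expected s/pattern/replacement/flags")
    return tuple(parts)

def sed_sub(command, text):
    pattern, repl, flags = parse_sed_sub(command)
    # count=0 means "replace all" in re.sub; without g, only the first match.
    return re.sub(pattern, repl, text, count=0 if 'g' in flags else 1)

print(sed_sub(r's/\\/\//g', r'\some\windows\path'))    # /some/windows/path
print(sed_sub(r's/([A-Z]+)/\1-\1/g', '123 ABC 456'))   # 123 ABC-ABC 456
```

Because escapes are consumed in pairs, an escaped backslash followed by an escaped slash is read correctly, which the fixed-width lookbehind cannot do. One caveat: unescaping a delimiter that is itself a regex metacharacter (for example | as the delimiter) would change the pattern's meaning, so this sketch is safest with /.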

1

Python's re just doesn't support this syntax. If you want such a tool, you'll need to develop your own API, so as to parse a sed-like command and execute the corresponding re function.

You could write a function that, given a sed-like s/ command, parses it and returns a function that applies the corresponding re substitution. The returned function can then be used on any string.

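import re

def run_sed_sub(command):
    regex = re.compile(r"(?<!\\)/")    # split on unescaped slashes
    parts = regex.split(command)
    if parts[0] != 's':
        raise ValueError("Not a sub command")

    # parts[1] is the pattern, parts[2] the replacement template;
    # any flags in parts[3] are ignored, so every match is replaced
    regex = re.compile(parts[1])
    return lambda s: regex.sub(parts[2], s)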

>>> func = run_sed_sub(r"s/Hello/Goodbye/g")
>>> print(func("Hello, world!"))
Goodbye, world!

>>> func = run_sed_sub(r"s/([A-Z]+)/\1-\1/g")
>>> print(func("123 ABC 456"))
123 ABC-ABC 456

There are some edge cases that would probably be painful to handle, such as line breaks, but the idea is here. You might also want to replace the slashes that were escaped sed-wise with normal slashes, so parts = [re.sub(r"\\/", "/", p) for p in parts] should do the trick.

I'm also not sure exactly how you would handle the flags at the end, but I suppose it's not really difficult if you know what behaviour you expect.
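As a rough sketch of how the flags could be handled: g can map to count=0 (replace all) and its absence to count=1 (first match only), since that is how re's count argument works. The function name make_sed_sub and the i flag for case-insensitive matching are assumptions of this example, not something the question asks for.

```python
import re

def make_sed_sub(command):
    # Hypothetical helper: split on unescaped slashes, then interpret flags.
    parts = re.split(r"(?<!\\)/", command)
    if len(parts) != 4 or parts[0] != "s":
        raise ValueError("expected s/pattern/replacement/flags")
    _, pattern, repl, flags = parts

    unknown = set(flags) - {"g", "i"}
    if unknown:
        raise ValueError("unsupported flags: " + "".join(sorted(unknown)))

    count = 0 if "g" in flags else 1                 # 0 means "all" for re
    re_flags = re.IGNORECASE if "i" in flags else 0  # assumed extra flag
    regex = re.compile(pattern, re_flags)
    return lambda s: regex.sub(repl, s, count=count)

first_only = make_sed_sub(r"s/hello/Goodbye/")
everything = make_sed_sub(r"s/hello/Goodbye/gi")
print(first_only("Hello, hello, hello"))  # Hello, Goodbye, hello
print(everything("Hello, hello, hello"))  # Goodbye, Goodbye, Goodbye
```

Rejecting unknown flags up front seems safer than silently ignoring them when the command comes from a user.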

I would add, nevertheless, that the effort of implementing such a tool is probably much greater than that of just learning Python's re.

4 Comments

Splitting on / won't work because a sed command may have an escaped / in the search or replacement strings.
@anubhava yes I was mulling that over. But this works for the command provided by OP, so I guess it's a good start...
Also: parentheses that create groups are escaped in sed, but not in Python's re. Not trivial.
@Jean-FrançoisFabre Right as well. Anyway, to be fair, I don't think creating an interface between sed and re is of any use, and it'd probably be much easier to "simply" learn re.
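On the point about group parentheses: in sed's basic regular expressions, \( and \) delimit a group while bare ( and ) are literal, which is the reverse of Python's re. If sed-style patterns had to be accepted, a conversion could look roughly like the sketch below. The name bre_groups_to_python is hypothetical, and the sketch only swaps parentheses; it does not touch other differences (such as \{ \} or + handling) and does not treat bracket expressions specially.

```python
import re

def bre_groups_to_python(pattern):
    """Swap escaped and bare parentheses; leave every other escape alone."""
    def swap(match):
        tok = match.group(0)
        if tok in (r'\(', r'\)'):
            return tok[1]        # sed group delimiter -> Python group
        if tok in ('(', ')'):
            return '\\' + tok    # sed literal paren -> escaped for Python
        return tok               # any other escape pair, e.g. \\ or \.
    # Escape pairs are matched first, so an escaped backslash followed by
    # a paren is read as two separate tokens.
    return re.sub(r'\\.|[()]', swap, pattern)

print(bre_groups_to_python(r'\([A-Z]+\)'))  # ([A-Z]+)
print(bre_groups_to_python('f(x)'))         # f\(x\)
```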
