2

Suppose I have a string s and a list of tuples of strings ts, where the first element in the tuple is the substring of s I'd like to replace with the corresponding second element. In my case, the string s will always be a space-separated list of distinct words; moreover, I wish to replace each word in s in its entirety with the corresponding value in ts (I hope my intent is clear here). A first attempt at this might be:

import qualified Text.Regex as R -- from regex-compat    

replaceAllIn :: String -> [(String, String)] -> String
replaceAllIn = foldl (\acc (k, v) -> R.subRegex (R.makeRegex k) acc v)

This, of course, doesn't work when one key is a substring of another

λ> s = "blah blahblee"
λ> ts = [("blah", "asdf"), ("blahblee", ";lkj")]
λ> replaceAllIn s ts
"asdf asdfblee"

because the first key replaces both occurrences of "blah" upon the first pass, leaving a string that no longer has anything matching "blahblee" for the second pass.

Is there a way to achieve what I want in one pass through the string? Or is there a built-in way (in some library somewhere) to replace multiple patterns at once?

Edit: Immediately after posting I realized I don't know why I'm using regex here. But the question remains valid if I replaced regex substitution with something like replace from MissingH's Data.String.Utils.

5
  • What should happen if you want to replace "foobar" with "marmoset" and "barbaz" with "elephant" when the string contains "foobarbaz"? Commented Feb 1, 2018 at 21:26
  • 2
    Can you just sort ts to replace the longest strings first? Commented Feb 1, 2018 at 21:43
  • @Cirdec My apologies. I was rather unclear in my question. Edited to reflect that that scenario wouldn't happen in my case (there would be a "foobarbaz" key in the list of tuples). Commented Feb 1, 2018 at 21:57
  • @jkeuhlen That should probably do it. Thanks for the insight! Commented Feb 1, 2018 at 21:58
  • Typo: the arguments of a lambda expression must be separated by space not comma, see \acc,. Otherwise, nice Haskell regex code. Commented Aug 21, 2019 at 14:04

2 Answers 2

3

Expanding on my comment from earlier, you can just process the longest string to be replaced first:

λ> s = "blah blahblee"
λ> ts = [("blah", "asdf"), ("blahblee", ";lkj")]
λ> import qualified Data.List as L
λ> replaceAllIn s (L.sortOn fst ts)
"asdf ;lkj"
Sign up to request clarification or add additional context in comments.

Comments

1

In my case, the string s will always be a space-separated list of distinct words; moreover, I wish to replace each word in s in its entirety

The first step I see is to split (or delimit) the string into words by spaces.

Next, I'd sort the replacement pairs from longest to shortest, as @keuhlen suggests.

After that, the problem should be easy, and substring search fast, because it would advance by word boundaries.

(One extra optimization could be grouping the search pattern by length, e.g. in a tree or array, so that for shorter input words, longer patterns are never ever considered.)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.