I want to adapt the Python regex technique from the SO question "Find string between two substrings" to Haskell.

But I can't figure out how to make it work in GHC (8.2.1). I installed the package with cabal install regex-pcre and, after some searching, came up with the following test code:

import Text.Regex.PCRE
s = "+++asdf=5;iwantthis123jasd---"
result = (s ++ s) =~ "asdf=5;(.*)123jasd" :: [[String]]

I was hoping to get the first and last instance of the middle string:

iwantthis

But I can't get the result right:

[["asdf=5;iwantthis123jasd---+++asdf=5;iwantthis123jasd","iwantthis123jasd---+++asdf=5;iwantthis"]]

I haven't used regex or pcre in Haskell before.

Can someone help with the right usage (to extract the first and last occurrence)? Also, I don't quite understand the :: [[String]] usage here. What does it do and why is it necessary?

I searched the documentation but found no mention of the usage with type conversion to :: [[String]].

1 Answer

The result you obtain is the following:

Prelude Text.Regex.PCRE> (s ++ s) =~ "asdf=5;(.*)123jasd" :: [[String]]
[["asdf=5;iwantthis123jasd---+++asdf=5;iwantthis123jasd","iwantthis123jasd---+++asdf=5;iwantthis"]]

This is correct: the first element is the implicit capture group 0 (the entire match), and the second element is capture group 1 (the one that matches (.*)). In the doubled input, the match runs from the first asdf=5; to the last 123jasd:

+++asdf=5;iwantthis123jasd---+++asdf=5;iwantthis123jasd---

So the regex still matches between an asdf=5; and a 123jasd, just not the nearest pair.

This is because the Kleene star * is greedy: it captures as much as possible. You can use the non-greedy quantifier (.*?) instead:

Prelude Text.Regex.PCRE> (s ++ s) =~ "asdf=5;(.*?)123jasd" :: [[String]]
[["asdf=5;iwantthis123jasd","iwantthis"],["asdf=5;iwantthis123jasd","iwantthis"]]

And now we obtain two matches. Each match has "iwantthis" as capture group 1.
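As for the :: [[String]] part of the question: as far as I understand regex-base, it is not a type conversion. The result type of =~ is polymorphic (via the RegexContext class), so the annotation tells GHC which kind of result you want; without it the type is ambiguous. A minimal sketch of a few result types that I know have instances (the names input, pat, hasMatch, howMany, firstSplit and allGroups are made up for this illustration; a type signature on the binding does the same job as an inline annotation):

```haskell
import Text.Regex.PCRE

input, pat :: String
input = concat (replicate 2 "+++asdf=5;iwantthis123jasd---")
pat = "asdf=5;(.*?)123jasd"

-- Bool: is there any match at all?
hasMatch :: Bool
hasMatch = input =~ pat   -- True

-- Int: number of non-overlapping matches.
howMany :: Int
howMany = input =~ pat    -- 2

-- Triple: (text before the first match, the first match, text after it).
firstSplit :: (String, String, String)
firstSplit = input =~ pat

-- [[String]]: one inner list per match; element 0 is the whole match,
-- the remaining elements are the capture groups.
allGroups :: [[String]]
allGroups = input =~ pat

main :: IO ()
main = do
  print hasMatch
  print howMany
  print firstSplit
  print allGroups
```

So [[String]] is simply the choice that returns every match together with its capture groups, which is what you need here.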

You can apply map (head . tail) or map (!! 1) to the [[String]] result to obtain a list of the (.*?) captures:

Prelude Text.Regex.PCRE> map (!!1) ((s ++ s) =~ "asdf=5;(.*?)123jasd" :: [[String]])
["iwantthis","iwantthis"]
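
To get only the first and last occurrence, as asked in the question, you can take the two ends of that list. A small sketch under the assumption that the pattern always has exactly one capture group; captures and firstAndLast are hypothetical helper names, not part of regex-pcre:

```haskell
import Text.Regex.PCRE

-- Capture group 1 of every non-overlapping match, in order of appearance.
captures :: String -> [String]
captures str = map (!! 1) (str =~ "asdf=5;(.*?)123jasd" :: [[String]])

-- First and last capture, or Nothing when the pattern does not match at all.
-- With a single match, both components are the same string.
firstAndLast :: String -> Maybe (String, String)
firstAndLast str = case captures str of
  []         -> Nothing
  xs@(x : _) -> Just (x, last xs)

main :: IO ()
main = do
  let s = "+++asdf=5;iwantthis123jasd---"
  print (firstAndLast (s ++ s))   -- Just ("iwantthis","iwantthis")
  print (firstAndLast "no match") -- Nothing
```

Returning a Maybe avoids the runtime error that head or last would raise on an input without matches.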

1 Comment

map (!! 1) might be more readable, because the number indicates the capture group.
