I'm trying to split strings based on specific patterns. I have data nested within curly brackets. What I'm trying to do is split the string at the double curly bracket. I've figured out how to do this with "separate" within a data frame, but for future reference I'd love to know why this doesn't work.
I've provided an example below on a single string:
library(stringr)
pattern_test <- "[^\\}{2,2}]*\\}{2,2}"
teststring <- "{the {dog} is {hot}},{the {cat} is {lazy}}"
tmp <- unlist(str_extract_all(teststring, pattern_test))
tmp
tmp evaluates to ("hot}}", "lazy}}").
In words, what I'm trying to do in "pattern_test" is define a class of all characters that are not exactly "}}" ([^\\}{2,2}]), match as many characters from that class as possible (*), and then match "}}" (outside the square brackets: \\}{2,2}). I suspect I'm making a fundamental error, but most of the examples I've found online haven't helped me figure out what it is. What I want tmp to evaluate to is:
("{the {dog} is {hot}}", ",{the {cat} is {lazy}}"). Why is the substring cutting off at the open bracket?
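A quick check shows where the cut-off comes from: inside square brackets, {2,2} is not a repetition count. The characters '{', '2', ',' and '2' simply become members of the set next to '}', so [^\\}{2,2}] means "any single character except '}', '{', '2' or ','". Because '{' is excluded, a match can never reach back past an opening brace. The sketch below is a hedged illustration that uses base R's grepl() and gregexpr() with perl = TRUE (the PCRE engine) rather than stringr's ICU engine, so it needs no packages. It assumes both engines treat this bracket expression the same way, which the output reported above is consistent with.

```r
# Base R only; perl = TRUE selects PCRE (stringr itself uses ICU).
teststring <- "{the {dog} is {hot}},{the {cat} is {lazy}}"

# Inside [...], "{2,2}" is NOT a quantifier: '{', '2' and ',' are just
# listed characters, alongside the escaped '}'.
original_class <- "[^\\}{2,2}]"
# The same set, written without the misleading quantifier look
explicit_class <- "[^}{2,]"

chars <- c("{", "}", "2", ",", "h", " ")
# TRUE means the single character is accepted by the negated class
allowed <- grepl(paste0("^", original_class, "$"), chars, perl = TRUE)
names(allowed) <- chars
print(allowed)

# Both spellings accept exactly the same characters
same_set <- identical(
  allowed,
  setNames(grepl(paste0("^", explicit_class, "$"), chars, perl = TRUE), chars)
)
print(same_set)

# Since '{' is excluded, each match can only start after the last '{'
# before a "}}", which is why the results begin at "hot" and "lazy".
matches <- regmatches(
  teststring,
  gregexpr("[^\\}{2,2}]*\\}{2,2}", teststring, perl = TRUE)
)[[1]]
print(matches)
```

In other words, a character class can only exclude individual characters, not a two-character sequence such as "}}".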
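Try this pattern instead:
str_extract_all(teststring, ",?\\{.*?\\}{2}")
It looks for an optional comma (",?"), then a "{" followed later by "}}", and extracts everything from that "{" through the "}}". The ".*?" is "non-greedy", so it takes as few characters as possible. Without the "?", this returns the whole string, since the last characters are also "}}". Not a very good explanation, but I hope the idea is clear.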
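To see the greedy/non-greedy difference side by side, here is a minimal sketch. It again uses base R's regmatches()/gregexpr() with perl = TRUE instead of stringr, an assumption made only to keep the example package-free. The helper extract_all() is hypothetical, defined here purely for illustration. With stringr loaded, str_extract_all(teststring, lazy_pattern)[[1]] should give the same two strings.

```r
teststring <- "{the {dog} is {hot}},{the {cat} is {lazy}}"

# Optional comma, a literal "{", then as few characters as possible (.*?)
# until the first "}}" that lets the whole match succeed
lazy_pattern <- ",?\\{.*?\\}{2}"
# Same pattern with a greedy .*, which runs to the LAST "}}" in the string
greedy_pattern <- ",?\\{.*\\}{2}"

# Hypothetical helper: all non-overlapping matches of pattern in x (PCRE)
extract_all <- function(pattern, x) {
  regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
}

lazy_matches <- extract_all(lazy_pattern, teststring)
greedy_matches <- extract_all(greedy_pattern, teststring)

print(lazy_matches)   # two pieces, one per top-level {...}
print(greedy_matches) # one piece: the entire string
```

The only difference between the two patterns is the "?" after ".*", which is exactly what stops the first match at "{hot}}" instead of at the final "{lazy}}".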