
I have a TMemo that displays text from a query. I would like to remove all chars between '{' and '}' so this string '{color:black}😊{color}{color:black}{color}' would end up like this 😊.

MemoComments.Lines.Text :=  StringReplace(MemoComments.Lines.Text, '{'+ * +'}', '', rfReplaceAll);

I know that the * in my code is wrong. It's just a placeholder. How can I do this the right way?

Is this possible, or do I have to create a complicated loop?


2 Answers

6

This is a case where you can use a regular expression. I trust someone will publish such an answer for you very shortly.

However, just for the sake of completeness, I want to show that a loop-based approach isn't complicated at all, but rather straightforward:

function ExtractContent(const S: string): string;
var
  i, c: Integer;
  InBracket: Boolean;
begin
  SetLength(Result, S.Length);
  InBracket := False;
  c := 0;
  for i := 1 to S.Length do
  begin
    if S[i] = '{' then
      InBracket := True
    else if S[i]= '}' then
      InBracket := False
    else if not InBracket then
    begin
      Inc(c);
      Result[c] := S[i];
    end;
  end;
  SetLength(Result, c);
end;

Notice that I avoid unnecessary heap allocations.

(Personally, I have never been a huge fan of regular expressions. To me, the correctness of the above algorithm is obvious, it can only be interpreted in one way, and it is clearly written in a performant way. A regex, on the other hand, is a bit more like "magic". But I am a bit of a dinosaur, I admit that.)
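For anyone who wants to try the function outside a form, here is a minimal console sketch. The program name and the ASCII sample strings are made up for illustration; the function body is the one from this answer, unchanged. It also shows what the loop does with malformed input: everything after an unclosed '{' is dropped, and a stray '}' is removed.

```pascal
program ExtractContentDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils; // provides the string helper behind S.Length

// Copies every character that is not inside (or part of) a brace pair.
function ExtractContent(const S: string): string;
var
  i, c: Integer;
  InBracket: Boolean;
begin
  SetLength(Result, S.Length); // one allocation, large enough for the worst case
  InBracket := False;
  c := 0;
  for i := 1 to S.Length do
  begin
    if S[i] = '{' then
      InBracket := True
    else if S[i] = '}' then
      InBracket := False
    else if not InBracket then
    begin
      Inc(c);
      Result[c] := S[i];
    end;
  end;
  SetLength(Result, c); // trim to the number of characters actually kept
end;

begin
  // Well-formed input: prints "Hello world"
  Writeln(ExtractContent('{color:black}Hello{color} {b}world{b}'));
  // Unclosed opening brace: prints "abc"
  Writeln(ExtractContent('abc{def'));
  // Stray closing brace: prints "ab"
  Writeln(ExtractContent('a}b'));
end.
```

In the form from the question, the call would presumably be MemoComments.Lines.Text := ExtractContent(MemoComments.Lines.Text);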


3 Comments

On corrupted input (no closing }) the regex will leave the string as is with the opening {, while your code will silently omit anything afterwards. OP must decide on his own which outcome is preferred.
@AmigoJack: Very true. I also remove a stray } between the "tags" and don't support nested tags. Unfortunately, the Q doesn't contain a full specification, so we don't know what the desired behaviour is. In any case, all these things can easily be changed by minor adjustments in the loop. I suspect the regex can be adjusted as well, so differences in behaviour are not indications of inherent limitations in either approach.
I prefer this kind of approach to regexp, too, so +1. If I did it myself, though, I'd use a PChar indexer, omit the Boolean flag, and use two mutually exclusive (accept and reject) loops.
3

Looks like you want a sort of regular expression, which Delphi fortunately offers in its RTL.

s := TRegEx.Replace('{color:black}😊{color}{color:black}{color}', '{.*?}', '', []);

or using the memo:

MemoComments.Lines.Text := TRegEx.Replace(MemoComments.Lines.Text, '{.*?}', '', []);

In this expression, {.*?}, .*? means any number (*) of any character (.), but as few as possible to match the rest of the expression (*?). That last bit is very powerful. By default, regexes are 'greedy', which means that .* would just match as many characters as possible, so it would take everything up to the last }, including the smiley and all the other color codes in between.
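To see the difference between the lazy and the greedy form, a small console sketch like the following may help. The program name and the ASCII sample string are only illustrative; TRegEx.Replace comes from the System.RegularExpressions unit, and the square brackets are printed just to make an empty result visible.

```pascal
program GreedyVsLazyDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.RegularExpressions;

const
  // Same shape as the string in the question, with plain text instead of the smiley.
  Input = '{color:black}Hi{color}{color:black}{color}';

begin
  // Lazy: each match stops at the first closing brace, so only the tags are removed.
  Writeln('[', TRegEx.Replace(Input, '{.*?}', ''), ']'); // prints [Hi]

  // Greedy: a single match runs from the first opening brace to the last closing brace.
  Writeln('[', TRegEx.Replace(Input, '{.*}', ''), ']');  // prints []
end.
```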

Pitfalls/cons

Like Andreas, I'm not a huge fan of regular expressions either. The awkward syntax can be hard to decipher, especially if you don't use them a lot.

Also, a seemingly simple regex can be expensive to execute, which sometimes makes it very slow, especially when working with larger strings. I recently bumped into one that was so magical, it was stuck for minutes on verifying whether a string of about 1000 characters matched a certain pattern.

The expression used above is actually an example of that. After each character taken by the .*? part, the engine has to look forward to check whether it can already satisfy the rest of the expression. If not, it goes back, takes another character, and looks forward again. For this expression that's not an issue, but if an expression has multiple parts of variable length, this can be a CPU-intensive process!

My earlier version, {[^}]*} is, theoretically at least, more efficient, because instead of any character, it just matches all characters that are not a }. Easier to execute, but harder to read. In the answer above I went for readability over performance, but it's always something to keep in mind.

Note that my first version, \{[^\}]*\} looked even more convoluted. I was using \ to escape the braces, since they also have a special meaning in regex syntax (as repetition quantifiers, such as {2,5}), but it doesn't seem necessary in this case.
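As a rough check that the three spellings behave the same on the sample input, here is a small console sketch (program name and sample string are illustrative only). One difference worth knowing: [^}] also matches line breaks, whereas . does not by default, so the negated-class versions can remove a tag that spans several lines.

```pascal
program NegatedClassDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.RegularExpressions;

const
  Input = '{color:black}Hi{color}{color:black}{color}';

begin
  // Lazy dot version from the answer.
  Writeln(TRegEx.Replace(Input, '{.*?}', ''));      // prints Hi

  // Negated character class: match anything except a closing brace.
  Writeln(TRegEx.Replace(Input, '{[^}]*}', ''));    // prints Hi

  // Same pattern with the braces escaped; the escapes are harmless here.
  Writeln(TRegEx.Replace(Input, '\{[^\}]*\}', '')); // prints Hi
end.
```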

Lastly, there are different regex dialects, which is not helpful either.

That said

Fortunately Delphi wraps the PCRE library, which is open source, highly optimized, well maintained, well documented, and implements the most commonly used dialect.

And for operations like this, regexes can be brief and easy to write, and fast enough to use. If you use them more often, they also become easier to read and write, especially if you use a tool like regex101.com, where you can try out and debug regexes.

3 Comments

Note that older versions of Delphi (pre 2010?) don't include these classes, but they are based on some open source (MPL) components by Jan Goyvaerts, available from regular-expressions.info/delphi.html
Thanks for the addition, @GerryColl! I have used those in the past, and I think they are even the base for the now included library. I must say I omitted it from the answer deliberately, since it was introduced in Delphi XE, 10 years ago and about as many major versions.
Thank you everyone for the amazing answers. Very well explained and I learned a lot from both answers.
