Delphi extract numbers from string

Question

I have a variety of strings to work with. They contain both letters and numbers, and I am trying to extract the numbers (the part I need) from each string. The strings have a format similar to:

"the cat can count 123 567 so can the dog"

The length and position of the numbers can vary, for example: 12 34, 123 456, 1234 5678, 11111 11111.

The number separator can also vary: it may be a space, a question mark, a full stop, or a dash, as in 12-34 or 12.34. So the string could be, e.g., "the cat can't count, the dog can 12-67" or "the cat can count 1234.5678 so can the dog". Is there any clever way in Delphi to extract the numbers, or would I have to do it by scanning the string in code?

Any help would be appreciated

Thanks

colin

How about RegExp? Not really clever, but very universal. Note: your description is quite informal, so probably no one will advise anything better.
– OnTheFly, Commented Jan 11, 2012 at 17:02

Leonardo Herrera · Accepted Answer · 2013-10-19 13:38:43Z

15

If you have Delphi XE or up, you can use regular expressions. This is completely untested, based on David Heffernan's answer:

// Requires System.RegularExpressions in the uses clause (plain RegularExpressions in Delphi XE).
function ExtractNumbers(const s: string): TArray<string>;
var
    regex: TRegEx;
    match: TMatch;
    matches: TMatchCollection;
    i: Integer;
begin
    Result := nil;
    i := 0;
    regex := TRegEx.Create('\d+');  // Delphi string literals use single quotes
    matches := regex.Matches(s);
    if matches.Count > 0 then
    begin
        SetLength(Result, matches.Count);
        for match in matches do
        begin
            Result[i] := match.Value;
            Inc(i);
        end;
    end;
end;

edited Oct 19, 2013 at 13:38

answered Jan 11, 2012 at 19:52

Leonardo Herrera


1 Comment

David Heffernan Over a year ago

+1 I prefer this to my version. That said, "if Matches.Count" does not compile.
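As a rough usage sketch of the regular-expression approach, the console program below runs a compact variant of the function on one of the sample strings from the question. It assumes Delphi XE2 or later (for the System.RegularExpressions unit name) and uses the class function TRegEx.Matches rather than a TRegEx instance; the program name is made up for illustration.

```delphi
program RegexNumbersDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.RegularExpressions;

// Returns every maximal run of digits in s, in order of appearance.
// Any non-digit character (space, dash, dot, letter) acts as a separator.
function ExtractNumbers(const s: string): TArray<string>;
var
  matches: TMatchCollection;
  i: Integer;
begin
  matches := TRegEx.Matches(s, '\d+');
  SetLength(Result, matches.Count);   // Count = 0 yields an empty array
  for i := 0 to matches.Count - 1 do
    Result[i] := matches[i].Value;
end;

var
  parts: TArray<string>;
  part: string;
begin
  parts := ExtractNumbers('the cat can count 1234.5678 so can the dog');
  // Writes 1234 and 5678 on separate lines: the dot is treated as a separator.
  for part in parts do
    Writeln(part);
end.
```

Because the pattern is just \d+, a value such as 1234.5678 comes back as two parts, which matches the question's description of the dot as a separator rather than a decimal point.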

David Heffernan · Accepted Answer · 2012-01-11 17:19:12Z

9

I think this function is what you are looking for:

// Requires the Character unit (System.Character in XE2 and later) for TCharacter.
function ExtractNumbers(const s: string): TArray<string>;
var
  i, ItemIndex: Integer;
  LastCharWasDigit: Boolean;
  len: Integer;
  Count: Integer;
  Start: Integer;
begin
  len := Length(s);
  if len=0 then begin
    Result := nil;
    exit;
  end;

  // First pass: count the runs of consecutive digits so Result can be sized once.
  Count := 0;
  LastCharWasDigit := False;
  for i := 1 to len do begin
    if TCharacter.IsDigit(s[i]) then begin
      LastCharWasDigit := True;
    end else if LastCharWasDigit then begin
      inc(Count);
      LastCharWasDigit := False;
    end;
  end;
  if LastCharWasDigit then begin
    inc(Count);
  end;

  // Second pass: copy each run of digits into Result.
  SetLength(Result, Count);
  ItemIndex := 0;
  Start := 0;
  for i := 1 to len do begin
    if TCharacter.IsDigit(s[i]) then begin
      if Start=0 then begin
        Start := i;
      end;
    end else begin
      if Start<>0 then begin
        Result[ItemIndex] := Copy(s, Start, i-Start);
        inc(ItemIndex);
        Start := 0;
      end;
    end;
  end;
  // A number that runs to the end of the string is still pending; Copy clamps the count.
  if Start<>0 then begin
    Result[ItemIndex] := Copy(s, Start, len);
  end;
end;

answered Jan 11, 2012 at 17:19

David Heffernan


3 Comments

Marjan Venema Over a year ago

+1 nice way of returning all individual numeric parts. Would hate to have to add support for returning 123.12 as a single numeric part though, especially when taking a.123, a. .123 and 123. 456 or 123 .456 etc. into consideration. :)

David Heffernan Over a year ago

@Marjan And there are always negative numbers, but the OP does appear to regard any non-digit as a separator.

David Heffernan Over a year ago

@SHINJaeGuk This happened in the 7 years that have elapsed since the answer was written. Now you use the record helper for Char and write s[i].IsDigit.
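A minimal sketch of the same scan written with the Char record helper mentioned in the last comment might look like the following. It assumes a compiler that ships the helper in System.Character and uses 1-based string indexing (the desktop default); the function and program names are hypothetical. Note that IsDigit accepts any Unicode decimal digit, not only '0'..'9'.

```delphi
program CharHelperDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Character;   // provides the record helper for Char (IsDigit)

// Hypothetical name: collects each run of digits using s[i].IsDigit.
function ExtractDigitRuns(const s: string): TArray<string>;
var
  i, Start: Integer;

  // Appends s[Start..LastIndex] to Result.
  procedure AddRun(LastIndex: Integer);
  begin
    SetLength(Result, Length(Result) + 1);
    Result[High(Result)] := Copy(s, Start, LastIndex - Start + 1);
  end;

begin
  Result := nil;
  Start := 0;                         // 0 means "not inside a number"
  for i := 1 to Length(s) do
  begin
    if s[i].IsDigit then
    begin
      if Start = 0 then
        Start := i;                   // first digit of a new run
    end
    else if Start <> 0 then
    begin
      AddRun(i - 1);                  // run ended just before this character
      Start := 0;
    end;
  end;
  if Start <> 0 then
    AddRun(Length(s));                // run reached the end of the string
end;

var
  parts: TArray<string>;
  part: string;
begin
  parts := ExtractDigitRuns('the cat can''t count, the dog can 12-67');
  // Writes 12 and 67 on separate lines.
  for part in parts do
    Writeln(part);
end.
```

This version grows the array as it goes instead of counting first, which trades a few reallocations for a single pass over the string.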

HAMMOU REDOUANE · Accepted Answer · 2018-09-03 06:34:43Z

9

// Concatenates every digit found in sChaine into a single string.
function ExtractNumberInString ( sChaine: String ): String ;
var
    i: Integer ;
begin
    Result := '' ;
    for i := 1 to length( sChaine ) do
    begin
        if sChaine[ i ] in ['0'..'9'] then
            Result := Result + sChaine[ i ] ;
    end ;
end ;

edited Sep 3, 2018 at 6:34

HAMMOU REDOUANE


answered Jul 21, 2018 at 15:58

user2029909


1 Comment

LU RD Over a year ago

There can be multiple numbers in the string, so this answer does not answer the question.

SpeedResolve · Accepted Answer · 2021-07-22 06:27:40Z

EDIT 1.: The function below reads the first floating point number at or after position Start in the string S and returns its end position in Last. It does NOT handle other floating point notations, such as these Common Lisp forms:

3.1415926535897932384d0  ; A double-format approximation to Pi
3.010299957f-1           ; Log10(2), in single format
-0.000000001s9           ; e^(i*Pi), in short format
0.0s0                    ; A floating-point zero in short format
0s0                      ; Also a floating-point zero in short format

https://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node19.html

function ExtractFloatStr(Start : integer; S : string; var Last : integer) : string;
var
  i, Lstr : integer;
  chln : char;
  str_acm : string;
  Numeric, Sign, DeciSep, Exponent : boolean;
begin
  Numeric := False;  Sign := False;  DeciSep := False;  Exponent := False;
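  Lstr := length(S);
  If Lstr = 0 then    { Nothing to scan: avoids reading S[1] of an empty string }
  begin
    Last := -1;
    Result := '';
    Exit;
  end;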

  If (Start > 0) and (Start <= Lstr) then
    i := Start-1
  Else i := 0;

  Last := -1;  chln := #0;
  str_acm := '';
  repeat
  begin
    i := i + 1;
    chln := S[i];
    //ShowMessage('str_acm['+IntToStr(i)+'] = '+str_acm+'P');
    If Last = -1 then
    begin

      If chln = '-' then
      begin
        { Signs will only count if they are the first element }
        If (str_acm = '') then    { Signs can only be added at the leftmost position }
        begin
          Sign := True;
          str_acm := str_acm + chln;
        end
        { If there's something already registered as number, a right-side Sign will mean two things }
        Else begin
          { Signs cannot be added at the right side of any number or Decimal Separator }
          If Numeric = True then   { End of the reading, in case there's already a valid series of digits }
          begin
            {Last := i-1;}              { ex.: -1.20----; -.20--- }
            If i > 1 then
            begin
              If (S[i-1] = 'E') or (S[i-1] = 'e') then
                str_acm := str_acm + chln
              Else begin
                Last := i-1;
              end;
            end;
          end
          Else begin               { A mixture of various characters without numeric logic}
            str_acm := '';         { So start over the reading }
            Sign := False;         { ex.: -.--- }
          end;
        end;
      end;
      If (chln in ['.',',']) then
      begin
        If (DeciSep = False) then        { Decimal Separators can only be added once }
        begin
          str_acm := str_acm + DecimalSeparator;
          DeciSep := True;
        end
        { If a Decimal Separator was already accounted, a second one will mean two things }
        Else begin
          If Numeric = True then   { End of the reading, in case there's already a valid series of digits }
            Last := i-1              { ex.: -1.20...; -0.20. }
          Else begin               { A mixture of various characters without numeric logic }
            str_acm := '';         { So start over the reading }
            DeciSep := False;      { ex.: -... }
          end;
        end;
      end;

      If (chln in ['0'..'9']) then
      begin
        Numeric := True;            { Numbers can be added after any other characters, be it Sign and/or Decimal Separator }
        str_acm := str_acm + chln;  { Ex.: -1; -2.1; -.1; -1. }
      end;

      If (chln = 'E') or (chln = 'e') then
      begin
        If Exponent = False then
        begin
          If Numeric = True then    { E for the power of 10 can only be added once and after a series of digits }
          begin                     { Ex.: 1.0E10; -.0E2; -4.E3 }
            str_acm := str_acm + chln;
            Exponent := True;
          end
          Else begin                { The absence of a previous series of digits does not allow the insertion of E }
            str_acm := '';          { E cannot start a floating point number and cannot succeed a sign or }
          end;                      { decimal separator if there isn't any previous number }
        end                         { Ex.: -.E; .E; -E; E }
        Else begin
          Last := i-1;              { E cannot appear twice. A second one means the end of the reading }
        end;
      end;

      If chln = '+' then            { Plus (+) sign will only be registered after a valid exponential E character }
      begin
        If (i > 1) and (Exponent = True) then
        begin
          If (S[i-1] = 'E') or (S[i-1] = 'e') then
            str_acm := str_acm + chln
          Else begin                
            Last := i-1;            { If it's added after anything other than E, the reading ends }
          end;
        end;
        If Exponent = False then
        begin
          If (Numeric = True) then
          begin
            Last := i-1;            { If it's added after anything other than E, the reading ends }
          end
          Else begin
            str_acm := '';          { If it's added after anything other than E, and if there isn't any }
            Exponent := False;      { valid series of digits, the reading restarts }
          end;
        end;
      end;

      { If any character except the ones from the Floating Point System are added }
      If not (chln in ['0'..'9','-','+',',','.','E','e']) then
      begin
        { After an already accounted valid series of digits }
        If (str_acm <> '') then
        begin
          If (Numeric = True) then
            Last := i-1             { End of the reading. Ex.: -1.20A; -.20%; 120# }
          Else begin
            str_acm := '';
            Sign := False;  DeciSep := False;  Exponent := False;
          end;
        end;
      end;
    end;
    //ShowMessage('i = '+IntToStr(i)+#13+str_acm+'P');
  end;
  until((Last <> -1) or (i = Lstr));

  If (i = Lstr) and (Numeric = True) then
    Last := i;

  { The Loop does not filter the case when no number is inserted after E, E- or E+ }
  { So it's necessary to check and remove if E,E-,E+,e,e-,e+ are the last characters }
  If Last <> -1 then
  begin
    Lstr := length(str_acm);
    If (str_acm[Lstr] = '+') or (str_acm[Lstr] = '-') then
    begin
      SetLength(str_acm,Lstr-1);
      Last := Last - 1;
    end;

    Lstr := length(str_acm);
    If (str_acm[Lstr] = 'E') or (str_acm[Lstr] = 'e') then
    begin
      SetLength(str_acm,Lstr-1);
      Last := Last - 1;
    end;

    Result := str_acm;
  end
  Else Result := '';
end;  { ExtractFloatStr }

EDIT 2.: Another function using the previous one to read a series of numbers in the same string.

type
  TFloatType = real;
  TVetorN = array of TFloatType;

procedure ExtractFloatVectorStr(Str : string; var N : integer; var FloatVector : TVetorN);
var                                     { Extract floating point numbers from string reading from left to right }
  i, j, k, Lstr, Lstr1 : integer;       { Register the numbers in FloatVector as the type TVetorN }
  char1 : char;                         { Register the amount of numbers found as the integer N }
  str_acm : string;
begin
  Str := AdjustLineBreaks(Str,tlbsCRLF);
  Lstr := length(Str);
  Lstr1 := 0;
  char1 := #0;

  i := 1; j := 0; k := 0; str_acm := '';
  SetLength(FloatVector,j+1);
  repeat
  begin
    If (i <= Lstr) then
    begin
      str_acm := ExtractFloatStr(i, Str, k);
      Lstr1 := length(str_acm);
      If (Lstr1 > 0) and (str_acm <> '') then
      begin
        j := j + 1;
        SetLength(FloatVector,j+1);
        FloatVector[j] := StrToFloat(str_acm);
        i := k + 1;
      end
      Else i := i + 1;
    end;
  end;
  until(i > Lstr);
  N := j;
end;  { ExtractFloatVectorStr }

4 Comments

There can be multiple numbers in the string, so this answer does not answer the question. Also, all numbers are positive integers.

@LU RD I can post another function I built, in which a series of floating point numbers is recognized and listed in a dynamic array. The function above identifies the first floating point number from the position Start onward and marks its end at Last, returning a string that can be passed to StrToFloat as long as it is not empty.

The question is not about floating point numbers. Anything but 0..9 should be treated as space.

Well, you are not wrong, but I think the code I gave may help in that regard. I searched for similar subjects here on Stack Overflow and didn't find this problem in Delphi, so I wrote it myself. It may not be the optimal place to share it, but there you go.

Jean-Philippe · Accepted Answer · 2021-09-07 10:09:51Z

1

Inspired by user2029909's response:

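function ExtractNumberInString (sChaine: String; Start : Integer = 1): TArray<String> ;
var
  i, j: Integer ;
  TmpStr : string;
begin
  Result := nil;   // make sure the result starts out empty
  TmpStr := '';
  j := 0;
  for i := Start to Length( sChaine ) do
    begin
      if sChaine[ i ] in ['0'..'9'] then
        TmpStr := TmpStr + sChaine[ i ]   // still inside a number
      else
        if TmpStr <> '' then
          begin
            // A non-digit ends the current number: store it and start over
            SetLength(Result, Length(Result) + 1);
            Result[j] := TmpStr;
            TmpStr := '';
            Inc(j);
          end;
    end ;
  // Store a number that runs to the very end of the string
  if TmpStr <> '' then
    begin
      SetLength(Result, Length(Result) + 1);
      Result[j] := TmpStr;
    end;
end ;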

answered Sep 7, 2021 at 10:09

Jean-Philippe


2 Comments

Akin Okegbile Over a year ago

Can you add more commentary on this code?

Community Over a year ago

Please add further details to expand on your answer, such as working code or documentation citations.
