
I have a variety of strings that I need to work with. They contain both letters and numbers, and I am trying to extract the numbers (the part I need) from each string. The strings have a format similar to:

"the cat can count 123 567 so can the dog"

The length and position of the numbers can vary, e.g. 12 34, 123 456, 1234 5678, 11111 11111.

The separator between the numbers can also vary: a space, a full stop or a dash, e.g. 12-34 or 12.34. So the string could be, e.g., “the cat can't count, the dog can 12-67” or “the cat can count 1234.5678 so can the dog”. Is there any clever way in Delphi to extract the numbers, or would I have to scan the string in code?

Any help would be appreciated

Thanks

colin

  • How about RegExp? Not really clever, but very universal. Note: your description is quite informal, so probably no one will suggest anything better. Commented Jan 11, 2012 at 17:02

5 Answers


If you have Delphi XE or later, you can use regular expressions. This is completely untested and based on David Heffernan's answer:

// Requires Delphi XE or later: add System.RegularExpressions to the uses clause.
function ExtractNumbers(const s: string): TArray<string>;
var
    regex: TRegEx;
    match: TMatch;
    matches: TMatchCollection;
    i: Integer;
begin
    Result := nil;
    i := 0;
    regex := TRegEx.Create('\d+');  // Delphi string literals use single quotes
    matches := regex.Matches(s);
    if matches.Count > 0 then
    begin
        SetLength(Result, matches.Count);
        for match in matches do  // one match per run of digits
        begin
            Result[i] := match.Value;
            Inc(i);
        end;
    end;
end;
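A shorter variant of the same idea, as an untested sketch assuming Delphi XE or later: the class function TRegEx.Matches avoids creating a TRegEx instance, and the match collection is indexed directly. The pattern '[0-9]+' is used instead of '\d+' only to make it explicit that the ASCII digits 0..9 are wanted; the program name is hypothetical.

```pascal
program RegexNumbersDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions;

// Returns every maximal run of the digits 0..9 found in S, in order.
function ExtractNumbers(const S: string): TArray<string>;
var
  Matches: TMatchCollection;
  I: Integer;
begin
  // Class-function overload: no TRegEx instance is needed for a one-off match
  Matches := TRegEx.Matches(S, '[0-9]+');
  SetLength(Result, Matches.Count);
  for I := 0 to Matches.Count - 1 do
    Result[I] := Matches[I].Value;  // Item is the default indexed property
end;

var
  N: string;
begin
  // Any non-digit (space, '.', '-') acts as a separator, as in the question
  for N in ExtractNumbers('the cat can''t count, the dog can 12-67') do
    Writeln(N);  // prints 12, then 67
end.
```

Because SetLength is given Matches.Count directly, an input without digits simply yields an empty array, so the Count > 0 check is not needed.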

1 Comment

+1 I prefer this to my version. That said if Matches.Count does not compile.

I think this function is what you are looking for:

// Needs the Character unit (System.Character) in the uses clause for TCharacter.IsDigit.
function ExtractNumbers(const s: string): TArray<string>;
var
  i, ItemIndex: Integer;
  LastCharWasDigit: Boolean;
  len: Integer;
  Count: Integer;
  Start: Integer;
begin
  len := Length(s);
  if len=0 then begin
    Result := nil;
    exit;
  end;

  Count := 0;
  LastCharWasDigit := False;
  for i := 1 to len do begin
    if TCharacter.IsDigit(s[i]) then begin
      LastCharWasDigit := True;
    end else if LastCharWasDigit then begin
      inc(Count);
      LastCharWasDigit := False;
    end;
  end;
  if LastCharWasDigit then begin
    inc(Count);
  end;

  SetLength(Result, Count);
  ItemIndex := 0;
  Start := 0;
  for i := 1 to len do begin
    if TCharacter.IsDigit(s[i]) then begin
      if Start=0 then begin
        Start := i;
      end;
    end else begin
      if Start<>0 then begin
        Result[ItemIndex] := Copy(s, Start, i-Start);
        inc(ItemIndex);
        Start := 0;
      end;
    end;
  end;
  if Start<>0 then begin
    Result[ItemIndex] := Copy(s, Start, len - Start + 1);  // the last run reaches the end of s
  end;
end;
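If only the ASCII digits 0..9 should count, the same scan can be done in a single pass by growing the array each time a run of digits ends. This is a sketch under that assumption; the names ExtractDigitRuns, AppendRun and TDigitRuns are hypothetical, not from the answer. CharInSet from SysUtils replaces TCharacter.IsDigit, so it should also compile with Free Pascal in Delphi mode.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}
program DigitRunsDemo;

uses
  SysUtils;

type
  TDigitRuns = array of string;

// Appends S[Start..Stop-1] to Runs.
procedure AppendRun(var Runs: TDigitRuns; const S: string; Start, Stop: Integer);
begin
  SetLength(Runs, Length(Runs) + 1);
  Runs[High(Runs)] := Copy(S, Start, Stop - Start);
end;

// Single pass over S: Start = 0 means "not currently inside a run of digits".
function ExtractDigitRuns(const S: string): TDigitRuns;
var
  I, Start: Integer;
begin
  Result := nil;
  Start := 0;
  for I := 1 to Length(S) do
    if CharInSet(S[I], ['0'..'9']) then
    begin
      if Start = 0 then
        Start := I;                    // a new run begins at I
    end
    else if Start <> 0 then
    begin
      AppendRun(Result, S, Start, I);  // the run ended just before I
      Start := 0;
    end;
  if Start <> 0 then
    AppendRun(Result, S, Start, Length(S) + 1);  // the run reaches the end of S
end;

var
  Runs: TDigitRuns;
  I: Integer;
begin
  Runs := ExtractDigitRuns('the cat can count 1234.5678 so can the dog');
  for I := 0 to High(Runs) do
    Writeln(Runs[I]);  // prints 1234, then 5678
end.
```

The answer's two-pass approach allocates the result array once; the one-pass version calls SetLength per number, which is simpler and should be fine for short strings like those in the question.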

3 Comments

+1 nice way of returning all individual numeric parts. Would hate to have to add support for returning 123.12 as a single numeric part though, especially when taking a.123, a. .123 and 123. 456 or 123 .456 etc. into consideration. :)
@Marjan And there are always negative numbers, but the OP does appear to regard any non-digit as a separator.
@SHINJaeGuk This happened in the 7 years that have elapsed since the answer was written. Now you use the record helper for Char and write s[i].IsDigit.
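For reference, a sketch of the first (counting) loop of the answer above written with the Char helper the comment mentions. It assumes a Delphi version whose System.Character unit provides that helper, and a desktop target where strings are 1-based; CountDigitRuns is a hypothetical name.

```pascal
program CharHelperDemo;

{$APPTYPE CONSOLE}

uses
  System.Character;

// Counts maximal runs of digits: the first pass of the answer above,
// using the Char record helper instead of TCharacter.IsDigit.
function CountDigitRuns(const S: string): Integer;
var
  I: Integer;
  InRun: Boolean;
begin
  Result := 0;
  InRun := False;
  for I := 1 to Length(S) do
    if S[I].IsDigit then
    begin
      if not InRun then
        Inc(Result);  // a new run of digits starts at I
      InRun := True;
    end
    else
      InRun := False;
end;

begin
  Writeln(CountDigitRuns('the cat can count 1234.5678 so can the dog'));  // prints 2
end.
```

Like TCharacter.IsDigit, the helper tests for Unicode decimal digits, so digits from other scripts also count; if only 0..9 are wanted, a plain range test on the character is stricter.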
function ExtractNumberInString ( sChaine: String ): String ;
var
    i: Integer ;
begin
    Result := '' ;
    for i := 1 to length( sChaine ) do
    begin
        if CharInSet( sChaine[ i ], ['0'..'9'] ) then  // CharInSet avoids the WideChar set warning in Unicode Delphi
            Result := Result + sChaine[ i ] ;
    end ;
end ;
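A quick way to check what this function returns for one of the question's sample strings: because every digit is appended to a single result, separate numbers are merged into one. A small console sketch (the program name is hypothetical; CharInSet is used in place of the set test to avoid the compiler warning about WideChar in set expressions).

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}
program ConcatDigitsDemo;

uses
  SysUtils;

// Same logic as the answer: keep every digit, drop everything else.
function ExtractNumberInString(const sChaine: string): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(sChaine) do
    if CharInSet(sChaine[i], ['0'..'9']) then
      Result := Result + sChaine[i];
end;

begin
  // The two numbers 123 and 567 come back as one string
  Writeln(ExtractNumberInString('the cat can count 123 567 so can the dog'));  // prints 123567
end.
```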

1 Comment

There can be multiple numbers in the string, so this answer does not answer the question.

EDIT 1.: The function below reads the first floating-point number at or after position Start in the string S and returns its end position in Last. It does NOT handle other floating-point notations, such as these Common Lisp examples:

  • 3.1415926535897932384d0; a double-format approximation to Pi

  • 3.010299957f-1; log base 10 of 2, in single format

  • -0.000000001s9; e^(i*Pi), in short format

  • 0.0s0; a floating-point zero in short format

  • 0s0; also a floating-point zero in short format

https://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node19.html

function ExtractFloatStr(Start : integer; S : string; var Last : integer) : string;
var
  i, Lstr : integer;
  chln : char;
  str_acm : string;
  Numeric, Sign, DeciSep, Exponent : boolean;
begin
  Numeric := False;  Sign := False;  DeciSep := False;  Exponent := False;
  Lstr := length(S);

  If (Start > 0) and (Start <= Lstr) then
    i := Start-1
  Else i := 0;

  Last := -1;  chln := #0;
  str_acm := '';
  repeat
  begin
    i := i + 1;
    chln := S[i];
    //ShowMessage('str_acm['+IntToStr(i)+'] = '+str_acm+'P');
    If Last = -1 then
    begin

      If chln = '-' then
      begin
        { Signs will only count if they are the first element }
        If (str_acm = '') then    { Signs can only be added at the leftmost position }
        begin
          Sign := True;
          str_acm := str_acm + chln;
        end
        { If there's something already registered as number, a right-side Sign will mean two things }
        Else begin
          { Signs cannot be added at the right side of any number or Decimal Separator }
          If Numeric = True then   { End of the reading, in case there's already a valid series of digits }
          begin
            {Last := i-1;}              { ex.: -1.20----; -.20--- }
            If i > 1 then
            begin
              If (S[i-1] = 'E') or (S[i-1] = 'e') then
                str_acm := str_acm + chln
              Else begin
                Last := i-1;
              end;
            end;
          end
          Else begin               { A mixture of various characters without numeric logic}
            str_acm := '';         { So start over the reading }
            Sign := False;         { ex.: -.--- }
          end;
        end;
      end;
      If (chln in ['.',',']) then
      begin
        If (DeciSep = False) then        { Decimal Separators can only be added once }
        begin
          str_acm := str_acm + FormatSettings.DecimalSeparator;  { the DecimalSeparator global is deprecated in newer Delphi versions }
          DeciSep := True;
        end
        { If a Decimal Separator was already accounted, a second one will mean two things }
        Else begin
          If Numeric = True then   { End of the reading, in case there's already a valid series of digits }
            Last := i-1              { ex.: -1.20...; -0.20. }
          Else begin               { A mixture of various characters without numeric logic }
            str_acm := '';         { So start over the reading }
            DeciSep := False;      { ex.: -... }
          end;
        end;
      end;

      If (chln in ['0'..'9']) then
      begin
        Numeric := True;            { Numbers can be added after any other characters, be it Sign and/or Decimal Separator }
        str_acm := str_acm + chln;  { Ex.: -1; -2.1; -.1; -1. }
      end;

      If (chln = 'E') or (chln = 'e') then
      begin
        If Exponent = False then
        begin
          If Numeric = True then    { E for the power of 10 can only be added once and after a series of digits }
          begin                     { Ex.: 1.0E10; -.0E2; -4.E3 }
            str_acm := str_acm + chln;
            Exponent := True;
          end
          Else begin                { The abscense of a previous series of digits does not allow the insertion of E }
            str_acm := '';          { E cannot start a floating point number and cannot succeed a sign or }
          end;                      { decimal separator if there isn't any previous number }
        end                         { Ex.: -.E; .E; -E; E }
        Else begin
          Last := i-1;              { E cannot appear twice. A second one means the end of the reading }
        end;
      end;

      If chln = '+' then            { Plus (+) sign will only be registered after a valid exponential E character }
      begin
        If (i > 1) and (Exponent = True) then
        begin
          If (S[i-1] = 'E') or (S[i-1] = 'e') then
            str_acm := str_acm + chln
          Else begin                
            Last := i-1;            { If it's added after anything other than E, the reading ends }
          end;
        end;
        If Exponent = False then
        begin
          If (Numeric = True) then
          begin
            Last := i-1;            { If it's added after anything other than E, the reading ends }
          end
          Else begin
            str_acm := '';          { If it's added after anything other than E, and if there isn't any }
            Exponent := False;      { valid series of digits, the reading restarts }
          end;
        end;
      end;

      { If any character except the ones from the Floating Point System are added }
      If not (chln in ['0'..'9','-','+',',','.','E','e']) then
      begin
        { After an already accounted valid series of digits }
        If (str_acm <> '') then
        begin
          If (Numeric = True) then
            Last := i-1             { End of the reading. Ex.: -1.20A; -.20%; 120# }
          Else begin
            str_acm := '';
            Sign := False;  DeciSep := False;  Exponent := False;
          end;
        end;
      end;
    end;
    //ShowMessage('i = '+IntToStr(i)+#13+str_acm+'P');
  end;
  until((Last <> -1) or (i = Lstr));

  If (i = Lstr) and (Numeric = True) then
    Last := i;

  { The Loop does not filter the case when no number is inserted after E, E- or E+ }
  { So it's necessary to check and remove if E,E-,E+,e,e-,e+ are the last characters }
  If Last <> -1 then
  begin
    Lstr := length(str_acm);
    If (str_acm[Lstr] = '+') or (str_acm[Lstr] = '-') then
    begin
      SetLength(str_acm,Lstr-1);
      Last := Last - 1;
    end;

    Lstr := length(str_acm);
    If (str_acm[Lstr] = 'E') or (str_acm[Lstr] = 'e') then
    begin
      SetLength(str_acm,Lstr-1);
      Last := Last - 1;
    end;

    Result := str_acm;
  end
  Else Result := '';
end;  { ExtractFloatStr }

EDIT 2.: Another function that uses the previous one to read all the numbers in the same string.

type
  TFloatType = real;
  TVetorN = array of TFloatType;

procedure ExtractFloatVectorStr(Str : string; var N : integer; var FloatVector : TVetorN);
var                                     { Extract floating point numbers from string reading from left to right }
  i, j, k, Lstr, Lstr1 : integer;       { Numbers are stored in FloatVector[1..N]; index 0 is unused }
  char1 : char;                         { Register the amount of numbers found as the integer N }
  str_acm : string;
begin
  Str := AdjustLineBreaks(Str,tlbsCRLF);
  Lstr := length(Str);
  Lstr1 := 0;
  char1 := #0;

  i := 1; j := 0; k := 0; str_acm := '';
  SetLength(FloatVector,j+1);
  repeat
  begin
    If (i <= Lstr) then
    begin
      str_acm := ExtractFloatStr(i, Str, k);
      Lstr1 := length(str_acm);
      If (Lstr1 > 0) and (str_acm <> '') then
      begin
        j := j + 1;
        SetLength(FloatVector,j+1);
        FloatVector[j] := StrToFloat(str_acm);
        i := k + 1;
      end
      Else i := i + 1;
    end;
  end;
  until(i > Lstr);
  N := j;
end;  { ExtractFloatVectorStr }
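If the numbers really can be signed decimals, as this answer assumes, a regular expression can do much of the same work in a few lines. This is only a sketch, assuming Delphi XE or later and '.' as the decimal separator in the input; the pattern and the names ExtractFloats and FloatPattern are my own and are simpler than ExtractFloatStr, so they do not cover every case it handles. Note that with the question's own input, the dash in 12-67 would be read as a minus sign.

```pascal
program FloatRegexDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions;

type
  TFloatArray = TArray<Double>;

// Optional sign, digits with an optional '.', or '.' followed by digits,
// then an optional exponent such as E3 or e-2.
const
  FloatPattern = '[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?';

function ExtractFloats(const S: string): TFloatArray;
var
  Matches: TMatchCollection;
  FS: TFormatSettings;
  I: Integer;
begin
  FS := FormatSettings;        // copy the global settings...
  FS.DecimalSeparator := '.';  // ...but always parse '.' as the decimal point
  Matches := TRegEx.Matches(S, FloatPattern);
  SetLength(Result, Matches.Count);
  for I := 0 to Matches.Count - 1 do
    Result[I] := StrToFloat(Matches[I].Value, FS);
end;

var
  V: Double;
begin
  for V in ExtractFloats('x -1.5 y 2E3 z .25') do
    Writeln(FloatToStr(V));
end.
```

A trailing E with no digits after it is simply left out of the match, which is the case ExtractFloatStr handles with its final clean-up.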

4 Comments

There can be multiple numbers in the string, so this answer does not answer the question. Also, all numbers are positive integers.
@LU RD I can post another function I built that recognizes a series of floating-point numbers and lists them in a dynamic array. The function above identifies the first floating-point number from position Start on and marks its end at Last, returning a string that can be passed to StrToFloat if it is not empty.
The question is not about floating point numbers. Anything but 0..9 should be treated as space.
Well, you are not wrong, but I think the code I gave may help in that regard. I searched for similar questions here on Stack Overflow and didn't find this problem solved in Delphi, so I wrote it myself. This may not be the best place to share it, but there you go.

Inspired by user2029909's answer:

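function ExtractNumberInString (sChaine: String; Start : Integer = 1): TArray<String> ;
var
  i, j: Integer ;
  TmpStr : string;
begin
  Result := nil;  // a managed Result is not guaranteed to be empty on entry
  j := 0;
  for i := Start to Length( sChaine ) do
    begin
      if CharInSet(sChaine[ i ], ['0'..'9']) then
        TmpStr := TmpStr + sChaine[ i ]
      else
        if TmpStr <> '' then
          begin
            SetLength(Result, Length(Result) + 1);
            Result[j] := TmpStr;
            TmpStr := '';
            Inc(j);
          end;
    end ;
  // Without this, a number at the very end of the string would be lost
  if TmpStr <> '' then
    begin
      SetLength(Result, Length(Result) + 1);
      Result[j] := TmpStr;
    end;
end ;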

2 Comments

Can you add more commentary on this code?
Please add further details to expand on your answer, such as working code or documentation citations.
