1

I have a file with very big num like below

45313904626416486480179546360796323469287116537171
465573254230695450538671922463236910370073247307526
5906233480284069039032926795367974774430427486375

How to sort this kind of num ?

The result should be something like (real file is 100000 lines):

5906233480284069039032926795367974774430427486375
45313904626416486480179546360796323469287116537171
465573254230695450538671922463236910370073247307526

I try something with

  MyFlexibleArray := TList<UInt64>.Create;
  AssignFile(F, OpenTextFileDialog1.FileName);
  Reset(F);
  repeat
    Readln(F, str);
    MyFlexibleArray.Add(UInt64(str));
  until EOF(F);
  CloseFile(F);
  MyFlexibleArray.Sort;

With a TStringList, the result wasn't sort in natural way!

Any help will be greatly appreciated.

full file

3
  • The max UInt64 is 18,446,744,073,709,551,615 which is a fair bit smaller than your numbers. You could add zeroes in front of the numbers so they all line up, string sort them and output with the leading zeros removed. Commented Mar 22, 2021 at 15:18
  • Who'd upload a textfile of that size uncompressed to a host where it will vanish within a few days? And it contains lines like 5079598143104014379457667649331467875308351922_8 with _ being a NUL byte. Whoever will need this file again in the future reply to me. Commented Mar 22, 2021 at 21:38
  • In case you need to do more (math) then just sorting these big numbers, there is a lib for this: github.com/rvelthuis/DelphiBigNumbers Commented Mar 23, 2021 at 5:43

3 Answers 3

4

First, your text file is corrupt. It contains NUL bytes, which makes it impossible to parse it normally.

However, if we disregard this issue, sorting a file like this is almost trivial.

Assuming there are no leading zeros, the following algorithm will give the correct result:

var Data := TFile.ReadAllLines('K:\numbers.txt', TEncoding.ASCII);

TArray.Sort<string>(
  Data,
  TComparer<string>.Construct(
    function(const L, R: string): Integer
    begin
      Result := CompareValue(L.Length, R.Length);
      if Result <> 0 then
        Exit;
      for var i := 1 to L.Length do
      begin
        Result := CompareValue(Ord(L[i]), Ord(R[i]));
        if Result <> 0 then
          Exit;
      end;
    end
  )
);

TFile.WriteAllLines('K:\sorted.txt', Data, TEncoding.ASCII);

We construct our own string comparer according to these rules:

  • If L has more (fewer) digits than R, then clearly it is greater (smaller).
  • If L and R has the same number of digits, compare the digits, one by one, from the MSD to the LSD.

Just add IOUtils, Generics.Defaults, Generics.Collections, and Math to your uses clause.

Sign up to request clarification or add additional context in comments.

3 Comments

@NomiLaura How long does it take?
@Olivier: On my system, the sorting takes 3 seconds for a 1M line file. I imagine most routines will be close to this on my system. CompareStr and CompareText are highly optimized and yield a slightly better runtime (about 2.3 seconds); specifically, it seems like they compare one 32-bit int at a time even though a char is only 16 bits.
Just more than 1s with 100000 lines. 😀 The extra time provide from TMemo. Sorry
2

If you sort your data as a string, the length of the string is not being taken into account.

If you use a Generic (so you will need System.Generics.Collections in your uses clause) you can specify how to compare objects in a parameter to the constructor. This means that your list of strings would be declared as:

  FMyStrings: TList<String>;

You comparator would compare two strings, if you assume that the strings can only ever contain decimal digits then your comparator would be something like:

  TMyStringSorter = class(TComparer<String>)
  public
    function Compare(const Left, Right: String): Integer; override;
  end;

  function TMyStringSorter.Compare(const Left, Right: String): Integer;
  begin
    if(Length(Left)<Length(Right) then Result:=-1
    else if(Length(Right)<Length(Left) then Result:=1
    else Result:=CompareStr(Left, Right);
  end;

Then pass the Interface to the comparer to the TList Constructor and you can sort it according to your own sort algorithm.

3 Comments

If you don't need to compare numbers case-sensitively (!), you can probably make it a tiny bit faster by using CompareStr instead of CompareText.
@AndreasRejbrand - yes you are right of course (+1). CompareText is case-insensitive, but CompareStr would probably be a bit faster.
This is essentially the same solution as in my answer, except for that this one uses generics. It is possible to achieve the same without generics, using a standard TStringList (like in my answer), by using CustomSort and your own comparer function.
0

You could keep the big numbers in strings in a TStringList, and sort it using a custom comparer that first sorts by string length and then (if length is equal) by string value. Like this:

function NumberStringComparer(List: TStringList; Index1, Index2: Integer): Integer;
var
  Value1, Value2: string;
  Len1, Len2: Integer;
begin
  Value1 := List[Index1];
  Value2 := List[Index2];
  Len1 := Length(Value1);
  Len2 := Length(Value2);

  Result := Len1 - Len2;
  if Result = 0 then
  begin
    if Value1 = Value2 then
      Result := 0
    else if Value1 > Value2 then
      Result := 1
    else
      Result := -1;
  end;
end;

Usage example:

MyStringList.CustomSort(NumberStringComparer);

2 Comments

with numbers like ``` 10 2 7 1 3 4 5 8 7 6 ``` the sort is ``` 1 10 2 3 4 5 6 7 7 8 ```
That is not possible with this comparer. All strings with length 1 come before all strings with length 2. The sort is 1 2 3 4 5 6 7 7 8 10. The output you mention is what TStringList's default sort would give.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.