1

I am trying to move some array elements (of string) to some other position.

When I use System.Move(), FastMM4 reports leaks.

Here is a small snippet to show the problem:

procedure TForm1.Button2Click(Sender: TObject);
type
  TArrayOfStr = array of string;
const
  Count1 = 100;
  String1 = 'some string ';  {space at end}
var
  Array1: TArrayOfStr;
  Index1: Integer;
begin
  SetLength(Array1, Count1);
  Index1 := 0;
  while Index1 < Count1 do begin
    Array1[Index1] := String1 + IntToStr(Index1);
    Inc(Index1);
  end;
  System.Move(Array1[0], Array1[3], 2 * SizeOf(string)); {move 2 cells from cell 0 to cell 3}
  ShowMessage(Array1[3]);
end;

It probably has something to do with SizeOf(String) but I don't know what.

Could someone help me make the leaks go away?

32
  • 3
    Why are you not simply writing Array1[3] := Array1[0]? Commented Mar 18, 2016 at 14:06
  • 2
    Problem is Move doesn't work. Doesn't account for ref counting. Commented Mar 18, 2016 at 14:12
  • 3
    @Adem My point is you're wasting effort being paranoid about potential performance problems before you've even confirmed that they are a problem. Do you realise how fast modern computers are? Use of managed types provides benefits that the record keeping is done automatically and resources are managed correctly. You can try to do the same yourself, but it will probably be slower and buggier. In the vast majority of cases, if you do have a performance problem it won't be solved by fighting Delphi's managed types. Commented Mar 18, 2016 at 14:41
  • 2
    @Adem, and therein lies the rub. It is the management of the ref counting and the cleanup that cannot be skipped, besides TStringList also uses a TArray<string> internally. Commented Mar 18, 2016 at 14:49
  • 2
    If you change the definition of move from system.move to Mythical move that works using unicorns then you have of course won the argument. The move you are in fact looking for however is the naive loop. Even using hard core assembly skills the naive loop can only be improved by very a low %. I know because I can picture the assembly of your unicorn move and count the cycles in my head. Commented Mar 18, 2016 at 15:25

5 Answers 5

4

Issues
The problem you are having has to do with the reference counting of the string.

  1. Leaks
    If there is already a string in the areas you're overwriting these strings will not get freed. These are the leaks you are reporting.

  2. Potential access violations
    You copy the string pointers, but you do not increase the reference count of the string. This will lead to access violations if the original strings ever get destroyed due to going out of scope. This is a very subtle bug and will bite you when you least expect it.

Best solution
It's far simpler to just let Delphi do the copying and then all internal bookkeeping will get done properly.

  {move 6 from cell 1 to cell 3}
  System.Move(Array1[0], Array1[3], 2 * SizeOf(string)); 
  //This does not increase the reference count for the string;
  //leading to problems at cleanup.

  Array1[3]:= Array1[0];
  Array1[4]:= Array1[1];  //in a loop obviously :-)
  //this increases the reference count of the string.

Note that Delphi does not copy the strings, it just copies the pointers and increases the ref counts as needed. It also frees any strings as needed.

Hack solution
You should manually clear the area first.
Using

for i:= start to finish do Array1[i]:= '';

The next part of this solution horrible hack is to manually increase the ref counts on the strings you've copied.
See: http://docwiki.embarcadero.com/RADStudio/Seattle/en/Internal_Data_Formats#Long_String_Types

procedure IncreaseRefCount(const str: string; HowMuch: integer = 1);
var
  Hack: PInteger;
begin
  Hack:= pointer(str);
  Dec(Hack,2); //get ref count
  Hack^:= AtomicIncrement(Hack^,HowMuch);
end;

  System.Move(Array1[0], Array1[3], 2 * SizeOf(string)); 
  IncreaseRefCount(Array1[3]);
  .... do this for every copied item.

Note that this hack is not completely thread safe if you get the strings from somewhere outside your array.

However if you are really really in need of speed it might be a solution to gain a wooping 2% in the performance of the copy.

Warning
Don't use this code to decrease ref counts manually, you'll run into thread-safety issues!

Need for speed
It is unlikely that simply coping a few strings leads to slowness.
There is no way to get out of the clean up issues if you insist on using managed strings.
On the other hand the overhead of the reference counting is really not that bad, so I suspect the reason for the slowness lies elsewhere; somewhere we can't see because you haven't told us your problem.

I suggest you ask a new question explaining what you're trying to do, why and where the slowness is hurting you.

Sign up to request clarification or add additional context in comments.

6 Comments

Actually failure to increase reference count doesn't lead to leaks. It's failure to decrease (and free if zero) the destination reference counters that lead to the leaks. The first only creates the possibility of access violations.
TStringList (which is not an array of string, but of a record) uses this in its Delete() function: "PPointer(@FList[FCount])^ := nil;" I have no idea what it means. I am not sure it could help, either.
It doesn't seem (to me) to offer any significant speed advantage, simply because I would have to iterate all the moved cells after the Move() operation. I need something that does it all in one go.
@CraigYoung, yes I was getting there.
@Adem They are right here, and they are telling you. A. you don't have a speed problem to begin with. B. this code cannot be speed by any serious %. C.Hacking it will lead to all sorts of subtle bugs.
|
3

The String type in Delphi is a managed type. Delphi keeps record of references and dereferences and automatically releases memory allocated to the string when it's no longer being referenced.

The reason for the leak is that you are bypassing Delphi's management of the string type. You are simply overwriting the pointer that references the 4th string in the array. (and for that matter the 5th as well because of 2 * ...) So you now have strings in memory that are no longer referenced. But Delphi doesn't realise this and cannot free the memory.

Solution: write Array1[3] := Array1[0];


Edit:

I've fully answered your question; giving you everything you need to: 1) Understand why you've got the memory leaks. 2) And make the leaks go away.

Yet you're not satisfied... In comments you've explained that you're trying to improve a phantom performance problem you've conjured up via an artificial benchmark.

  • It's been explained to you that the only reason Move is a little faster than normal string assignment is that: string assignment needs to do additional record keeping to prevent memory leaks and access violations.
  • If you insist on using Move, you'll need to do said record keeping yourself. (Johan has even demonstrated how.)
  • And then you complain that this will slow down your Move "solution".
  • Seriously, take your pick: If you want to use string, you can have it a little faster with AV's and memory leaks OR a little slower but behaving correctly. There's no magic wand waving that's going fix it for you.
  • You could choose to abandon the string type (and all the goodness it gives you). I.e. Use array of char instead. Of course, you'll have to do all the memory management yourself, and see how far multi-byte string copying helps you.

I still maintain that if you ask a new question demonstrating a specific performance problem you're trying to solve, you'll get much better feedback.

E.g. In comments you've mentioned that you're trying to improve the performance of TStringList.

I have previously encountered a performance problem with TStringList in older versions of Delphi. It's generally fine even working with hundreds of thousands of items. However, even with only 10,000 strings, CommaText was noticeably slow; and at 40,000 it was almost unbearable.

The solution didn't involve trying to bastardise reference counting: because that's not the reason it was slow. It was slow because the algorithm was a little naive and performed a huge number of incremental memory allocations. Writing a custom CommaText method solved it.

2 Comments

That is what I am already doing and it is slow. I would love a solution in asm (32bit and 64bit). I am hoping there exists one.
@Adem Wait, you asked how can you profile a solution you don't have. But how then do you know that "it is slow"? ...... If you have a benchmark for some code that is too slow, perhaps you'll be better off asking a new question about how to improve the performance of that code.
0

I'm adding a fundamentally different answer in which I present an option how you can do something you really and truly should not be doing at all.

Disclaimer: What I describe here is terrible advice and should not be done (but it would work). Feel free to learn from it, but don't use it.

As has already been discussed ad nauseam, your problem is that when you use Move you bypass reference counting.

  • The solution involves making "internal" reference counting unnecessary by working on the principal that no matter how many times a string is internally referenced, you'll hold only a single count to the string.
  • And you'll only remove that single count when you're certain you have no more "internal" references to the string.
  • Unfortunately, having abandoned internal reference counting, the only time you can be sure of this is when you've completely finished with your internal array.
  • At this point you must first manually clear the internal array.
  • Then reduce the reference count of all strings you had previously used by 1.
  • This implies you need a separate "master reference" to each string.

Warning
There are 2 immediate problems:

  • You need a second list to track the master reference count; wasting memory.
  • You cannot recover memory used by strings that are no longer referenced internally until you've finished with the internal array entirely because you've abandoned your ability to track internal reference counts.
    (Technically this is still a memory leak, albeit controlled. And FastMM won't report it provided you cleanup correctly.)

Without further ado, some sample code:

//NOTE: Deliberate use of fixed size array instead of dynamic to
//avoid further complications. See Yet Another Warning after code
//for explanation and resolution.
TStringArray100 = array[0..99] of string;
TBadStrings = class(TObject)
private
  FMasterStrings: TStrings;
  FInternalStrings: TStringArray100;
public
  ...
end;

constructor TBadStrings.Create()
begin
  FMasterStings := TStringList.Create;
end;

destructor TBadStrings.Destroy;
begin
  Clear;
  FMasterStrings.Free;
  inherited;
end;

procedure TBadStrings.Clear;
begin
  for I := 0 to 99 do
    Pointer(FInternalStrings[I]) := 0;

  //Should optimise to equivalent of
  //Move(0, FInternalStrings[0], 100 * SizeOf(String));

  //NOTE: Only now is it safe to clear the master list.
  FMasterStings.Clear;
end;

procedure TBadStrings.SetString(APos: Integer; AString: string);
begin
  FMasterStrings.Add(AString); //Hold single reference count

  //Bypass reference counting to assign the string internally
  //Equivalent to Move
  Pointer(FInternalStrings[APos]) := Pointer(AString);
end;

//Usage
begin
  LBadStrings := TBadStrings.Create;
  try
    for I := 0 to 199 do
    begin
      //NOTE 0 to 99 are set, then all overwritten with 100 to 199
      //However strings 0 to 99 are still in memory...
      LBadStrings.SetString(I mod 100, 'String ' + IntToStr(I));
    end;
  finally
    //...until cleanup.
    LBadStrings.Free;
  end;
end;

NOTE: You can add methods to do whatever you like using Move on FInternalStrings. It won't matter that those references aren't being tracked because the master reference can perform correct cleanup at the end. However....

WARNING: Anything you do to FInternalStrings MUST also bypass reference counting otherwise you'll have nasty side-effects. So it should go without saying you need to solidly guard access to the internal array. If client code gets direct access to the array, you can expect accidental 'abuse'.

Yet Another Warning: As commented in code this uses a fixed size array to avoid other problems. Said problems are that if you use a dynamic array, then resizing the array can apply reference counting. Increasing the array size shouldn't be a problem (I recall that being a pointer copy). However, when the size is decreased, elements that are discarded will be dereferenced as necessary. This means you'll have to take the precaution of first pointer nilling these elements before shrinking the array.

The above is a way you can bypass string reference counting in a controlled fashion. But let me reiterate what a terrible idea it is.

You should have no trouble concocting an artificial benchmark to demonstrate that it's faster. However, I seriously doubt it will provide any benefit in a real-world environment.
In fact, if it does you probably have a different problem entirely; because why on earth would be shuffling the strings around so much that time spent there overshadows other aspects of your application?

Comments

0
procedure  TamListVar<T>.Insert(aIndex: Integer; aItem: T);
begin
    InitCheck;
    if not  IsIndex(aIndex) then  Add(aItem)
    else
    begin
        Setlength(Items,FCount+1);
        System.Move(Items[aIndex], Items[aIndex + 1],  (FCount - aIndex) * SizeOf(Items[aIndex]));
        PPointer(@Items[aIndex])^:=nil;
        Items[aIndex]:= aItem;
        inc(FCount);
    end;
end;

1 Comment

Code dumps without any explanation are rarely helpful. Stack Overflow is about learning, not providing snippets to blindly copy and paste. Please edit your question and explain how it works better than what the OP provided.
-2

Here is the solution I have come up with. It is heavily inspired by System.Move().

As far as I could see, after doing a number of tests, it seems to work OK --no leaks reported by FastMM4.

Obvioulsy, this is not the hand-optimized asm routine I was after; but given my (lack of) talents in asm area, this has to do for now.

I'd be most appreciative if you commented on this --especially to point out any pitfalls, as well as any other (e.g. speed) improvements.

{ACount refers to the number of actual array elements (cells of strings), not their byte count.}
procedure MoveString(const ASource; var ATarget; ACount: NativeInt);
type
  PString = ^string;
const
  SzString = SizeOf(string);
var
  Source1: PString;
  Target1: PString;
begin
  Source1 := PString(@ASource);
  Target1 := PString(@ATarget);
  if Source1 = Target1 then Exit;
  while ACount > 0 do begin
    Target1^ := Source1^;
    //Source1^ := ''; {enable if you want to avoid duplicates}
    Inc(Source1);
    Inc(Target1);
    Dec(ACount);
  end;
end;

procedure TForm1.Button2Click(Sender: TObject);
type
  TArrayOfStr = array of string;
const
  Count1 = 100;
  String1 = 'some string ';  {space at end}
var
  Array1: TArrayOfStr;
  Index1: Integer;
begin
  SetLength(Array1, Count1);
  Index1 := 0;
  while Index1 < Count1 do begin
    Array1[Index1] := String1 + IntToStr(Index1);
    Inc(Index1);
  end;
  MoveString(Array1[0], Array1[3], 2); {move 2 cells from cell 0 to cell 3}
  ShowMessage(Array1[3]); {should be 'some string 0'}
  MoveString(Array1[3], Array1[0], 2); {move 2 cells from cell 3 to cell 0}
  ShowMessage(Array1[0]); {should be 'some string 0'}
end;

15 Comments

This is what you so vehemently stated that you wished to avoid, only poorly and inefficiently implemented.
First, you've changed the semantics of Move because your line Source1^ := ''; clears the source reference; something that does not happen in System.Move. Second, you're failing to consider range overlaps (though maybe that's not too serious). But finally, most importantly and most ridiculously: your code is simply a convoluted way of writing string(ATarget) := string(ASource). You're just hiding it behind obfuscated pointer references and redundant temporary variables.
Your answer boils down to doing what you were advised in the first place. But in such a way that it looks different just so you can claim you weren't wrong. (And ironically making it more wrong in the process.)
To put some figures on your solution contra a simple assignment loop as suggested by @Craig : 10 million lines moved from one array to another one, your MoveString() 550ms, Craigs assignment 340ms. In the simple assignment I also had an assignment to '' of the source string as you have in your method.
@CraigYoung, You're right that System.Move() does not clear the source but, then, what it is doing is not 'move' (in literal sense) but 'copy'. I suppose it wouldn't matter if you resize the source array so that all the source cells are removed in that operation, but, in every other case, it means you've just created duplicates --which is not always desirable. I wanted to avoid that. Regarding your other comment: How did I make it "more wrong"?
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.