Cannot remove `Null Characters` from a string

Question

I asked a similar question a couple of months ago. Thanks to Rob Kennedy, I could load my whole text into a RichEdit by using a stream instead of loading the file directly, BUT I couldn't remove the null characters.

Now in this code:

var
  strm : TMemoryStream;
  str  : UTF8String;
  ss   : TStringStream;

begin
  ss   := nil;  // so the finally block can safely free it if an exception occurs first
  strm := TMemoryStream.Create;

  try
    strm.LoadFromFile('C:\Text.txt');
    SetString(str, PAnsiChar(strm.Memory), strm.Size);
    str := StringReplace(str, #0, '', [rfReplaceAll]);  // This line doesn't work at all
    ss  := TStringStream.Create(str);
    Richedit1.Lines.LoadFromStream(ss);
  finally
    strm.Free;
    ss.Free;
  end;
end;

I converted the TMemoryStream to a string so I could remove the null characters with StringReplace(), and then converted it back to a TStringStream to load it with Richedit1.Lines.LoadFromStream.

But my problem is that StringReplace() does not remove the null characters. I can replace other characters, but not #0.

Is there any way to remove null characters directly in the TMemoryStream and then load it into a RichEdit? How? If that's not possible or is very complex, how can I remove them after converting my text to a string?

Thanks.

1) input one character; 2) if it isn't NUL, output it, else discard it; 3) go to step 1. – Free Consulting, Commented Sep 14, 2013 at 12:58

Why did you accept my answer before if it didn't work? Note that I've updated my answer to mention the shortcoming of StringReplace and link to another answer that accomplishes the same task. – Rob Kennedy, Commented Sep 14, 2013 at 15:23

The real question is: why does the file have nulls in it to begin with? A UTF-8 encoded text file should not have any nulls in it, so the file is likely not UTF-8 to begin with. And this code is horribly inefficient with all of the UTF8->UTF16->UTF8->UTF16 conversions. – Remy Lebeau, Commented Sep 14, 2013 at 16:02

@RobKennedy I accepted your answer because you solved my problem, not by removing null chars but by using a stream instead of loading the file. – Sky, Commented Sep 14, 2013 at 16:07

@Sky: A webpage has a charset associated with it, which is specified either in the HTTP Content-Type header or in the HTML itself via a <meta> tag. You have to use the correct charset when decoding the data to Unicode. You can't load the data using whatever charset you feel like; you will lose data that way. – Remy Lebeau, Commented Sep 15, 2013 at 0:57
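Remy's point about the source of the nulls can be made concrete. Plain ASCII text stored as UTF-16LE has a zero byte after every character, so reading those bytes as 8-bit text (as the question's SetString call does) produces a #0 after every letter. The sketch below assumes Delphi 2009 or later and assumes, without confirmation from the question, that the data is UTF-16LE; the program name and variables are illustrative only. Decoding with the matching TEncoding avoids the nulls instead of stripping them afterwards. TStrings.LoadFromFile also has an overload that takes a TEncoding for this purpose.

```pascal
program Utf16NullsDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  Raw: TBytes;
  AsAnsi: AnsiString;
  AsUtf16: string;
begin
  // 'Hi' encoded as UTF-16LE: each ASCII character is followed by a zero byte.
  SetLength(Raw, 4);
  Raw[0] := Ord('H');
  Raw[1] := 0;
  Raw[2] := Ord('i');
  Raw[3] := 0;

  // Copying the bytes into an 8-bit string, as the question's SetString call does,
  // keeps every zero byte as a #0 character.
  SetString(AsAnsi, PAnsiChar(@Raw[0]), Length(Raw));

  // Decoding with the matching encoding yields just the two intended characters.
  AsUtf16 := TEncoding.Unicode.GetString(Raw);

  Writeln(Length(AsAnsi));   // 4
  Writeln(Length(AsUtf16));  // 2
  Writeln(AsUtf16);          // Hi
end.
```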

David Heffernan · Accepted Answer · 2013-09-14 13:40:16Z

10

Sertac's answer is accurate, and you should accept it. If performance is important and you have a large string with frequent instances of the null character, then you should try to reduce the number of heap allocations. Here is how I would implement this:

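function RemoveNull(const Input: string): string;
var
  OutputLen, Index: Integer;
  C: Char;
begin
  // Allocate once, at the largest size the result could need...
  SetLength(Result, Length(Input));
  OutputLen := 0;
  for Index := 1 to Length(Input) do
  begin
    C := Input[Index];
    // ...copy every non-null character forward...
    if C <> #0 then
    begin
      Inc(OutputLen);
      Result[OutputLen] := C;
    end;
  end;
  // ...then trim to the number of characters actually kept.
  SetLength(Result, OutputLen);
end;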

If you want to do it directly in the memory stream, then you can do it like this:

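procedure RemoveNullFromMemoryStream(Stream: TMemoryStream);
var
  i: Integer;
  pIn, pOut: PByte;
begin
  // Works on raw bytes, so it suits 8-bit or UTF-8 data; it would corrupt UTF-16 text.
  // Compact in place: pIn scans every byte, pOut marks where the next kept byte goes.
  pIn := Stream.Memory;
  pOut := pIn;
  for i := 0 to Stream.Size-1 do
  begin
    if pIn^ <> 0 then
    begin
      pOut^ := pIn^;
      Inc(pOut);
    end;
    Inc(pIn);
  end;
  // Shrink the stream to the number of bytes kept.
  Stream.SetSize(NativeUInt(pOut)-NativeUInt(Stream.Memory));
end;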
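Free Consulting's comment on the question describes the same filter as a stream loop: read a byte, write it unless it is NUL, repeat. A minimal sketch of that loop as a stream-to-stream copy, assuming Delphi 2009 or later. `CopyWithoutNulls` is a hypothetical helper name, and a TStringList stands in for `Richedit1.Lines` (both are TStrings descendants) so the example runs as a console program. Unlike the in-place procedure above, it leaves the source stream untouched, and like it, it removes bytes, so it is only appropriate for 8-bit or UTF-8 data.

```pascal
program FilterNullsDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: copies Source to Dest, dropping every zero byte.
// It reads in chunks and compacts each chunk in place before writing it.
procedure CopyWithoutNulls(Source, Dest: TStream);
var
  Buffer: array[0..4095] of Byte;
  BytesRead, Kept, i: Integer;
begin
  repeat
    BytesRead := Source.Read(Buffer, SizeOf(Buffer));
    Kept := 0;
    for i := 0 to BytesRead - 1 do
      if Buffer[i] <> 0 then
      begin
        Buffer[Kept] := Buffer[i];  // Kept <= i, so unread bytes are never overwritten
        Inc(Kept);
      end;
    if Kept > 0 then
      Dest.WriteBuffer(Buffer, Kept);
  until BytesRead = 0;
end;

var
  Data: AnsiString;
  Src, Dst: TMemoryStream;
  Lines: TStringList;  // stands in for Richedit1.Lines
begin
  Data := 'one'#0#13#10'tw'#0'o';
  Src := TMemoryStream.Create;
  Dst := TMemoryStream.Create;
  Lines := TStringList.Create;
  try
    Src.WriteBuffer(Data[1], Length(Data));
    Src.Position := 0;
    CopyWithoutNulls(Src, Dst);
    Dst.Position := 0;  // rewind before reading the filtered bytes back
    Lines.LoadFromStream(Dst);
    Writeln(Dst.Size);     // 8
    Writeln(Lines.Count);  // 2
    Writeln(Lines[1]);     // two
  finally
    Lines.Free;
    Dst.Free;
    Src.Free;
  end;
end.
```

Compacting each chunk before a single WriteBuffer call keeps the number of writes to one per 4 KB block rather than one per byte.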

edited Sep 14, 2013 at 13:40

answered Sep 14, 2013 at 13:24

David Heffernan



Okay. Accepted Sertac's answer :) – Sky

Sertac Akyuz · Accepted Answer · 2013-09-14 13:09:23Z

8

As far as I can see, all of the searching/replacing utilities at one point or another cast the input to a PChar, for which #0 is the termination character. Hence they never go past the part of the string that comes before the first null. You may need to devise your own mechanism. Just a quick example:

var
  i: Integer;
begin
  Assert(str <> '');
  i := 1;
  while i <= Length(str) do
    if str[i] = #0 then
      Delete(str, i, 1)  // the following characters shift left, so i is not advanced
    else
      Inc(i);
end;

Removing the nulls in the stream would similarly involve testing each character and, whenever you decide to delete one, adjusting the stream accordingly before moving on.
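The PChar behavior described above is easy to observe in isolation. A minimal console sketch, assuming Delphi 2009 or later (where PChar is PWideChar; older versions show the same effect with PAnsiChar): a Delphi string with an embedded #0 keeps its full length, while a routine that walks it as a null-terminated PChar sees only the part before the first #0. It illustrates the mechanism only, not the internals of any particular version of StringReplace.

```pascal
program PCharTruncationDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  S: string;
begin
  S := 'ab'#0'cd';
  // A Delphi string stores its own length, so the embedded #0 is just another character.
  Writeln(Length(S));         // 5
  // Viewed through a PChar, the data is null-terminated: StrLen stops at the first #0.
  Writeln(StrLen(PChar(S)));  // 2
end.
```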

answered Sep 14, 2013 at 13:09

Sertac Akyuz


You could point out in your answer that Delete should only be used with small amounts of data; compare Delete with the other approaches. – moskito-x

@moskito - It would seem so, yes. IMO, though, explaining the reason and giving a simple example should suffice. – Sertac Akyuz

@Sertac I think that ought to be the case. Sadly, the reality is that people who don't know any better will blindly use whatever code you post. A disclaimer stating "don't use this code in production since it thrashes the heap" would be good enough for me. Even then, though, I'm sure many folk would ignore the warning! – David Heffernan

@David - I think I was mistaken in assuming that the poster has any interest whatsoever in the reason for the behavior. All that's really asked seems to be to write the code, in which case your answer is in fact the only alternative. I don't have any problem with your answer being selected; what bothered me was that I'm not always able to make out what kind of answer is required. – Sertac Akyuz

I agree. Your answer explains why StringReplace behaves as it does, and as such it should be accepted. It really doesn't look as though the asker has any interest at all in understanding what's going on, and that is the fundamental reason why this problem is still extant. – David Heffernan
