I asked a similar question a couple of months ago. Thanks to Rob Kennedy, I could load my whole text into a RichEdit by using a stream, but I couldn't remove the null characters.


Now in this code:

var
  strm : TMemoryStream;
  str  : UTF8String;
  ss   : TStringStream;

begin
  ss   := nil;  // so that ss.Free in the finally block is safe if loading fails
  strm := TMemoryStream.Create;

  try
    strm.LoadFromFile('C:\Text.txt');
    SetString(str, PAnsiChar(strm.Memory), strm.Size);
    str := StringReplace(str, #0, '', [rfReplaceAll]);  //This line doesn't work at all
    ss  := TStringStream.Create(str);
    Richedit1.Lines.LoadFromStream(ss);
  finally
    strm.Free;
    ss.Free;
  end;
end;

I converted the TMemoryStream to a string to remove the null characters with StringReplace(), and then converted the result to a TStringStream to load it with RichEdit.Lines.LoadFromStream.

But my problem is that I can't remove the null character using StringReplace(). I can replace other characters, but not #0.

Is there any way to remove null characters directly in the TMemoryStream and load it into a RichEdit? How? If that's not possible or is very complex, how can I remove them when I convert my text to a string?

Thanks.

  • 1) Input one character. 2) If it isn't NUL then output it, else discard it. 3) Go to step 1. Commented Sep 14, 2013 at 12:58
  • Why did you accept my answer before if it didn't work? Note that I've updated my answer to mention the shortcoming of StringReplace and link to another answer that accomplishes the same task. Commented Sep 14, 2013 at 15:23
  • The real question is - why does the file have nulls in it to begin with? A UTF-8 encoded text file should not have any nulls in it, so the file is likely not UTF-8 to begin with. And this code is horribly inefficient with all of the UTF8->UTF16->UTF8->UTF16 conversions. Commented Sep 14, 2013 at 16:02
  • @RobKennedy I accepted your answer because you solved my problem. Not because of removing null chars but because of using a stream instead of loading the file. Commented Sep 14, 2013 at 16:07
  • @Sky: A webpage has a charset associated with it, which is specified either in the HTTP Content-Type header or in the HTML itself via a <meta> tag. You have to use the correct charset when decoding the data to Unicode. You can't load the data using whatever charset you feel like. You will lose data that way. Commented Sep 15, 2013 at 0:57

2 Answers

Sertac's answer is accurate and you should accept it. If performance is important, and you have a large string with frequent instances of the null character then you should try to reduce the number of heap allocations. Here is how I would implement this:

function RemoveNull(const Input: string): string;
var
  OutputLen, Index: Integer;
  C: Char;
begin
  SetLength(Result, Length(Input));
  OutputLen := 0;
  for Index := 1 to Length(Input) do
  begin
    C := Input[Index];   
    if C <> #0 then
    begin
      inc(OutputLen);
      Result[OutputLen] := C;
    end;
  end;
  SetLength(Result, OutputLen);
end;

If you want to do it directly in the memory stream, then you can do it like this:

procedure RemoveNullFromMemoryStream(Stream: TMemoryStream);
var
  i: Integer;
  pIn, pOut: PByte;
begin
  pIn := Stream.Memory;
  pOut := pIn;
  for i := 0 to Stream.Size-1 do
  begin
    if pIn^ <> 0 then
    begin
      pOut^ := pIn^;
      inc(pOut);
    end;
    inc(pIn);
  end;
  Stream.SetSize(NativeUInt(pOut)-NativeUInt(Stream.Memory));
end;
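
To tie this back to the question, here is a minimal sketch of how the stripped stream might then be loaded. It is a console program so that it can be compiled on its own: the sample bytes are made up, and a TStringList stands in for RichEdit1.Lines (both are assumptions for illustration). The procedure is repeated from above only to keep the sketch self-contained.

```pascal
program StripNullsDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Same in-place compaction as RemoveNullFromMemoryStream above.
procedure RemoveNullFromMemoryStream(Stream: TMemoryStream);
var
  i: Integer;
  pIn, pOut: PByte;
begin
  pIn := Stream.Memory;
  pOut := pIn;
  for i := 0 to Stream.Size - 1 do
  begin
    if pIn^ <> 0 then
    begin
      pOut^ := pIn^;
      Inc(pOut);
    end;
    Inc(pIn);
  end;
  Stream.SetSize(NativeUInt(pOut) - NativeUInt(Stream.Memory));
end;

const
  // Hypothetical sample data: the bytes of 'Hi' with null bytes mixed in.
  Sample: array[0..4] of Byte = (Ord('H'), 0, Ord('i'), 0, 0);
var
  Stream: TMemoryStream;
  Lines: TStringList; // stands in for RichEdit1.Lines
begin
  Stream := TMemoryStream.Create;
  Lines := TStringList.Create;
  try
    Stream.WriteBuffer(Sample, SizeOf(Sample)); // in the question: Stream.LoadFromFile(...)
    RemoveNullFromMemoryStream(Stream);
    Stream.Position := 0;         // rewind before handing the stream on
    Lines.LoadFromStream(Stream); // on a form: RichEdit1.Lines.LoadFromStream(Stream)
    Writeln(Stream.Size);         // number of bytes left after stripping
    Writeln(Lines.Text);
  finally
    Lines.Free;
    Stream.Free;
  end;
end.
```

Because the bytes are compacted in place and the stream is passed on directly, no intermediate string or TStringStream is needed.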

1 Comment

Okay. Accepted Sertac's answer :)

As far as I can see, all searching/replacing utilities, at one time or another, cast the input to a PChar, for which #0 is the termination character. Hence they never go past the part of the string that precedes the first null. You may need to devise your own mechanism. Just a quick example:

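var
  i: Integer;
begin
  Assert(str <> '');
  i := 1;
  // Delete shifts the rest of the string on every call, so keep this to small inputs
  while i <= Length(str) do
    if str[i] = #0 then
      Delete(str, i, 1)  // do not advance: the next character has moved to index i
    else
      Inc(i);
end;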

Replacing in the stream would similarly involve testing each character and then adjusting the stream accordingly before moving on after you decide to delete one.
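
The difference between the string's real length and what a PChar-based routine sees can be checked with a small sketch like the one below. It only illustrates null termination in general, using a made-up literal; it does not reproduce the internals of StringReplace, which vary between Delphi versions.

```pascal
program NullTerminatorDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  S: string;
begin
  S := 'ab'#0'cd';            // hypothetical sample with an embedded #0
  Writeln(Length(S));         // the string knows its full length: 5
  Writeln(StrLen(PChar(S)));  // a PChar scan stops at the first #0: 2
end.
```

Any routine that walks the PChar instead of using Length will therefore never see the 'cd' part.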

5 Comments

You could point out in your answer that Delete should only be used with small amounts of data. Compare Delete with the other approaches.
@moskito - It would seem so, yes. IMO, though, explaining the reason and giving a simple example should suffice.
@Sertac I think that ought to be the case. Sadly the reality is that people who don't know any better will blindly use whatever code you post. A disclaimer stating, "don't use this code in production since it thrashes the heap" would be good enough for me. Even then, though, I'm sure many folk would ignore the warning!
@David - I think I was mistaken in assuming that the poster has any interest whatsoever in the reason for the behavior. All that's really being asked seems to be "write the code", in which case your answer is in fact the only alternative. I don't have any problem with your answer being selected; what bothered me was the fact that I'm not always able to make out what kind of answer is required.
I agree. Your answer explains why StringReplace behaves as it does. And as such it should be accepted. It really doesn't look as though the asker has any interest at all in understanding what's going on, and that is the fundamental reason why this problem is still extant.
