I want to replace text in a certain range of my HTML file (for example, from position 1000 to 200000) with text from another HTML file. Can someone recommend the best way to do this?
Could you be a little more specific about the notion of position in an HTML file? Maybe provide an example of how the files look before and after. – Darin Dimitrov, Nov 7, 2010 at 8:24
Character position, as returned by IndexOf: replace from this line to that line, or from this string to that string. Hope that's clearer now. – david, Nov 7, 2010 at 8:26
Sounds risky. What if someone changes the HTML slightly? Your code might break in unexpected ways. What is the big picture here? – user447356, Nov 7, 2010 at 11:45
3 Answers
Pieter's way will work, but it does involve loading the whole file into memory. That may well be okay, but if you've got particularly large files you may want to consider an alternative:
- Open a TextReader on the original file
- Open a TextWriter for the target file
- Copy blocks of text by calling Read/Write repeatedly, with a buffer of, say, 8K characters, until you've read the initial amount (1000 characters in your example)
- Write the replacement text out to the target writer, again by opening a reader and copying blocks
- Skip the text you want to ignore in the original file by repeatedly reading into a buffer and just ignoring it (incrementing a counter so you know how much you've skipped, of course)
- Copy the rest of the text from the original file in the same way.
Basically it's just lots of copying operations, including one "copy" which doesn't go anywhere (for skipping the text in the original file).
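A minimal C# sketch of the streaming steps above, assuming positions are character offsets (as with IndexOf), that the result goes to a separate target file, and that the default UTF-8 encoding of StreamReader/StreamWriter is acceptable. The names RangeReplacer, ReplaceRange and CopyChars are hypothetical, not from any library.

```csharp
using System;
using System.IO;

// Hypothetical helper: streams originalPath to targetPath, replacing the
// character range [start, end) with the full contents of replacementPath.
// Only one 8K-character buffer is held in memory at a time.
public static class RangeReplacer
{
    private const int BufferSize = 8192;

    public static void ReplaceRange(string originalPath, string replacementPath,
                                    string targetPath, long start, long end)
    {
        if (start < 0 || end < start)
            throw new ArgumentOutOfRangeException(nameof(end), "Require 0 <= start <= end.");

        char[] buffer = new char[BufferSize];
        using (TextReader original = new StreamReader(originalPath))
        using (TextWriter target = new StreamWriter(targetPath))
        {
            // 1. Copy the first 'start' characters unchanged.
            CopyChars(original, target, buffer, start);

            // 2. Copy the whole replacement file.
            using (TextReader replacement = new StreamReader(replacementPath))
            {
                CopyChars(replacement, target, buffer, long.MaxValue);
            }

            // 3. Skip the replaced range: read it and throw it away.
            CopyChars(original, null, buffer, end - start);

            // 4. Copy whatever is left of the original file.
            CopyChars(original, target, buffer, long.MaxValue);
        }
    }

    // Copies up to 'count' characters from reader to writer, or discards them
    // when writer is null. Stops early at the end of the input.
    private static void CopyChars(TextReader reader, TextWriter writer, char[] buffer, long count)
    {
        long remaining = count;
        while (remaining > 0)
        {
            int toRead = (int)Math.Min(buffer.Length, remaining);
            int read = reader.Read(buffer, 0, toRead);
            if (read == 0) break; // end of input
            writer?.Write(buffer, 0, read);
            remaining -= read;
        }
    }
}
```

For example, RangeReplacer.ReplaceRange("in.html", "replacement.html", "out.html", 1000, 200000) writes characters 0 to 999 of in.html, then all of replacement.html, then everything from character 200000 onward. The target must be a different file from the original, since the original is still being read; if you need the change in place, write to a temporary file and swap it in afterwards (for instance with File.Replace).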
Try this:
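using System.IO;
using System.Text;

// Read both files completely into memory.
string input = File.ReadAllText("<< input HTML file >>");
string replacement = File.ReadAllText("<< replacement HTML file >>");

// Characters [startIndex, endIndex) of input are replaced.
int startIndex = 1000;
int endIndex = 200000;

// Pre-size the builder to the exact length of the result.
var sb = new StringBuilder(
    input.Length - (endIndex - startIndex) + replacement.Length
);
sb.Append(input.Substring(0, startIndex)); // text before the range
sb.Append(replacement);                    // the new text
sb.Append(input.Substring(endIndex));      // text after the range
string output = sb.ToString();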
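If the range is defined by marker strings rather than fixed offsets (the "from this string to this string" case in the question's comments), the two indices can be found with IndexOf first. A small sketch, assuming the markers are unique literal strings and that the markers themselves should be replaced as well; MarkerReplacer, ReplaceBetween and the comment markers in the example are hypothetical.

```csharp
using System;
using System.Text;

public static class MarkerReplacer
{
    // Replaces the text from the first occurrence of startMarker through the
    // end of the following endMarker (markers included) with replacement.
    // Returns input unchanged if either marker is not found.
    public static string ReplaceBetween(string input, string startMarker,
                                        string endMarker, string replacement)
    {
        // Ordinal comparison: exact character match, no culture rules.
        int startIndex = input.IndexOf(startMarker, StringComparison.Ordinal);
        if (startIndex < 0) return input;

        // Look for the end marker only after the start marker.
        int endMarkerIndex = input.IndexOf(endMarker,
            startIndex + startMarker.Length, StringComparison.Ordinal);
        if (endMarkerIndex < 0) return input;
        int endIndex = endMarkerIndex + endMarker.Length;

        // Same splice as the code above, with a pre-sized StringBuilder.
        var sb = new StringBuilder(input.Length - (endIndex - startIndex) + replacement.Length);
        sb.Append(input, 0, startIndex);
        sb.Append(replacement);
        sb.Append(input, endIndex, input.Length - endIndex);
        return sb.ToString();
    }
}
```

For example, ReplaceBetween(html, "<!-- BEGIN -->", "<!-- END -->", newContent) swaps out everything between (and including) the two comments. As the comments on the question point out, this is still fragile if the HTML around the markers changes.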
1 Comment
The replacement code Pieter posted does the job, and creating the StringBuilder with the known resulting length is a clever way to improve performance.
That should do what you asked, but when working with structured data like HTML, it is sometimes preferable to load it as XML (I have used the HtmlAgilityPack for that). Then you can use XPath to find the node you want to replace and work with it. It might be slower, but, as I said, you can then work with the document's structure instead of raw character positions.
2 Comments
Use StringBuilder instead. With it, you can pre-reserve memory through the constructor. In your example, you first construct a string from the first Substring plus myReplacement, and then a second string from that string and the second Substring. StringBuilder is a lot more efficient.