2

I have written a web scraping program that visits a list of pages and writes all the HTML to a file. The problem is that when I pull a block of text, some of the characters are written as '�'. How do I get those characters into my text file correctly? Here is my code:

string baseUri = String.Format("http://www.rogersmushrooms.com/gallery/loadimage.asp?did={0}&blockName={1}", id.ToString(), name.Trim());

// our third request is for the actual webpage after the login.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(baseUri);
request.Method = "GET";
request.UserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)";
//get the response object, so that we may get the session cookie.
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
StreamReader reader = new StreamReader(response.GetResponseStream());

// and read the response
string page = reader.ReadToEnd();

StreamWriter SW;
string filename = string.Format("{0}.txt", id.ToString());
SW = File.AppendText("C:\\Share\\" + filename);

SW.Write(page);

// close the writer so the buffered text is flushed to disk
SW.Close();
reader.Close();
response.Close();

3 Answers

2

You're saving a page named loadimage to a text file. Are you sure that's really all text?

Either way, you can save yourself a lot of code by using System.Net.WebClient.DownloadFile().
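A minimal sketch of the DownloadFile approach, assuming the same URL pattern and output folder as the question. The class name PageSaver, its methods, and the Uri.EscapeDataString call are illustrative additions, not part of the original code. DownloadFile copies the response bytes to disk unchanged, so no text decoding takes place.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper class, named here for illustration only.
static class PageSaver
{
    // Builds the URL used in the question; escaping the name is an added assumption.
    public static string BuildUri(int id, string name)
    {
        return string.Format(
            "http://www.rogersmushrooms.com/gallery/loadimage.asp?did={0}&blockName={1}",
            id, Uri.EscapeDataString(name.Trim()));
    }

    // Saves the raw response body to <folder>\<id>.txt, byte for byte.
    public static void Save(int id, string name, string folder)
    {
        string target = Path.Combine(folder, id + ".txt");
        using (var client = new WebClient())
        {
            client.Headers.Add("user-agent",
                "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)");
            client.DownloadFile(BuildUri(id, name), target);
        }
    }
}
```

A call such as PageSaver.Save(id, name, "C:\\Share") would stand in for the request, reader, and writer code. Unlike File.AppendText, DownloadFile overwrites an existing file rather than appending to it.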


1

You need to specify your encoding in this line:

StreamReader reader = new StreamReader(response.GetResponseStream());

Also note that File.AppendText("C:\\Share\\" + filename) always writes UTF-8, so the output file is UTF-8 regardless of the encoding you read with.
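To illustrate why the reader's encoding matters, here is a small in-memory sketch. It assumes, purely for the example, that the server sends ISO-8859-1 text; the class EncodingDemo and the sample bytes are made up for the demonstration. A byte that is not valid UTF-8 decodes to U+FFFD, which is the '�' character.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo class; it needs no network access.
static class EncodingDemo
{
    // Decodes the bytes through a StreamReader using the given encoding.
    public static string Decode(byte[] bytes, Encoding encoding)
    {
        using (var reader = new StreamReader(new MemoryStream(bytes), encoding))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        // "cafés" as ISO-8859-1 bytes: 'é' is the single byte 0xE9.
        byte[] body = { 0x63, 0x61, 0x66, 0xE9, 0x73 };

        // Read as UTF-8: 0xE9 followed by 's' is not a valid sequence.
        string wrong = Decode(body, Encoding.UTF8);

        // Read with the encoding the bytes were written in.
        string right = Decode(body, Encoding.GetEncoding("iso-8859-1"));

        Console.WriteLine(wrong.Contains("\uFFFD"));
        Console.WriteLine(right == "caf\u00E9s");
    }
}
```

Once the text has been decoded correctly, writing it out as UTF-8 loses nothing, because UTF-8 can represent every Unicode character.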


0

Specify Unicode encoding, like so:

new StreamReader(response.GetResponseStream(), System.Text.Encoding.UTF8)

Do the same for the StreamWriter.
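A hedged sketch of passing an explicit encoding to both the reader and the writer. The StreamReader(Stream) constructor already assumes UTF-8, so if the page is served in a different charset, the encoding to pass is the page's own. This sketch takes it from HttpWebResponse.CharacterSet and falls back to UTF-8. The class Scraper and its methods are hypothetical names, and a page that declares its charset only in a meta tag would need extra handling that is not shown.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper class, named here for illustration only.
static class Scraper
{
    // Maps a charset name to an Encoding, falling back to UTF-8
    // when the name is missing or not recognised.
    public static Encoding PickEncoding(string charset)
    {
        if (string.IsNullOrWhiteSpace(charset)) return Encoding.UTF8;
        try { return Encoding.GetEncoding(charset.Trim()); }
        catch (ArgumentException) { return Encoding.UTF8; }
        catch (NotSupportedException) { return Encoding.UTF8; }
    }

    // Downloads uri and appends the decoded text to path as UTF-8.
    public static void SavePage(string uri, string path)
    {
        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = "GET";
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(
            response.GetResponseStream(), PickEncoding(response.CharacterSet)))
        using (var writer = new StreamWriter(path, true, Encoding.UTF8))
        {
            writer.Write(reader.ReadToEnd());
        }
    }
}
```

The using blocks close the response, reader, and writer even if an exception is thrown, which also ensures the writer's buffer is flushed to disk.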

