1

After connecting to a socket and capturing the response using .read(), how do I parse the input stream and read it line by line?

I see the data is returned without any CRLF:

<html><head><title>Apache Tomcat/6.0.16 - Error report</title><style><!--H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.name {color : black;}HR {color : #525D76;}--></style> </head><body><h1>HTTP Status 404 - /index.html</h1><HR size="1" noshade="noshade"><p><b>type</b> Status report</p><p><b>message</b> <u>/index.html</u></p><p><b>description</b> <u>The requested resource (/index.html) is not available.</u></p><HR size="1" noshade="noshade"><h3>Apache Tomcat/6.0.22</h3></body></html>
4
  • and what is it you want to parse? Commented Feb 2, 2010 at 4:19
  • Maybe the read() is not working correctly yet as the output seems to be an error message Commented Feb 2, 2010 at 4:21
  • If he is receiving an error message, then it is working. Commented Feb 2, 2010 at 4:23
  • OK, so it seems that httplib and urllib return different data. urllib worked as expected, so thanks for the tip. To whoever mentioned it: urllib's read() returns the data with CRLF, whereas httplib's read() seems to return the data with the whitespace removed. Commented Feb 3, 2010 at 18:59

2 Answers

3

You have to parse the HTML. Python has several ways of parsing HTML; one of them is the built-in HTMLParser module. Another, and probably better, way is the third-party BeautifulSoup module.
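As a rough illustration of the built-in route, here is a minimal sketch that pulls the visible text out of a response like the one in the question, one chunk per list item. It assumes Python 3, where the HTMLParser module lives at html.parser; the names TextExtractor and extract_text are made up for this example, and the sample string is a shortened copy of the Tomcat error page, not the full response.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping <style>/<script>."""

    SKIPPED_TAGS = {"style", "script"}

    def __init__(self):
        super().__init__()
        self.chunks = []       # one entry per non-empty run of text
        self._skip_depth = 0   # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract_text(html):
    """Return the text chunks found in an HTML string, in document order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Shortened stand-in for the response body shown in the question.
sample = (
    "<html><head><title>Apache Tomcat/6.0.16 - Error report</title>"
    "<style><!--H1 {color:white;}--></style> </head><body>"
    "<h1>HTTP Status 404 - /index.html</h1>"
    "<p><b>type</b> Status report</p>"
    "<p><b>message</b> <u>/index.html</u></p></body></html>"
)

for line in extract_text(sample):
    print(line)
```

In real use you would pass the decoded string returned by read() to extract_text instead of the sample. The point is that the line structure comes from the markup itself, so it does not matter whether the server sent any CRLF.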

Many other issues dealing with HTML processing are explained in this nice article. You can also read the relevant chapter of the (free online) Dive into Python book.


0

Use an HTML parser. Beautiful Soup seems to be a popular one.

