Before last week, my experience with Python was limited to working with large database files on our network, and now I have suddenly been thrust into extracting information from HTML tables.
After a lot of reading, I chose to use lxml and XPath with Python 2.7 to retrieve the data in question. I have retrieved one field using the following XPath expression:
xpath = "//table[@id='resultsTbl1']/tr[position()>1]/td[@id='row_0_partNumber']/child::text()"
which produced the following list:
['\r\n\t\tBAR18FILM/BKN', '\r\n\t\t\r\n\t\t\t', '\r\n\t\t\t', '\r\n\t\t\t', '\r\n\t\t\t', '\r\n\t\t\t', '\r\n\t\t\t\r\n\t\t']
I recognize the CR/LF and tab escape characters, but how can I avoid them?
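
For reference, here is a minimal sketch of one possible cleanup done on the Python side after the query has run. It assumes the list above is exactly what the XPath call returned; the names raw and cleaned are only illustrative. It strips the leading and trailing whitespace from each text node and drops the nodes that end up empty:

```python
# The list as returned by the XPath query shown above.
raw = ['\r\n\t\tBAR18FILM/BKN', '\r\n\t\t\r\n\t\t\t', '\r\n\t\t\t',
       '\r\n\t\t\t', '\r\n\t\t\t', '\r\n\t\t\t', '\r\n\t\t\t\r\n\t\t']

# str.strip() with no arguments removes leading/trailing whitespace,
# including \r, \n and \t; entries that become empty are discarded.
cleaned = [s.strip() for s in raw if s.strip()]

print(cleaned)  # ['BAR18FILM/BKN']
```

Filtering inside the query itself may also be possible, for example by ending the expression with text()[normalize-space()] so that whitespace-only nodes are skipped, but the remaining strings would still carry their surrounding whitespace and need strip() afterwards.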