I used XPath Helper to help me scrape a table from a website that requires a login.
Code:
g=driver.find_element_by_xpath("//table[@id='DataGrid']/tbody").text
print(g)
The result looks like this (its data type is a string):
#@5@#*&(
&*(%#IO
!@%&*(O)
2018/02/02 206 MAZDA MAZDA 5 5660-ES 2006 01 1999 70000 white A
2018/02/02 210 BMW 330 9378-W6 2006 01 2996 80000 black C
2018/02/02 211 MITSUBISHI FORTIS ALK-3501 2015 04 1798 100000 white C+
I want to write this string to a CSV file without the first three lines, using commas to separate the fields; otherwise all the values end up combined in a single cell.
Code here:
if "#@5@#*&(" in g and "&*(%#IO" in g and "!@%&*(O)" in g:
    g=g.replace("#@5@#*&(", "")
    g=g.replace("&*(%#IO", "")
    g=g.replace("!@%&*(O)", "")
g=g.replace(' ', ',')
print(g)
file_name="C:/Test.csv"
with open(file_name,'a') as file:
    file.write(g+'\n')
My first problem is that I don't know how to delete the first three lines. I replaced them with empty strings, but the lines are still there: every time I write to the CSV file, they still take up rows. My second problem is that separating the values with commas introduces errors. For example, "MAZDA 5" should stay in one field, but it gets split in two. Is there a good way to solve this, or should I just correct it by hand in the CSV file?
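On the first problem, a likely cause is that replace() removes the characters of each unwanted line but leaves its newline behind, so empty lines remain in the string. A minimal sketch of one way around this, assuming the unwanted lines are always the first three lines of the text (`drop_leading_lines` is a hypothetical helper name, and the sample string is copied from the output above):

```python
def drop_leading_lines(text, count=3):
    """Return `text` without its first `count` lines."""
    return "\n".join(text.splitlines()[count:])


# Sample copied from the scraped output shown earlier.
sample = (
    "#@5@#*&(\n"
    "&*(%#IO\n"
    "!@%&*(O)\n"
    "2018/02/02 206 MAZDA MAZDA 5 5660-ES 2006 01 1999 70000 white A\n"
    "2018/02/02 210 BMW 330 9378-W6 2006 01 2996 80000 black C"
)

cleaned = drop_leading_lines(sample)
print(cleaned)
```

This only addresses the extra lines. Splitting the remaining text on spaces would still break "MAZDA 5" apart, because the cell boundaries are already lost in the flattened string.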
HTML source of one table row:
<tr align="left" style="height:40px;">
<td>2018/02/02</td>
<td>206</td>
<td>MAZDA</td>
<td>MAZDA 5</td>
<td>5660-ES</td>
<td>2006</td>
<td>01</td>
<td>1999</td>
<td>70000</td>
<td>white</td>
<td align="center" valign="middle"></td>
<td>A</td>
</tr>
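Since each value sits in its own <td>, one possible approach is to read the table cell by cell instead of taking the text of the whole <tbody>, and let Python's csv module place the commas. The sketch below is hedged: the helper names are hypothetical, `extract_rows` is untested against the real page, and filtering on a date in the first cell is an assumption based on the sample rows. The demo at the end runs without a browser, using cells copied from the row above.

```python
import csv
import io
import re

# Assumption: every real data row starts with a date such as 2018/02/02.
DATE_RE = re.compile(r"^\d{4}/\d{2}/\d{2}$")


def extract_rows(driver):
    """Collect the text of every <td>, one list per <tr> (requires Selenium)."""
    from selenium.webdriver.common.by import By  # imported lazily on purpose

    rows = []
    for tr in driver.find_elements(By.XPATH, "//table[@id='DataGrid']/tbody/tr"):
        rows.append([td.text.strip() for td in tr.find_elements(By.TAG_NAME, "td")])
    return rows


def keep_data_rows(rows):
    """Drop rows whose first cell is not a date, such as the three unwanted lines."""
    return [row for row in rows if row and DATE_RE.match(row[0])]


def write_rows(rows, file_obj):
    """Write rows as CSV; a cell like 'MAZDA 5' stays a single field."""
    csv.writer(file_obj).writerows(rows)


# Demo without a browser: cells copied from the <tr> shown above.
demo = [
    ["#@5@#*&("],
    ["2018/02/02", "206", "MAZDA", "MAZDA 5", "5660-ES", "2006", "01",
     "1999", "70000", "white", "", "A"],
]
buffer = io.StringIO()
write_rows(keep_data_rows(demo), buffer)
print(buffer.getvalue())
```

With a live driver, the same helpers could be used as write_rows(keep_data_rows(extract_rows(driver)), file) inside with open(file_name, 'a', newline='') as file. The newline='' argument is what the csv documentation recommends to avoid blank lines between rows on Windows. Note that the empty <td> in the row produces an empty field in the output.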