Python: How to write to csv properly after using webdriver

Question

I used XPath Helper to help me scrape a table on a website that requires a login.

Code:

g=driver.find_element_by_xpath("//table[@id='DataGrid']/tbody").text
print(g)

The result looks like this (the data type is a string):

#@5@#*&(
&*(%#IO
!@%&*(O)
2018/02/02 206 MAZDA MAZDA 5 5660-ES 2006 01 1999 70000 white A
2018/02/02 210 BMW 330 9378-W6 2006 01 2996 80000 black C
2018/02/02 211 MITSUBISHI FORTIS ALK-3501 2015 04 1798 100000 white C+

I want to write this string to a CSV file without the first three lines, using commas to separate the fields; otherwise all the values end up combined in a single cell.

Code here:

if "#@5@#*&(" in g and "&*(%#IO" in g and "!@%&*(O)" in g:
    g=g.replace("#@5@#*&(", "")
    g=g.replace("&*(%#IO", "")
    g=g.replace("!@%&*(O)", "")
    g=g.replace(' ', ',')  
print(g)
file_name="C:/Test.csv"
with open(file_name,'a') as file:
    file.write(g+'\n')

What bothers me is that I don't know how to delete the first three lines. I replaced them with empty strings, but the empty lines are still there, and they show up every time I write to the CSV file. The second problem is that separating the fields with commas introduces errors: "MAZDA 5", for example, should not be split into two fields. Is there a good way to solve this, or should I just correct it in the CSV file?

HTML source of one table row:

<tr align="left" style="height:40px;">
  <td>2018/02/02</td>
  <td>206</td>
  <td>MAZDA</td>
  <td>MAZDA 5</td>
  <td>5660-ES</td>
  <td>2006</td>
  <td>01</td>
  <td>1999</td>
  <td>70000</td>
  <td>white</td>
  <td align="center" valign="middle"></td>
  <td>A</td>
</tr>

kszl · Accepted Answer · 2018-02-06 08:54:48Z

1

When it comes to removing the first 3 lines, you could either:

remove each unwanted line together with its newline character, i.e. replace a string like "#@5@#*&(\n" with an empty string; or
split the original string into lines, drop the first 3, and join the rest again: "\n".join(g.split("\n")[3:])
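Both options can be sketched as follows. This is a minimal example that assumes the scraped text looks exactly like the sample in the question; `raw` stands in for the asker's `g`, and the variable names are only illustrative.

```python
# Sample text copied from the question; in the real script this would be
# the string returned by the driver.
raw = (
    "#@5@#*&(\n"
    "&*(%#IO\n"
    "!@%&*(O)\n"
    "2018/02/02 206 MAZDA MAZDA 5 5660-ES 2006 01 1999 70000 white A\n"
    "2018/02/02 210 BMW 330 9378-W6 2006 01 2996 80000 black C"
)

# Option 1: replace each unwanted line *including* its trailing "\n",
# so no empty line is left behind.
cleaned_replace = raw
for junk in ("#@5@#*&(\n", "&*(%#IO\n", "!@%&*(O)\n"):
    cleaned_replace = cleaned_replace.replace(junk, "")

# Option 2: split into lines, drop the first three, and join the rest.
cleaned_slice = "\n".join(raw.split("\n")[3:])
```

Option 2 does not depend on the exact content of the unwanted lines, only on there being three of them.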

The second issue is much harder: by saving all the content of tbody into one string, you effectively lose the information about the separators. You have no way to know whether a space was part of a cell's text or is just a separator added automatically between cells. I'd suggest scraping each td cell individually.
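A rough sketch of the per-cell approach, under a few assumptions: the table XPath is the one from the question, `driver` is an already logged-in Selenium WebDriver, and the helper names `table_rows` and `write_rows` are made up for this example. It uses `find_elements(by, value)` with the locator strategy `"xpath"` (the value of `By.XPATH`), and lets Python's `csv` module insert the commas, so a cell such as "MAZDA 5" stays in one field.

```python
import csv


def table_rows(driver, tbody_xpath="//table[@id='DataGrid']/tbody"):
    """Return the table as a list of rows, each a list of cell strings."""
    rows = []
    for tr in driver.find_elements("xpath", tbody_xpath + "/tr"):
        # Read every <td> separately so spaces inside a cell are preserved.
        cells = [td.text for td in tr.find_elements("xpath", "./td")]
        if cells:  # skip rows that contain no <td> at all
            rows.append(cells)
    return rows


def write_rows(rows, out):
    """Write the rows to an open text file; csv.writer adds the commas."""
    csv.writer(out).writerows(rows)


# Possible usage (hypothetical, needs a live browser session):
# with open("C:/Test.csv", "a", newline="") as f:
#     write_rows(table_rows(driver), f)
```

Opening the file with `newline=""` is what the `csv` documentation recommends, and it avoids blank lines between rows on Windows. Whether the three unwanted lines are `tr` rows of the same table is not visible in the question; if they are, slicing the result with `rows[3:]` would drop them.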


Krishal · Accepted Answer · 2018-02-06 08:53:27Z

1

To remove the first few lines from a string, figure out the position of the first relevant piece of information and slice the string from there.

temp = "adknsad"

temp[2:] evaluates to "knsad"

The same approach works for the string you have.
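Applied to the scraped text, that could look like the sketch below. It assumes that every data row starts with a date in YYYY/MM/DD form and that the unwanted lines contain no such date; the regular expression is my own choice, not something from the question.

```python
import re

# Shortened sample of the scraped text from the question.
g = (
    "#@5@#*&(\n"
    "&*(%#IO\n"
    "!@%&*(O)\n"
    "2018/02/02 206 MAZDA MAZDA 5 5660-ES 2006 01 1999 70000 white A"
)

# Find where the first date starts and keep everything from there on.
match = re.search(r"\d{4}/\d{2}/\d{2}", g)
data = g[match.start():] if match else g  # leave g unchanged if no date is found
```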

I don't think there is any simple way to solve the Mazda 5 thing.
