How do I use Python csv to write multiple beautifulsoup table rows for only two specific columns?

Question

I want to use BeautifulSoup to scrape HTML and pull only two columns from every row of one table. Each "tr" row has 10 "td" cells, and I only want the cells at index [1] and [8] from each row. What is the most Pythonic way to do this?

From my input below I've got one table, one body, three rows, and 10 cells per row.

Input

<table id="tblMain">
 <tbody>
  <tr>
   <td>text</td>
   <td>data1</td>
   <td>text</td>
   <td>text</td>
   <td>text</td>
   <td>text</td>
   <td>text</td>
   <td>text</td>
   <td>data2</td>
   <td>text</td>
  </tr>
  <tr>
   <td>text</td>
   <td>data1</td>
   <td>text</td>
   <td>text</td>
   <td>text</td>
   <td>text</td>
   <td>text</td>
   <td>text</td>
   <td>data2</td>
   <td>text</td>
  </tr>
  <tr>
   <td>text</td>
   <td>data1</td>
   <td>text</td>
   <td>text</td>
   <td>text</td>
   <td>text</td>
   <td>text</td>
   <td>text</td>
   <td>data2</td>
   <td>text</td>
  </tr>
 </tbody>
</table>

Things I Have Tried

I understand how to use the index of the cells to loop through and get the "td" at [1] and [8]. However, I get confused when trying to write that data back to the csv as one line per row.

table = soup.find('table', {'id':'tblMain'} )
table_body = table.find('tbody')
rows = table_body.findAll('tr')
data1_columns = []
data2_columns = []
for row in rows[1:]:
    data1 = row.findAll('td')[1]
    data1_columns.append(data1.text)
    data2 = row.findAll('td')[8]
    data2_columns.append(data2.text)

This is my current code, which finds the table, rows, and all "td" cells and writes them correctly to a .csv. However, instead of writing all ten "td" cells per row to each csv line, I just want to grab "td"[1] and "td"[8].

html = browser.page_source
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', {'id':'tblMain'} )
table_body = table.find('tbody')
rows = table_body.findAll('tr')
filename = '%s.csv' % reportname
with open(filename, "wt+", newline="") as f:
    writer = csv.writer(f)
    for row in rows:
        csv_row = []
        for cell in row.findAll("td"):
            csv_row.append(cell.get_text())
        writer.writerow(csv_row)

Expected Results

I want to put only "td"[1] and "td"[8] into csv_row, then pass each list to writer.writerow.

Each csv_row written to my csv file should look like this:

['data1', 'data2']
['data1', 'data2']
['data1', 'data2']

row = row.findAll("td") and writer.writerow( [row[1], row[8]] )
– furas, Commented Apr 11, 2019 at 2:27
I appreciate your help. Although, I'm not following your suggestion very well. What exactly are you suggesting I replace with row = row.findAll("td")?
– Feernot, Commented Apr 11, 2019 at 4:30

furas · Accepted Answer · 2019-04-11 10:14:15Z

You almost had it. Collect the cells of each row once, then pick the two you need by index:

for row in rows:
    row = row.findAll("td")
    csv_row = [row[1].get_text(), row[8].get_text()]
    writer.writerow(csv_row)
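
If the two values have already been collected into separate lists, as in the question's first attempt, they can also be paired up afterwards. The following is only a minimal sketch: the list contents are stand-ins, and the in-memory buffer is an assumption made to keep the example self-contained (a real script would open a file instead).

```python
import csv
import io

# Stand-ins for the lists built in the question's first attempt.
data1_columns = ["data1", "data1", "data1"]
data2_columns = ["data2", "data2", "data2"]

# In-memory buffer for illustration; with a real file, use
# open(filename, "w", newline="") instead.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)

# zip() pairs the n-th item of each list, giving one CSV row per table row.
writer.writerows(zip(data1_columns, data2_columns))

csv_text = buffer.getvalue()
print(csv_text)
```

Building the row inside the loop, as above, is usually simpler because it avoids keeping two lists in step.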

Full code

html ='''<table id ="tblMain">
 <tbody>
  <tr>
   <td>text</td>
   <td>data1</td>
   <td>text</td>
   <td>text</td>
   <td>text</td>
   <td>text</td>
   <td>text</td>
   <td>text</td>
   <td>data2</td>
   <td>text</td>
  <tr>
   <td>text</td>
   <td>data1</td>
   <td>text</td>
   <td>text</td>
   <td>text</td>
   <td>text</td>
   <td>text</td>
   <td>text</td>
   <td>data2</td>
   <td>text</td>
  <tr>
   <td>text</td>
   <td>data1</td>
   <td>text</td>
   <td>text</td>
   <td>text</td>
   <td>text</td>
   <td>text</td>
   <td>text</td>
   <td>data2</td>
   <td>text</td>
'''

from bs4 import BeautifulSoup
import csv

soup = BeautifulSoup(html, 'html.parser')

table = soup.find('table', {'id':'tblMain'} )
table_body = table.find('tbody')

rows = table_body.findAll('tr')

reportname = 'output'
filename = '%s.csv' % reportname

with open(filename, "wt+", newline="") as f:
    writer = csv.writer(f)
    for row in rows:
        row = row.findAll("td")
        csv_row = [row[1].get_text(), row[8].get_text()]
        writer.writerow(csv_row)
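
On a real page the table may contain header rows or short rows, in which case row[8] would raise an IndexError. A possible guard is sketched below; the function name extract_cells, its parameters, and the sample HTML are hypothetical and not part of the original answer.

```python
from bs4 import BeautifulSoup


def extract_cells(html, indexes=(1, 8), table_id="tblMain"):
    """Return one list per row holding the text of the requested cells."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"id": table_id})
    wanted = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        # Skip rows that cannot satisfy every index (e.g. <th>-only headers).
        if len(cells) <= max(indexes):
            continue
        wanted.append([cells[i].get_text(strip=True) for i in indexes])
    return wanted


# Hypothetical sample: one header row plus two well-formed data rows.
header = "<tr>" + "<th>h</th>" * 10 + "</tr>"
data_row = "<tr>" + "".join(f"<td>c{i}</td>" for i in range(10)) + "</tr>"
sample = f'<table id="tblMain"><tbody>{header}{data_row}{data_row}</tbody></table>'

rows = extract_cells(sample)
print(rows)
```

Each returned list can be passed straight to writer.writerow, or the whole result to writer.writerows.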

QHarr · Accepted Answer · 2019-04-10 22:40:59Z

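You should be able to use the nth-of-type pseudo-class CSS selector. It counts from 1, so td:nth-of-type(2) and td:nth-of-type(9) correspond to the cells at index [1] and [8].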

from bs4 import BeautifulSoup as bs
import pandas as pd
html = 'actualHTML'
soup = bs(html, 'lxml')
results = []
for row in soup.select('#tblMain tr'):
    out_row = [item.text.strip() for item in row.select('td:nth-of-type(2), td:nth-of-type(9)')]
    results.append(out_row)
df = pd.DataFrame(results)
print(df)
df.to_csv(r'C:\Users\User\Desktop\data.csv', sep=',', encoding='utf-8-sig',index = False )
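
The same selector also works with the csv module when pandas is not wanted. This is a sketch under assumptions: the sample HTML is a hypothetical well-formed stand-in for the real page, html.parser replaces lxml to avoid an extra dependency, and an in-memory buffer stands in for the output file.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical well-formed sample with the same id as the question's table.
row_html = "<tr>" + "".join(f"<td>c{i}</td>" for i in range(10)) + "</tr>"
html = f'<table id="tblMain"><tbody>{row_html * 3}</tbody></table>'

soup = BeautifulSoup(html, "html.parser")

# In-memory buffer for illustration; use open(..., newline="") for a file.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
for row in soup.select("#tblMain tr"):
    # nth-of-type counts from 1, so 2 and 9 match list indexes 1 and 8.
    cells = row.select("td:nth-of-type(2), td:nth-of-type(9)")
    writer.writerow([cell.get_text(strip=True) for cell in cells])

csv_lines = buffer.getvalue().splitlines()
print(csv_lines)
```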

chitown88 · Accepted Answer · 2019-04-11 08:45:29Z

Whenever I need to pull a table and it has the <table> tag, I let Pandas do the work for me, then just manipulate the dataframe it returns if needed. That's what I would do here:

html = '''<table id ="tblMain">
 <tbody>
  <tr>
   <td> text</td>
   <td> data1</td>
   <td> text</td>
   <td> text</td>
   <td> text</td>
   <td> text</td>
   <td> text</td>
   <td> text</td>
   <td> data2</td>
   <td> text</td>
  <tr>
   <td> text</td>
   <td> data1</td>
   <td> text</td>
   <td> text</td>
   <td> text</td>
   <td> text</td>
   <td> text</td>
   <td> text</td>
   <td> data2</td>
   <td> text</td>
  <tr>
   <td> text</td>
   <td> data1</td>
   <td> text</td>
   <td> text</td>
   <td> text</td>
   <td> text</td>
   <td> text</td>
   <td> text</td>
   <td> data2</td>
   <td> text</td>'''


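from io import StringIO
import pandas as pd

# .read_html() returns a list of dataframes; wrapping the string in StringIO
# avoids the literal-HTML deprecation warning in recent pandas versions
tables = pd.read_html(StringIO(html))

# we want the dataframe from that list in position [0]
df = tables[0]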

# Use .iloc to say I want all the rows, and columns 1, 8
df = df.iloc[:,[1,8]]

# Write the dataframe to file
df.to_csv('path.filename.csv', index=False)
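
Note that to_csv writes the column labels (here 1 and 8) as a first line unless header=False is passed. The sketch below shows that variant on a hypothetical well-formed sample and returns the CSV as a string instead of writing a file; it assumes an HTML parser that pandas can use (lxml, or bs4 with html5lib) is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical well-formed sample: three rows of ten cells each.
row_html = "<tr>" + "".join(f"<td>c{i}</td>" for i in range(10)) + "</tr>"
html = f'<table id="tblMain"><tbody>{row_html * 3}</tbody></table>'

# read_html returns a list of DataFrames; take the first (and only) table.
df = pd.read_html(StringIO(html))[0]

# Keep all rows, but only the columns at positions 1 and 8.
selected = df.iloc[:, [1, 8]]

# With no path given, to_csv returns the CSV text; header=False drops
# the "1,8" label line so only the data rows remain.
csv_text = selected.to_csv(index=False, header=False)
print(csv_text)
```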
