I'm making some progress with web scraping, but I still need help with a few operations:

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'http://fcf.cat/equip/1920/1i/sant-ildefons-ue-b'

# soup = BeautifulSoup(requests.get(converturl).content, 'html.parser')

soup = BeautifulSoup(requests.get(url).content, 'html.parser')

out = []

for tr in soup.select('.col-md-4 tbody tr'):
    ...  # row handling still to be written

I know there are 3 tables under the class col-md-4. I want to generate a CSV with three values per row: first name, last name, and the header of the table the row came from.

first name, last name, header table

Any help would be appreciated.

  • See if this helps: stackoverflow.com/questions/39710903/… Commented Jun 1, 2020 at 11:46
  • Thanks for the link, but it uses pandas and I would like to use BeautifulSoup. Commented Jun 1, 2020 at 12:16

3 Answers

This is what I have done on my own:

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'http://fcf.cat/equip/1920/1i/sant-ildefons-ue-b'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# Name the output file after the last URL segment
filename = url.rsplit('/', 1)[1] + '.csv'

tables = soup.select('.col-md-4 table')
rows = []

for table in tables:
    # One row per table: every text fragment in the table, header first
    rows.append(table.get_text(strip=True, separator='|').split('|'))

# Build and write the DataFrame once, after all tables are collected
df = pd.DataFrame(rows)
print(df)
df.to_csv(filename)

Thanks,


This might work:

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'http://fcf.cat/equip/1920/1i/sant-ildefons-ue-b'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
tables = soup.select('.col-md-4 table')
rows = []

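for table in tables:
    # stripped_strings is a generator, so wrap it in a list before indexing
    cleaned = list(table.stripped_strings)
    # The first string is the table header; the rest are "surname, name" entries
    header, names = cleaned[0], cleaned[1:]
    # Split on the first ", " only, so every row has exactly three columns
    data = [name.split(', ', 1) + [header] for name in names]
    rows.extend(data)

result = pd.DataFrame.from_records(rows, columns=['surname', 'name', 'table'])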

4 Comments

Thanks for the help. I pasted the code into Visual Studio, but I get an error: SyntaxError: 'return' outside function
I've edited the answer, you'll have the desired result in the result variable.
Hi Milan, I appreciate your support. I tried the code again and I still get an error. Exception has occurred: TypeError: 'generator' object is not subscriptable. File "plantillasfcf.py", line 30, in <module>: header, names = cleaned[0], cleaned[1:]
Sorry. I've edited the answer - the output of stripped_strings needs to be wrapped in a list. Try again?

You need to first iterate through each table you want to scrape, then for each table, get its header and rows of data. For each row of data, you want to parse out the First Name and Last Name (along with the header of the table).

Here's a verbose working example:

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'http://fcf.cat/equip/1920/1i/sant-ildefons-ue-b'

soup = BeautifulSoup(requests.get(url).content, 'html.parser')

out = []

# Iterate through each of the three tables
for table in soup.select(".col-md-4 table"):

    # Grab the header and rows from the table
    header = table.select("thead th")[0].text.strip()
    rows = [s.text.strip() for s in table.select("tbody tr")]

    t = []  # This list will contain the rows of data for this table

    # Iterate through rows in this table
    for row in rows:

        # Split by comma (last_name, first_name)
        split = row.split(",")

        last_name = split[0].strip()
        first_name = split[1].strip()

        # Create the row of data
        t.append([first_name, last_name, header])

    # Convert list of rows to a DataFrame
    df = pd.DataFrame(t, columns=["first_name", "last_name", "table_name"])

    # Append to list of DataFrames
    out.append(df)

# Write to CSVs...
out[0].to_csv("first_table.csv", index=None)  # etc...
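
The example above writes one CSV per table. If a single file covering all three tables is preferred, as the question describes, one option is to collect plain rows and serialize them with the standard library's csv module. This is only a minimal sketch: it assumes the rows have already been parsed into (first_name, last_name, table_name) tuples, and the function name rows_to_csv_text and the sample data are hypothetical.

```python
import csv
import io


def rows_to_csv_text(rows):
    """Serialize (first_name, last_name, table_name) rows to CSV text."""
    buffer = io.StringIO()
    # lineterminator="\n" keeps line endings predictable across platforms
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["first_name", "last_name", "table_name"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample data standing in for the scraped rows
sample_rows = [
    ("Anna", "Garcia", "Players"),
    ("Marc", "Lopez", "Staff"),
]
csv_text = rows_to_csv_text(sample_rows)
print(csv_text)
```

To write to disk instead, the same writer can be pointed at a file opened with open(path, "w", newline=""). If you would rather stay with the DataFrames in out, pd.concat(out, ignore_index=True) followed by to_csv should give a similar single file.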

Whenever you're web scraping, I highly recommend using strip() on all of the text you parse to make sure you don't have superfluous spaces in your data.
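
As a rough illustration of that advice, the sketch below cleans a single scraped name before splitting it. It assumes the cell text has the "last name, first name" shape used above. The helper name split_name is hypothetical, and using partition instead of split is my own choice so that a cell without a comma does not raise an IndexError.

```python
def split_name(raw):
    """Split 'Last, First' text into (first_name, last_name), trimming whitespace."""
    # Collapse newlines, tabs, and repeated spaces into single spaces
    cleaned = " ".join(raw.split())
    # partition never raises, even when the comma is missing
    last_name, _, first_name = cleaned.partition(",")
    return first_name.strip(), last_name.strip()


print(split_name("  GARCIA LOPEZ ,  Anna \n"))
print(split_name("Unknown"))
```

When the comma is missing, the whole text ends up in last_name and first_name is an empty string, which makes such rows easy to spot afterwards.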

I hope this helps!
