Loop through URL using Python

Question

I have looked at a few questions, but none of the answers seem to fit. I am building a web scraper as a personal project. I have figured out the loops that get rider data for the Vuelta 2022, but I need to loop through the URLs for every stage. For some reason, only the last number in the range is used for the URL. My gut feeling is that the formatting is the problem, so I have been playing around with it, but no luck.

import requests
from bs4 import BeautifulSoup
import pandas as pd

for j in range (1,10):
    url = (f"https://www.lavuelta.es/en/rankings/stage-{j}")
    page = requests.get(url)
    urlt = page.content
    soup = BeautifulSoup(urlt)
    rider_rank_list = []
for i in range (1,11):
#create list of riders
    results = soup.select_one(f"body > main > div > section.ranking.classements > div > div > div.js-tabs-wrapper.js-tabs-bigwrapper > div > div > div > div > div.js-spinner-wrapper > div > div.sticky-scroll > table > tbody > tr:nth-child({i}) > td.runner.is-sticky > a ")

        
#create rider rank list
    rrank = soup.select_one(f"body > main > div > section.ranking.classements > div > div > div.js-tabs-wrapper.js-tabs-bigwrapper > div > div > div > div > div.js-spinner-wrapper > div > div.sticky-scroll > table > tbody > tr:nth-child({i}) > td:nth-child(1)")


#create stage name
    stage = str.replace(str.title(url.rsplit('/', 1)[-1]),'-',' ')

    rider_rank_list.append((str(stage),str.strip(results.text), str.strip(rrank.text)))


    
print(rider_rank_list)
df = pd.DataFrame(rider_rank_list, columns=['stage','rider','rank'], index=None)
print(df)

df.to_csv('data.csv', index=False)

@BeRT2me I don't think so; I think I need to learn how to indent properly.
– mgd6, Commented Sep 8, 2022 at 19:39

Irem · Accepted Answer · 2022-09-08 17:41:39Z

2

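Fixed the indentation, with small changes: the second loop now sits inside the first, rider_rank_list is created once before the loops, and rows where no rider is found are skipped.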

import requests
from bs4 import BeautifulSoup
import pandas as pd

rider_rank_list = []

for j in range (1,10):
    url = (f"https://www.lavuelta.es/en/rankings/stage-{j}")
    page = requests.get(url)
    urlt = page.content
    soup = BeautifulSoup(urlt)
    
    for i in range (1,11):
        #create list of riders
        results = soup.select_one(f"body > main > div > section.ranking.classements > div > div > div.js-tabs-wrapper.js-tabs-bigwrapper > div > div > div > div > div.js-spinner-wrapper > div > div.sticky-scroll > table > tbody > tr:nth-child({i}) > td.runner.is-sticky > a ")

        if results is not None:
        
            #create rider rank list
            rrank = soup.select_one(f"body > main > div > section.ranking.classements > div > div > div.js-tabs-wrapper.js-tabs-bigwrapper > div > div > div > div > div.js-spinner-wrapper > div > div.sticky-scroll > table > tbody > tr:nth-child({i}) > td:nth-child(1)")

            #create stage name
            stage = str.replace(str.title(url.rsplit('/', 1)[-1]),'-',' ')
        
            rider_rank_list.append((str(stage),str.strip(results.text), str.strip(rrank.text)))


    
print(rider_rank_list)
df = pd.DataFrame(rider_rank_list, columns=['stage','rider','rank'], index=None)
print(df)

df.to_csv('data.csv', index=False)
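The stage label in the code above is derived from the last segment of the URL. As a small illustration of that one step, here is a sketch using a hypothetical helper named stage_label (the name is not part of the answer); it uses string methods rather than the str.replace(str.title(...)) call style, which should behave the same way:

```python
def stage_label(url: str) -> str:
    """Turn a URL ending in 'stage-3' into the label 'Stage 3'."""
    slug = url.rsplit("/", 1)[-1]           # text after the last slash, e.g. 'stage-3'
    return slug.title().replace("-", " ")   # capitalise, then swap the hyphen for a space


print(stage_label("https://www.lavuelta.es/en/rankings/stage-3"))
```

Because the label is built from the URL, it changes on every pass of the outer loop as long as it is computed inside that loop.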


2 Comments

mgd6 Over a year ago

This is perfect. What was the indentation issue? I guess I need to learn code/loop formatting. Thank you for the answer.

Irem Over a year ago

You are welcome. I moved the second for loop inside the first one. Check out nested loops: pynative.com/python-nested-loops
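To see why the indentation matters, here is a minimal sketch that needs no network access. The dictionary fake_pages and the two function names are made up for illustration; they stand in for the downloaded stage pages and the two versions of the loop.

```python
# Hypothetical stand-in for the downloaded pages: stage number -> rider names.
fake_pages = {1: ["A", "B"], 2: ["C", "D"], 3: ["E", "F"]}


def scrape_unnested(pages):
    """Mirrors the question: the rider loop sits after the stage loop."""
    rows = []
    for stage in pages:
        riders = pages[stage]      # overwritten on every pass
    for rider in riders:           # runs once, after the stage loop has finished
        rows.append((stage, rider))
    return rows


def scrape_nested(pages):
    """Mirrors the fix: the rider loop runs inside the stage loop."""
    rows = []
    for stage in pages:
        riders = pages[stage]
        for rider in riders:       # runs once per stage
            rows.append((stage, rider))
    return rows


print(scrape_unnested(fake_pages))  # only rows for the last stage
print(scrape_nested(fake_pages))    # rows for every stage
```

In the un-nested version the stage loop finishes first, so by the time the rider loop starts, only the values from the final pass are left.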

imxitiz · Answer · 2022-09-08 18:45:30Z

0

Inspired by the other answer, here is a complete, simpler, and probably more readable solution that builds the table directly:

import pandas as pd

al = pd.DataFrame()

for i in range(2, 19):  # stage rankings are only available for stages 2 to 18
    url = f"https://www.lavuelta.es/en/rankings/stage-{i}"
    df = pd.read_html(url)[0]  # first table found on the page

    # Keep only the top 10 ranked riders
    df = df[["Rider"]][:10]

    # Rename the "Rider" column to include the stage number
    df.columns = [f"Stage - {i} - Rider"]

    # Append this stage's riders as a new column
    al = pd.concat([al, df], axis=1)


al.to_csv('data.csv', index=False)
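The layout this produces is one column per stage, with rows aligned by rank. As an offline sketch of that shape, the snippet below uses made-up rider names in place of the tables returned by pd.read_html, and collects the columns in a list before a single pd.concat call (a small variation on the loop above, not the answer's exact code):

```python
import pandas as pd

# Hypothetical data standing in for the first table on each stage page.
stage_tables = {
    2: pd.DataFrame({"Rider": ["Rider A", "Rider B", "Rider C"]}),
    3: pd.DataFrame({"Rider": ["Rider B", "Rider D", "Rider A"]}),
}

columns = []
for stage, table in stage_tables.items():
    # Keep the top 2 rows (for brevity) and label the column with the stage number.
    top = table[["Rider"]].head(2).rename(columns={"Rider": f"Stage - {stage} - Rider"})
    columns.append(top)

# axis=1 places the frames side by side instead of stacking them.
wide = pd.concat(columns, axis=1)
print(wide)
```

Each row then reads across as "who held this rank in each stage", which is convenient for viewing but less convenient for filtering than the long stage/rider/rank format of the first answer.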
