I am scraping links from a website directory. There are 13800 records across 690 pages, with 20 records per page, but I am only getting the links from the first and last pages. I need all profile links with names in a CSV file. Any help would be appreciated.

from selenium import webdriver
from selenium.common import exceptions
import pandas as pd

browser = webdriver.Chrome()
browser.get('https://jito.org/members')

name_list =[]
link_list = []

i = 0
while i < 10:
    try:
        results = browser.find_elements_by_xpath("//*[@class='name']")

        for directory in results:
            name = directory.text
            link = directory.find_element_by_tag_name('a')
            person_link = link.get_attribute("href")

            name_list.append(name)
            link_list.append(person_link)


        browser.find_element_by_css_selector("[title^='Next']").click()
        i += 1

    except exceptions.StaleElementReferenceException:
         pass

df = pd.DataFrame(list(zip(name_list, link_list)), columns=['Name', 'Link'])

JITO_data = df.to_csv('JITO_Directory.csv', index=False)
  • Why don't you try using requests to scrape the fields? Commented Jun 2, 2020 at 16:55
  • @SIM I don't know about it. Can you help me? Commented Jun 2, 2020 at 17:55

1 Answer

You can extract the names and links from every page without Selenium. Use the Python requests module and Beautiful Soup to fetch and parse each page, then load the data into pandas and export it to CSV.

import requests
from bs4 import BeautifulSoup
import pandas as pd
i=0
name_list =[]
link_list = []
while(i<=13780):
    #print("https://jito.org/members?start={}".format(i))
    res=requests.get("https://jito.org/members?start={}".format(i))
    soup=BeautifulSoup(res.text,"html.parser")
    for item in soup.select('.name>a'):
        name_list.append(item.text)
        link_list.append("https://jito.org" + item['href'])
    i=i+20

df=pd.DataFrame({"Name":name_list,"Link":link_list})
df.to_csv('JITO_Directory.csv', index=False)

Please note that if you do not have these libraries installed, you need to install them first.
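
As a side note, the `start` values in the loop follow from the figures in the question: 690 pages of 20 records give offsets 0, 20, ..., 13780. Here is a minimal sketch of that arithmetic. It also uses `urljoin` from the standard library, rather than string concatenation, to build absolute links. The helper names `page_urls` and `absolute_link` are made up for illustration and are not part of the code above.

```python
from urllib.parse import urljoin

BASE_URL = "https://jito.org"  # site root, taken from the code above
PER_PAGE = 20                  # records per page, from the question
TOTAL_PAGES = 690              # page count, from the question


def page_urls(total_pages=TOTAL_PAGES, per_page=PER_PAGE):
    """Return the listing URL for every page, in order (hypothetical helper)."""
    return [
        "{}/members?start={}".format(BASE_URL, offset)
        for offset in range(0, total_pages * per_page, per_page)
    ]


def absolute_link(href):
    """Turn a relative or already absolute href into a full URL (hypothetical helper)."""
    return urljoin(BASE_URL, href)


urls = page_urls()
print(len(urls))   # number of listing pages
print(urls[0])     # first listing page
print(urls[-1])    # last listing page
```

With these helpers, the `while` loop could become `for url in page_urls():`, and `absolute_link(item['href'])` would still work if the site ever returned full URLs in `href`.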

The generated CSV file contains 13789 records.


Updated with print statements for troubleshooting. You can see the output of each iteration as well as the final DataFrame.

import requests
from bs4 import BeautifulSoup
import pandas as pd
i=0
name_list =[]
link_list = []
while(i<=13780):
    print("https://jito.org/members?start={}".format(i))
    res=requests.get("https://jito.org/members?start={}".format(i))
    soup=BeautifulSoup(res.text,"html.parser")
    for item in soup.select('.name>a'):
        name_list.append(item.text)
        link_list.append("https://jito.org" + item['href'])
    i=i+20
    print(name_list)
    print(link_list)

df=pd.DataFrame({"Name":name_list,"Link":link_list})
print(df)
df.to_csv('JITO_Directory.csv', index=False)
print('Done')

Printed output from the updated code:

https://jito.org/members?start=0
['NILESH PARASMAL JAIN', 'D K Surana', 'Surender Lal Jain', 'SANDEEP JAIN', 'Nitni Jain', 'KAMLESH CHANDMAL POKHARANA', 'JAYA KAILESH JAIN', 'Ashish Dhariwal', 'Ashok Banthia', 'YASHWANT JAIN', 'Sandeep Mansukhlal Mutha', 'Hamir Bankimbhai Jhaveri', 'Rushab Ajay Bora', 'Nimish Hasmukhbhai Chudgar', 'Kinnar Kantilal Shah', 'Amish Rajendrakumar Shah', 'Abdhishkumar Rajendrakumar Shah', 'Vineet  Gothi', 'Vinay Kumar Chhajer', 'Nirmal Kumar Dugar']
['https://jito.org/profile/14230-nilesh-parasmal-jain', 'https://jito.org/profile/14228-d-k-surana', 'https://jito.org/profile/14227-surender-lal-jain', 'https://jito.org/profile/14226-sandeep-jain', 'https://jito.org/profile/14225-nitni-jain', 'https://jito.org/profile/14224-kamlesh-chandmal-pokharana', 'https://jito.org/profile/14223-jaya-kailesh-jain', 'https://jito.org/profile/14222-ashish-dhariwal', 'https://jito.org/profile/14221-ashok-banthia', 'https://jito.org/profile/14220-yashwant-jain', 'https://jito.org/profile/14219-sandeep-mutha', 'https://jito.org/profile/14218-hamir-bankimbhai-jhaveri', 'https://jito.org/profile/14217-rushab-ajay-bora', 'https://jito.org/profile/14216-nimish-hasmukhbhai-chudgar', 'https://jito.org/profile/14215-kinnar-kantilal-shah', 'https://jito.org/profile/14214-amish-rajendrakumar-shah', 'https://jito.org/profile/14213-abdhishkumar-rajendrakumar-shah', 'https://jito.org/profile/14212-vineet-gothi', 'https://jito.org/profile/14211-vinay-kumar-chhajer', 'https://jito.org/profile/14210-nirmal-kumar-dugar']
https://jito.org/members?start=20
['NILESH PARASMAL JAIN', 'D K Surana', 'Surender Lal Jain', 'SANDEEP JAIN', 'Nitni Jain', 'KAMLESH CHANDMAL POKHARANA', 'JAYA KAILESH JAIN', 'Ashish Dhariwal', 'Ashok Banthia', 'YASHWANT JAIN', 'Sandeep Mansukhlal Mutha', 'Hamir Bankimbhai Jhaveri', 'Rushab Ajay Bora', 'Nimish Hasmukhbhai Chudgar', 'Kinnar Kantilal Shah', 'Amish Rajendrakumar Shah', 'Abdhishkumar Rajendrakumar Shah', 'Vineet  Gothi', 'Vinay Kumar Chhajer', 'Nirmal Kumar Dugar', 'Nikesh Kumar Jain', 'Ashok Kumar Jain', 'Amit Jain Rathod', 'Amar Kumar Jain', 'Ravi  Kothari', 'Moxesh Prakash Punamiya', 'Sourabh  Kothari', 'Ramesh Kumar Singhvi', 'Ramesh  Daglia', 'Rakesh  Bhanawat', 'Pushpendra  Nalwaya', 'Pritam  Jain', 'Pramod Kumar Mehta', 'Narendra Kumar Jain', 'Mayank  Patwa', 'Dharmendra  Mandot', 'Bhanwar Lal Porwal', 'Ashok Kumar  Porwal', 'Gajendra Kumar Shankar Lal Chandaliya', 'Girish  Jain']
['https://jito.org/profile/14230-nilesh-parasmal-jain', 'https://jito.org/profile/14228-d-k-surana', 'https://jito.org/profile/14227-surender-lal-jain', 'https://jito.org/profile/14226-sandeep-jain', 'https://jito.org/profile/14225-nitni-jain', 'https://jito.org/profile/14224-kamlesh-chandmal-pokharana', 'https://jito.org/profile/14223-jaya-kailesh-jain', 'https://jito.org/profile/14222-ashish-dhariwal', 'https://jito.org/profile/14221-ashok-banthia', 'https://jito.org/profile/14220-yashwant-jain', 'https://jito.org/profile/14219-sandeep-mutha', 'https://jito.org/profile/14218-hamir-bankimbhai-jhaveri', 'https://jito.org/profile/14217-rushab-ajay-bora', 'https://jito.org/profile/14216-nimish-hasmukhbhai-chudgar', 'https://jito.org/profile/14215-kinnar-kantilal-shah', 'https://jito.org/profile/14214-amish-rajendrakumar-shah', 'https://jito.org/profile/14213-abdhishkumar-rajendrakumar-shah', 'https://jito.org/profile/14212-vineet-gothi', 'https://jito.org/profile/14211-vinay-kumar-chhajer', 'https://jito.org/profile/14210-nirmal-kumar-dugar', 'https://jito.org/profile/14209-nikesh-kumar-jain', 'https://jito.org/profile/14208-ashok-kumar-jain', 'https://jito.org/profile/14207-amit-jain-rathod', 'https://jito.org/profile/14206-amar-kumar-jain', 'https://jito.org/profile/14205-ravi-kothari', 'https://jito.org/profile/14204-moxesh-prakash-punamiya', 'https://jito.org/profile/14203-sourabh-kothari', 'https://jito.org/profile/14202-ramesh-kumar-singhvi', 'https://jito.org/profile/14201-ramesh-daglia', 'https://jito.org/profile/14200-rakesh-bhanawat', 'https://jito.org/profile/14199-pushpendra-nalwaya', 'https://jito.org/profile/14198-pritam-jain', 'https://jito.org/profile/14197-pramod-kumar-mehta', 'https://jito.org/profile/14196-narendra-kumar-jain', 'https://jito.org/profile/14195-mayank-patwa', 'https://jito.org/profile/14194-dharmendra-mandot', 'https://jito.org/profile/14193-bhanwar-lal-porwal', 'https://jito.org/profile/14192-ashok-kumar-porwal', 
'https://jito.org/profile/14191-gajendra-kumar-shankar-lal-chandaliya', 'https://jito.org/profile/14190-girish-jain']
https://jito.org/members?start=40
['NILESH PARASMAL JAIN', 'D K Surana', 'Surender Lal Jain', 'SANDEEP JAIN', 'Nitni Jain', 'KAMLESH CHANDMAL POKHARANA', 'JAYA KAILESH JAIN', 'Ashish Dhariwal', 'Ashok Banthia', 'YASHWANT JAIN', 'Sandeep Mansukhlal Mutha', 'Hamir Bankimbhai Jhaveri', 'Rushab Ajay Bora', 'Nimish Hasmukhbhai Chudgar', 'Kinnar Kantilal Shah', 'Amish Rajendrakumar Shah', 'Abdhishkumar Rajendrakumar Shah', 'Vineet  Gothi', 'Vinay Kumar Chhajer', 'Nirmal Kumar Dugar', 'Nikesh Kumar Jain', 'Ashok Kumar Jain', 'Amit Jain Rathod', 'Amar Kumar Jain', 'Ravi  Kothari', 'Moxesh Prakash Punamiya', 'Sourabh  Kothari', 'Ramesh Kumar Singhvi', 'Ramesh  Daglia', 'Rakesh  Bhanawat', 'Pushpendra  Nalwaya', 'Pritam  Jain', 'Pramod Kumar Mehta', 'Narendra Kumar Jain', 'Mayank  Patwa', 'Dharmendra  Mandot', 'Bhanwar Lal Porwal', 'Ashok Kumar  Porwal', 'Gajendra Kumar Shankar Lal Chandaliya', 'Girish  Jain', 'Avinash  Jain', 'Vijay  Jain', 'Subhash  Sancheti', 'Rajesh Kumar  Golechha', 'Tejaswini Sudarshan Bafna', 'Swapnil Vilas  Shah', 'Sudeep Vijay Chhallani', 'Sanjay Bansilal Chordiya', 'Preeti Manoj Chhajed', 'Prakash Javerchand Oswal', 'Kiran Bachulal Rathod', 'Devendra Mangilal Bhansali', 'Anand Nitinbhai Mehta', 'Surya Prakash Chopra', 'Sanjay  Gemawat', 'Sangita Jain. Jain Lunker', 'Sham Lal Jain', 'Sanjay  Golecha', 'Manoj Kumar Jain', 'Yogesh Brijlalji Chopda']
['https://jito.org/profile/14230-nilesh-parasmal-jain', 'https://jito.org/profile/14228-d-k-surana', 'https://jito.org/profile/14227-surender-lal-jain', 'https://jito.org/profile/14226-sandeep-jain', 'https://jito.org/profile/14225-nitni-jain', 'https://jito.org/profile/14224-kamlesh-chandmal-pokharana', 'https://jito.org/profile/14223-jaya-kailesh-jain', 'https://jito.org/profile/14222-ashish-dhariwal', 'https://jito.org/profile/14221-ashok-banthia', 'https://jito.org/profile/14220-yashwant-jain', 'https://jito.org/profile/14219-sandeep-mutha', 'https://jito.org/profile/14218-hamir-bankimbhai-jhaveri', 'https://jito.org/profile/14217-rushab-ajay-bora', 'https://jito.org/profile/14216-nimish-hasmukhbhai-chudgar', 'https://jito.org/profile/14215-kinnar-kantilal-shah', 'https://jito.org/profile/14214-amish-rajendrakumar-shah', 'https://jito.org/profile/14213-abdhishkumar-rajendrakumar-shah', 'https://jito.org/profile/14212-vineet-gothi', 'https://jito.org/profile/14211-vinay-kumar-chhajer', 'https://jito.org/profile/14210-nirmal-kumar-dugar', 'https://jito.org/profile/14209-nikesh-kumar-jain', 'https://jito.org/profile/14208-ashok-kumar-jain', 'https://jito.org/profile/14207-amit-jain-rathod', 'https://jito.org/profile/14206-amar-kumar-jain', 'https://jito.org/profile/14205-ravi-kothari', 'https://jito.org/profile/14204-moxesh-prakash-punamiya', 'https://jito.org/profile/14203-sourabh-kothari', 'https://jito.org/profile/14202-ramesh-kumar-singhvi', 'https://jito.org/profile/14201-ramesh-daglia', 'https://jito.org/profile/14200-rakesh-bhanawat', 'https://jito.org/profile/14199-pushpendra-nalwaya', 'https://jito.org/profile/14198-pritam-jain', 'https://jito.org/profile/14197-pramod-kumar-mehta', 'https://jito.org/profile/14196-narendra-kumar-jain', 'https://jito.org/profile/14195-mayank-patwa', 'https://jito.org/profile/14194-dharmendra-mandot', 'https://jito.org/profile/14193-bhanwar-lal-porwal', 'https://jito.org/profile/14192-ashok-kumar-porwal', 
'https://jito.org/profile/14191-gajendra-kumar-shankar-lal-chandaliya', 'https://jito.org/profile/14190-girish-jain', 'https://jito.org/profile/14189-avinash-jain', 'https://jito.org/profile/14188-vijay-jain', 'https://jito.org/profile/14187-subhash-sancheti', 'https://jito.org/profile/14186-rajesh-kumar-golechha', 'https://jito.org/profile/14185-tejaswini-sudarshan-bafna', 'https://jito.org/profile/14184-swapnil-vilas-shah', 'https://jito.org/profile/14183-sudeep-vijay-chhallani', 'https://jito.org/profile/14182-sanjay-bansilal-chordiya', 'https://jito.org/profile/14181-preeti-manoj-chhajed', 'https://jito.org/profile/14180-prakash-javerchand-oswal', 'https://jito.org/profile/14179-kiran-bachulal-rathod', 'https://jito.org/profile/14178-devendra-mangilal-bhansali', 'https://jito.org/profile/14177-anand-nitinbhai-mehta', 'https://jito.org/profile/14176-surya-prakash-chopra', 'https://jito.org/profile/14175-sanjay-gemawat', 'https://jito.org/profile/14174-sangita-jain-jain-lunker', 'https://jito.org/profile/14173-sham-lal-jain', 'https://jito.org/profile/14172-sanjay-golecha', 'https://jito.org/profile/14171-manoj-kumar-jain', 'https://jito.org/profile/14170-yogesh-brijlalji-chopda']
https://jito.org/members?start=60
['NILESH PARASMAL JAIN', 'D K Surana', 'Surender Lal Jain', 'SANDEEP JAIN', 'Nitni Jain', 'KAMLESH CHANDMAL POKHARANA', 'JAYA KAILESH JAIN', 'Ashish Dhariwal', 'Ashok Banthia', 'YASHWANT JAIN', 'Sandeep Mansukhlal Mutha', 'Hamir Bankimbhai Jhaveri', 'Rushab Ajay Bora', 'Nimish Hasmukhbhai Chudgar', 'Kinnar Kantilal Shah', 'Amish Rajendrakumar Shah', 'Abdhishkumar Rajendrakumar Shah', 'Vineet  Gothi', 'Vinay Kumar Chhajer', 'Nirmal Kumar Dugar', 'Nikesh Kumar Jain', 'Ashok Kumar Jain', 'Amit Jain Rathod', 'Amar Kumar Jain', 'Ravi  Kothari', 'Moxesh Prakash Punamiya', 'Sourabh  Kothari', 'Ramesh Kumar Singhvi', 'Ramesh  Daglia', 'Rakesh  Bhanawat', 'Pushpendra  Nalwaya', 'Pritam  Jain', 'Pramod Kumar Mehta', 'Narendra Kumar Jain', 'Mayank  Patwa', 'Dharmendra  Mandot', 'Bhanwar Lal Porwal', 'Ashok Kumar  Porwal', 'Gajendra Kumar Shankar Lal Chandaliya', 'Girish  Jain', 'Avinash  Jain', 'Vijay  Jain', 'Subhash  Sancheti', 'Rajesh Kumar  Golechha', 'Tejaswini Sudarshan Bafna', 'Swapnil Vilas  Shah', 'Sudeep Vijay Chhallani', 'Sanjay Bansilal Chordiya', 'Preeti Manoj Chhajed', 'Prakash Javerchand Oswal', 'Kiran Bachulal Rathod', 'Devendra Mangilal Bhansali', 'Anand Nitinbhai Mehta', 'Surya Prakash Chopra', 'Sanjay  Gemawat', 'Sangita Jain. Jain Lunker', 'Sham Lal Jain', 'Sanjay  Golecha', 'Manoj Kumar Jain', 'Yogesh Brijlalji Chopda', 'Bipin R Shah Rasiklal Shah', 'Kalpesh Arvind Shah', 'Hemant Vishanji Dedhia', 'Manju Parasmal Golecha', 'Urmila Dilip Chandan', 'Ugamraj Misrimal Mehta', 'Surendra Madanmal Mehta', 'Shrenik Champalal Jain', 'Sanjay C Jain', 'Ratan Tarachand Mehta', 'Ramesh Sumermal Nahar', 'Rajesh Kumar Bhagchand Mehta', 'Milapchand Bhimraj Mehta', 'Mahendra Nemichand Bafna', 'Mahendra Kumar Tarachand Mehta', 'Lalit Okhraj Bokadia', 'Lalit Champalal Jain', 'Lakhpatraj Bhagchandji Mehta', 'Kushboo Chirag Chandan', 'Jaswant Bhagchand Mehta']
['https://jito.org/profile/14230-nilesh-parasmal-jain', 'https://jito.org/profile/14228-d-k-surana', 'https://jito.org/profile/14227-surender-lal-jain', 'https://jito.org/profile/14226-sandeep-jain', 'https://jito.org/profile/14225-nitni-jain', 'https://jito.org/profile/14224-kamlesh-chandmal-pokharana', 'https://jito.org/profile/14223-jaya-kailesh-jain', 'https://jito.org/profile/14222-ashish-dhariwal', 'https://jito.org/profile/14221-ashok-banthia', 'https://jito.org/profile/14220-yashwant-jain', 'https://jito.org/profile/14219-sandeep-mutha', 'https://jito.org/profile/14218-hamir-bankimbhai-jhaveri', 'https://jito.org/profile/14217-rushab-ajay-bora', 'https://jito.org/profile/14216-nimish-hasmukhbhai-chudgar', 'https://jito.org/profile/14215-kinnar-kantilal-shah', 'https://jito.org/profile/14214-amish-rajendrakumar-shah', 'https://jito.org/profile/14213-abdhishkumar-rajendrakumar-shah', 'https://jito.org/profile/14212-vineet-gothi', 'https://jito.org/profile/14211-vinay-kumar-chhajer', 'https://jito.org/profile/14210-nirmal-kumar-dugar', 'https://jito.org/profile/14209-nikesh-kumar-jain', 'https://jito.org/profile/14208-ashok-kumar-jain', 'https://jito.org/profile/14207-amit-jain-rathod', 'https://jito.org/profile/14206-amar-kumar-jain', 'https://jito.org/profile/14205-ravi-kothari', 'https://jito.org/profile/14204-moxesh-prakash-punamiya', 'https://jito.org/profile/14203-sourabh-kothari', 'https://jito.org/profile/14202-ramesh-kumar-singhvi', 'https://jito.org/profile/14201-ramesh-daglia', 'https://jito.org/profile/14200-rakesh-bhanawat', 'https://jito.org/profile/14199-pushpendra-nalwaya', 'https://jito.org/profile/14198-pritam-jain', 'https://jito.org/profile/14197-pramod-kumar-mehta', 'https://jito.org/profile/14196-narendra-kumar-jain', 'https://jito.org/profile/14195-mayank-patwa', 'https://jito.org/profile/14194-dharmendra-mandot', 'https://jito.org/profile/14193-bhanwar-lal-porwal', 'https://jito.org/profile/14192-ashok-kumar-porwal', 
'https://jito.org/profile/14191-gajendra-kumar-shankar-lal-chandaliya', 'https://jito.org/profile/14190-girish-jain', 'https://jito.org/profile/14189-avinash-jain', 'https://jito.org/profile/14188-vijay-jain', 'https://jito.org/profile/14187-subhash-sancheti', 'https://jito.org/profile/14186-rajesh-kumar-golechha', 'https://jito.org/profile/14185-tejaswini-sudarshan-bafna', 'https://jito.org/profile/14184-swapnil-vilas-shah', 'https://jito.org/profile/14183-sudeep-vijay-chhallani', 'https://jito.org/profile/14182-sanjay-bansilal-chordiya', 'https://jito.org/profile/14181-preeti-manoj-chhajed', 'https://jito.org/profile/14180-prakash-javerchand-oswal', 'https://jito.org/profile/14179-kiran-bachulal-rathod', 'https://jito.org/profile/14178-devendra-mangilal-bhansali', 'https://jito.org/profile/14177-anand-nitinbhai-mehta', 'https://jito.org/profile/14176-surya-prakash-chopra', 'https://jito.org/profile/14175-sanjay-gemawat', 'https://jito.org/profile/14174-sangita-jain-jain-lunker', 'https://jito.org/profile/14173-sham-lal-jain', 'https://jito.org/profile/14172-sanjay-golecha', 'https://jito.org/profile/14171-manoj-kumar-jain', 'https://jito.org/profile/14170-yogesh-brijlalji-chopda', 'https://jito.org/profile/14169-bipin-r-shah-rasiklal-shah', 'https://jito.org/profile/14168-kalpesh-arvind-shah', 'https://jito.org/profile/14167-hemant-vishanji-dedhia', 'https://jito.org/profile/14166-manju-parasmal-golecha', 'https://jito.org/profile/14165-urmila-dilip-chandan', 'https://jito.org/profile/14164-ugamraj-misrimal-mehta', 'https://jito.org/profile/14163-surendra-madanmal-mehta', 'https://jito.org/profile/14162-shrenik-champalal-jain', 'https://jito.org/profile/14161-sanjay-c-jain', 'https://jito.org/profile/14160-ratan-tarachand-mehta', 'https://jito.org/profile/14159-ramesh-sumermal-nahar', 'https://jito.org/profile/14158-rajesh-kumar-bhagchand-mehta', 'https://jito.org/profile/14157-milapchand-bhimraj-mehta', 'https://jito.org/profile/14156-mahendra-nemichand-bafna', 
'https://jito.org/profile/14155-mahendra-kumar-tarachand-mehta', 'https://jito.org/profile/14154-lalit-okhraj-bokadia', 'https://jito.org/profile/14153-lalit-champalal-jain', 'https://jito.org/profile/14152-lakhpatraj-bhagchandji-mehta', 'https://jito.org/profile/14151-kushboo-chirag-chandan', 'https://jito.org/profile/14150-jaswant-bhagchand-mehta']

14 Comments

When I run the same code as you, I just get a blank CSV file with the column headings Name and Link.
With the above code you are getting all the links, but I am getting a blank CSV file.
@SumitJha: That shouldn't happen. If you run my entire code, it will take at least 50-60 minutes to execute, since there are 690 pages. Put a print statement after the export to CSV and check that it is printed once execution finishes, and please delete your existing CSV file first.
I ran the code again after deleting the previous CSV file and printed df after execution. Here is the result: Empty DataFrame Columns: [Name, Link] Index: []
@SumitJha: Updated with print statements so you can troubleshoot easily. Run the code and check where the problem occurs. Let me know if you get an error, or post the last URL printed before it stops.
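
An empty DataFrame means the `.name>a` selector matched nothing on any page, so it may help to check a single page before running the full loop. The sketch below is only one way to do that. It assumes the cause is either a non-200 response or different HTML being served, which this thread does not confirm. The names `summarize_page` and `check_first_page` are hypothetical.

```python
def summarize_page(offset, status_code, match_count):
    """Describe what one page request produced, for troubleshooting."""
    if status_code != 200:
        return "start={}: HTTP {}, request refused or failed".format(offset, status_code)
    if match_count == 0:
        return "start={}: HTTP 200 but no '.name>a' matches".format(offset)
    return "start={}: {} records found".format(offset, match_count)


def check_first_page():
    """Fetch the first listing page and report on it."""
    # Imported here so summarize_page stays usable without the
    # third-party packages (requests, beautifulsoup4) installed.
    import requests
    from bs4 import BeautifulSoup

    res = requests.get("https://jito.org/members?start=0", timeout=30)
    soup = BeautifulSoup(res.text, "html.parser")
    return summarize_page(0, res.status_code, len(soup.select(".name>a")))
```

Calling `print(check_first_page())` once shows whether the request itself fails or whether the page comes back without the expected elements. In the latter case, printing `res.text[:500]` inside `check_first_page` is a quick way to see what the server actually returned.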