How to convert results (urls) from string to DataFrame for Pandas to_excel?

Question

My Code:

from bs4 import BeautifulSoup as soup
import requests
import pandas as pd

Scraper2Excel = "C:\\Users\\Ashley\\FromPython3.xlsx"

writer = pd.ExcelWriter(Scraper2Excel, engine='xlsxwriter')

READ = "C:\\Users\\Ashley\\URLs List.xlsx"
Tickers1 = pd.read_excel(READ, sheet_name='Tickers', header=None)
Tickers = Tickers1.values.ravel()
print(Tickers)

UniformResourceLocators = pd.read_excel(READ, sheet_name= 'URLs', header=None, skiprows=1)
UniformResourceLocatorsTitles = pd.read_excel(READ, sheet_name='URLs', header=None, nrows=1).values[0]
UniformResourceLocators.columns = UniformResourceLocatorsTitles

URLs = UniformResourceLocators['Company News URL']
tick = UniformResourceLocators['Tickers']

startrow =0

for i in Tickers:
    s = Tickers1.loc[(Tickers1[0]==i)]
    print(s)
    s.to_excel(writer, sheet_name='Sheet1', startrow= startrow, startcol= 0, header=False, index=False)
    startrow += 1

    url = URLs.loc[(tick==i)]
    print(url)
    
    for i in url:
        html_text = requests.get(i).text
        chickennoodle = soup(html_text, 'html.parser')
    
        for link in chickennoodle.find_all('a'):
            my_links = link.get('href')

            print(my_links)

I get stuck here. my_links prints a bunch of URLs as strings, and I want to write them to an Excel file. I haven't found a way to convert them to a DataFrame so that pandas will let me use to_excel. I'm very much a novice, so thanks for any help.

             
            #df = my_links??
            df.to_excel(writer, sheet_name='Sheet2', startrow= startrow, startcol=0, header=False, index=False)
            startrow += 1


writer.close()  # writer.save() was removed in pandas 2.0; close() also saves the file

Daniel Weigel · Accepted Answer · 2022-04-27 06:31:20Z


What I would do is initialize an empty list before your for loop and append my_links to it inside the loop.

Then, at the end of your code, convert that list to a column of your DataFrame before exporting to Excel. Something like:

mylinksList = []      # collects every href found by the loops
df = pd.DataFrame()

for i in url:
    html_text = requests.get(i).text
    chickennoodle = soup(html_text, 'html.parser')

    for link in chickennoodle.find_all('a'):
        my_links = link.get('href')
        mylinksList.append(my_links)

# After the loops finish, store the collected links as one column
df['links'] = pd.Series(mylinksList)
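
As a self-contained illustration of the approach above, here is a minimal sketch that gathers hrefs into a list and builds the DataFrame once at the end. SAMPLE_HTML, collect_links, and save_links are hypothetical names introduced for this example; the sample page stands in for requests.get(i).text so the sketch runs offline. Skipping <a> tags that have no href is an assumption about what you want in the sheet.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical stand-in for requests.get(i).text, so the sketch runs offline.
SAMPLE_HTML = """
<html><body>
  <a href="/news/1">First story</a>
  <a href="https://example.com/news/2">Second story</a>
  <a name="top">Anchor without href</a>
</body></html>
"""


def collect_links(html_text):
    """Return every non-empty href found in the page's <a> tags."""
    page = BeautifulSoup(html_text, "html.parser")
    links = []
    for link in page.find_all("a"):
        href = link.get("href")  # None when the tag has no href attribute
        if href:
            links.append(href)
    return links


all_links = []
for html_text in [SAMPLE_HTML]:  # one entry per downloaded page
    all_links.extend(collect_links(html_text))

# Build the DataFrame once, after the loops have finished.
df = pd.DataFrame({"links": all_links})


def save_links(frame, path):
    """Write the links to Sheet2; the context manager saves the file on exit."""
    with pd.ExcelWriter(path) as writer:
        frame.to_excel(writer, sheet_name="Sheet2", header=False, index=False)
```

Calling save_links(df, "links.xlsx") would then write the column. This assumes an Excel engine such as openpyxl or xlsxwriter is installed.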


2 Comments

Konstantin Z Over a year ago

A nice addition would be to write each link as an Excel hyperlink formula string, like this: str(f'=HYPERLINK("websitename.com{link_adress}", "{link_text}")')
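
A rough sketch of that hyperlink idea, under some assumptions: excel_hyperlink is a hypothetical helper, the example URLs are made up, and urljoin is used here to resolve relative hrefs against the page URL (the comment simply concatenates strings). Double quotes are doubled because that is how Excel escapes them inside a formula string. Whether the cell ends up as a live formula depends on the Excel engine; xlsxwriter, for example, writes strings that start with = as formulas by default.

```python
from urllib.parse import urljoin


def excel_hyperlink(base_url, href, text):
    """Build an Excel HYPERLINK formula string for one scraped link."""
    full_url = urljoin(base_url, href)  # resolves relative hrefs like "/story/1"
    # Excel escapes a double quote inside a formula string by doubling it.
    safe_url = full_url.replace('"', '""')
    safe_text = text.replace('"', '""')
    return f'=HYPERLINK("{safe_url}", "{safe_text}")'


formula = excel_hyperlink("https://example.com/news/", "/story/1", "First story")
print(formula)
```

Inside the scraping loop, link.get('href') and link.get_text() would supply the href and text arguments.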

Ashley Adams Over a year ago

Thanks, this helped tons! I ended up doing mylinksList.append(my_links), then a = pd.DataFrame(mylinksList) and a.to_excel...
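
For reference, pd.DataFrame(mylinksList) on a flat list of strings gives a single column labelled 0. The sketch below shows one way that could fit with the startrow counter from the question: advance it by the number of rows just written so each ticker's block lands below the previous one. links_by_ticker and write_blocks are hypothetical names, and the sample links are made up.

```python
import pandas as pd

# Hypothetical links gathered for two tickers.
links_by_ticker = {
    "AAA": ["/news/1", "/news/2"],
    "BBB": ["/news/3"],
}

# A flat list of strings becomes a one-column DataFrame labelled 0.
example = pd.DataFrame(links_by_ticker["AAA"])


def write_blocks(writer, links_by_ticker, sheet_name="Sheet2"):
    """Write one block of links per ticker, stacked vertically."""
    startrow = 0
    for ticker, links in links_by_ticker.items():
        a = pd.DataFrame(links)
        a.to_excel(writer, sheet_name=sheet_name, startrow=startrow,
                   startcol=0, header=False, index=False)
        startrow += len(a)  # the next block starts below this one
    return startrow
```

It would be called inside a with pd.ExcelWriter(path) as writer: block, which saves the file when the block exits.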
