Get specific string from HTML for web scraping

Question

I'm trying to get the names of the stocks that are hyperlinked on a website. For reproducibility:

import requests
from bs4 import BeautifulSoup

URL = 'https://seekingalpha.com/news/3592559-nvax-nbl-among-premarket-gainers'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id='bullets_ul')

stock_elems = results.find_all('span', class_='ticker-hover-wrapper')

I'd like to collect the underlined (hyperlinked) names in a list.

I've tried some variations of the following code without success:

for stock_elem in stock_elems:
    stock_name = stock_elem.find('href', class_='*')
    print(symbol_name.text.strip())

Any help would be greatly appreciated.

Then it says 'symbol_name' is not defined. Can you elaborate? @Selcuk
– Artur, Commented Jul 21, 2020 at 1:31
That's because you don't have a variable named symbol_name. You must have meant stock_name.
– Selcuk, Commented Jul 21, 2020 at 1:48

MrNobody33 · Accepted Answer · 2020-07-21 01:36:03Z

1

Try this:

import requests
from bs4 import BeautifulSoup

URL = 'https://seekingalpha.com/news/3592559-nvax-nbl-among-premarket-gainers'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id='bullets_ul')

stock_elems = results.find_all('span', class_='ticker-hover-wrapper')
# Each ticker span wraps an <a> tag; its text is the stock symbol
ls = [i.find('a').text for i in stock_elems]

Output:

ls
['DPW',
 'IMRN',
 'BTAI',
 'SONN',
 'VOLT',
 'IBIO',
 'AIKI',
 'DGLY',
 'IDRA',
 'HTBX',
 'JOB',
 'NAK',
 'VBIV',
 'NBL',
 'OGEN',
 'ANVS',
 'XBIO',
 'BNTX',
 'CKPT',
 'FIXX',
 'FLDM',
 'PDSB',
 'CFRX',
 'MVIS',
 'NVAX']
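
The attempt in the question, find('href', class_='*'), returns None because href is an attribute of the a tag, not a tag name. As a minimal sketch of a more defensive version, the code below runs against an inline HTML sample instead of the live page. SAMPLE_HTML and extract_tickers are hypothetical names, and the markup is only an assumption about the page's structure, which may have changed since 2020.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the structure described in the question;
# the real page may differ.
SAMPLE_HTML = """
<ul id="bullets_ul">
  <li>Novavax <span class="ticker-hover-wrapper">(NASDAQ:<a href="/symbol/NVAX">NVAX</a>)</span> +10%.</li>
  <li>Noble Energy <span class="ticker-hover-wrapper">(NASDAQ:<a href="/symbol/NBL">NBL</a>)</span> +6%.</li>
  <li>No link here <span class="ticker-hover-wrapper">(NASDAQ:XYZ)</span>.</li>
</ul>
"""


def extract_tickers(html):
    """Return the link text of every ticker span, skipping spans without a link."""
    soup = BeautifulSoup(html, 'html.parser')
    results = soup.find(id='bullets_ul')
    if results is None:  # container missing, e.g. blocked request or changed layout
        return []
    tickers = []
    for span in results.find_all('span', class_='ticker-hover-wrapper'):
        link = span.find('a')  # 'a' is the tag name; href is one of its attributes
        if link is not None:
            tickers.append(link.get_text(strip=True))
    return tickers


print(extract_tickers(SAMPLE_HTML))
```

The None checks matter when the site returns a different page to scripted requests, in which case soup.find(id='bullets_ul') finds nothing and results.find_all would raise an AttributeError.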


user13244731 · Accepted Answer · 2020-07-21 01:39:00Z

Try calling the get_text() method on each element of the list returned by find_all():

strings = [x.get_text() for x in stock_elems]

This list comprehension produces a list with all the text (shown here as printed):

['(NYSEMKT:DPW)', '(NASDAQ:IMRN)', '(NASDAQ:BTAI)', '(NASDAQ:SONN)', '(NYSEMKT:VOLT)', '(NYSEMKT:IBIO)', '(NASDAQ:AIKI)', '(NASDAQ:DGLY)', '(NASDAQ:IDRA)', '(NASDAQ:HTBX)', '(NYSEMKT:JOB)', '(NYSEMKT:NAK)', '(NASDAQ:VBIV)', '(NASDAQ:NBL)', '(NYSEMKT:OGEN)', '(NYSEMKT:ANVS)', '(NASDAQ:XBIO)', '(NASDAQ:BNTX)', '(NASDAQ:CKPT)', '(NASDAQ:FIXX)', '(NASDAQ:FLDM)', '(NASDAQ:PDSB)', '(NASDAQ:CFRX)', '(NASDAQ:MVIS)', '(NASDAQ:NVAX)']

You can use another list comprehension to get just the text you want:

spec_strings = [y.split(":")[1][:-1] for y in strings]

Here you take the second element of the split on ":" and slice off the final ")". So, with this

stock_elems = results.find_all('span', class_='ticker-hover-wrapper')
strings = [x.get_text() for x in stock_elems]
spec_strings = [y.split(":")[1][:-1] for y in strings]
print(spec_strings)

you can get this:

['DPW', 'IMRN', 'BTAI', 'SONN', 'VOLT', 'IBIO', 'AIKI', 'DGLY', 'IDRA', 'HTBX', 'JOB', 'NAK', 'VBIV', 'NBL', 'OGEN', 'ANVS', 'XBIO', 'BNTX', 'CKPT', 'FIXX', 'FLDM', 'PDSB', 'CFRX', 'MVIS', 'NVAX']
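
If the exchange is also of interest, both parts can be captured in one pass. The following is only a sketch: PATTERN and parse_ticker are hypothetical names, and the regular expression assumes every string has the "(EXCHANGE:TICKER)" shape seen in the output above. Strings that do not match are skipped, whereas split(":")[1] would raise an IndexError on them.

```python
import re

# A few sample strings copied from the get_text() output above.
strings = ['(NYSEMKT:DPW)', '(NASDAQ:IMRN)', '(NASDAQ:NVAX)']

# Assumed format: "(EXCHANGE:TICKER)", letters only, with an optional dot in the ticker.
PATTERN = re.compile(r'\(\s*([A-Za-z]+)\s*:\s*([A-Za-z.]+)\s*\)')


def parse_ticker(text):
    """Return an (exchange, ticker) tuple, or None if the text does not match."""
    match = PATTERN.search(text)
    return match.groups() if match else None


# Keep only the strings that matched the assumed format.
pairs = [p for p in map(parse_ticker, strings) if p is not None]
print(pairs)
```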

I hope this helps.

Also useful. I can use the stock exchange when gathering data (NASDAQ or NYSE). Thanks
