0

I'm trying to use BeautifulSoup and RE to get a specific value from Yahoo Finance. I can't figure out exactly how to get it. I'll paste some code I have along with the HTML and unique selector I got.

I just want the number "7.58" from the row below, but the problem is that the class of its cell is shared by many other cells in the same table.

<tr><td class="yfnc_tablehead1" width="74%">Diluted EPS (ttm):</td><td class="yfnc_tabledata1">7.58</td>

Here is the selector Google gave me...

yfncsumtab > tbody > tr:nth-child(2) > td.yfnc_modtitlew1 > table:nth-child(10) > tbody > tr > td > table > tbody > tr:nth-child(8) > td.yfnc_tabledata1

Here is some template code I'm using to test different things, but I'm very new to regular expressions and can't find a way to extract the number that follows "Diluted EPS (ttm):".

from bs4 import BeautifulSoup
import requests
import re


sess = requests.Session()
res = sess.get('http://finance.yahoo.com/q/ks?s=MMM+Key+Statistics')

soup = BeautifulSoup(res.text, 'html.parser')

# Collect every <td> cell on the page (this prints far more than the one value)
body = soup.find_all('td')

print(body)

Thanks!

2
  • Why are you using BS and regex? In fact I don't see any attempt to use either to do what you want in your code. Commented Apr 22, 2016 at 18:37
  • I don't know the BS command to get it to find the digits after the text phrase. Commented Apr 22, 2016 at 20:05

3 Answers

2

You could first find the label cell by its text, "Diluted EPS (ttm):", then go up to its row and pick the value cell from there:

soup.find('td', text='Diluted EPS (ttm):').parent.find('td', attrs={'class': 'yfnc_tabledata1'})
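To see the lookup in isolation, here is a minimal sketch that runs against the HTML row from the question instead of the live page. The helper name diluted_eps_from_soup and the SAMPLE_HTML constant are made up for illustration, and wrapping the row in a table is an assumption about the surrounding markup. It uses string= (the newer spelling of text= in Beautiful Soup) and find_next_sibling as an alternative to going through .parent, and it returns None instead of raising when the label is missing.

```python
from bs4 import BeautifulSoup

# Row copied from the question; the enclosing <table> is an assumption.
SAMPLE_HTML = (
    '<table><tr>'
    '<td class="yfnc_tablehead1" width="74%">Diluted EPS (ttm):</td>'
    '<td class="yfnc_tabledata1">7.58</td>'
    '</tr></table>'
)


def diluted_eps_from_soup(html):
    """Return the text of the Diluted EPS value cell, or None if not found."""
    soup = BeautifulSoup(html, 'html.parser')

    # Locate the label cell by its exact text.
    label = soup.find('td', string='Diluted EPS (ttm):')
    if label is None:
        return None

    # The value sits in the next <td> of the same row.
    value = label.find_next_sibling('td', class_='yfnc_tabledata1')
    if value is None:
        return None

    return value.get_text(strip=True)


print(diluted_eps_from_soup(SAMPLE_HTML))
```

Anchoring on the label text rather than on the class sidesteps the problem that many cells share yfnc_tabledata1.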

4 Comments

Thanks, I'll try that out and see if I can find the number from there.
I got a syntax error when I put this line into my editor. Do you know why? eps = soup.find('td', text='Diluted EPS (ttm):).parent.find('td', attrs={'class': 'yfnc_tabledata1'})
Sorry, it was missing a closing single quote after (ttm):
Thanks, just the output I was looking for! Really appreciate your assistance. I've spent at least 6 hours on this and suffered a migraine over it, believe it or not. Maybe I should quit programming?
1

If using regex, please try:

>>> import re
>>> text = '<tr><td class="yfnc_tablehead1" width="74%">Diluted EPS (ttm):</td><td class="yfnc_tabledata1">7.58</td>'
>>> re.findall(r'Diluted\s+EPS\s+\(ttm\).*?>([\d.]+)<', text)
['7.58']

UPDATE: Here is sample code using requests and re:

import requests
import re

sess = requests.Session()
res = sess.get('http://finance.yahoo.com/q/ks?s=MMM+Key+Statistics')
print(re.findall(r'Diluted\s+EPS\s+\(ttm\).*?>([\d.]+)<', res.text))

Output:

[u'7.58']
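For anyone new to regular expressions, the sketch below splits the same pattern into commented pieces and wraps it in a small helper. The names EPS_PATTERN and diluted_eps_from_text are illustrative, not part of any library, and converting the match to float is an added assumption about how the value will be used.

```python
import re

# The pattern from above, split into commented pieces and compiled once.
EPS_PATTERN = re.compile(
    r'Diluted\s+EPS\s+\(ttm\)'  # the label; \( and \) match literal parentheses
    r'.*?>'                     # lazily skip ahead to a '>' ...
    r'([\d.]+)<'                # ... followed only by digits/dots, then '<'
)


def diluted_eps_from_text(html):
    """Return the first matching value as a float, or None if nothing usable."""
    match = EPS_PATTERN.search(html)
    if match is None:
        return None
    try:
        return float(match.group(1))
    except ValueError:
        # The captured text was made of digits and dots but is not a number.
        return None


row = ('<tr><td class="yfnc_tablehead1" width="74%">Diluted EPS (ttm):</td>'
       '<td class="yfnc_tabledata1">7.58</td>')
print(diluted_eps_from_text(row))
```

One caveat of this approach: [\d.]+ does not match a minus sign or text such as N/A, so when the real value looks like that, the lazy .*? keeps scanning and can capture a later number on the same line instead.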

6 Comments

Thanks! I'll give it a try and let you know how it goes.
How do I do this without putting the actual number in the "text =" line? I need this to dynamically update and it will be incorporated into something else that runs for different ticker symbols. I'm also getting an "EOL while scanning string literal" error trying to use this code.
Thanks, I'll play around with that version. I don't know where the 3.68 comes from. The correct result is 7.58. I was able to get it by fixing the solution above. My code now looks like this... sess = requests.Session() res = sess.get('finance.yahoo.com/q/ks?s=MMM+Key+Statistics') soup = BeautifulSoup(res.text, 'html.parser') eps = soup.find('td', text='Diluted EPS (ttm):').parent.find('td', attrs={'class': 'yfnc_tabledata1'}) for i in eps: print (i)
Sorry about this. I updated the pattern. Please try: print re.findall('Diluted\s+EPS\s+\(ttm\).*?>([\d.]+)<', res.text).
soup = BeautifulSoup(res.text, 'html.parser') is not required if using regex. Please see the updated code.
0

Thanks for answering my question. I was able to get the desired value in two ways. Here is the first:

from bs4 import BeautifulSoup
import requests

sess = requests.Session()
res = sess.get('http://finance.yahoo.com/q/ks?s=MMM+Key+Statistics')

soup = BeautifulSoup(res.text, 'html.parser')

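# Find the label cell, go up to its row, then take the value cell in that row
eps = soup.find('td', text='Diluted EPS (ttm):').parent.find('td', attrs={'class': 'yfnc_tabledata1'})

# Iterating over a tag yields its children; for this cell that is the text "7.58"
for i in eps:
    print(i)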

Here is the second way...

import requests
import re

sess = requests.Session()
res = sess.get('http://finance.yahoo.com/q/ks?s=MMM+Key+Statistics')
# A raw string (r'...') keeps backslashes such as \s and \d intact for the regex engine
print(re.findall(r'Diluted\s+EPS\s+\(ttm\).*?>([\d.]+)<', res.text))

I don't quite understand it all yet, but having two working approaches is a great start for learning how this works and for building it into the rest of the project. Really appreciate your assistance!
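Since the comments mention running this for different ticker symbols, here is a rough sketch of how the regex version could be parameterized. It assumes the URL pattern from the question works for other symbols as well, which is not confirmed anywhere in this thread. The names URL_TEMPLATE, key_statistics_url and fetch_diluted_eps are hypothetical, and the session argument is expected to be a requests.Session created by the caller.

```python
import re

# URL pattern taken from the question; assumed to hold for other tickers too.
URL_TEMPLATE = 'http://finance.yahoo.com/q/ks?s={ticker}+Key+Statistics'

EPS_PATTERN = re.compile(r'Diluted\s+EPS\s+\(ttm\).*?>([\d.]+)<')


def key_statistics_url(ticker):
    """Build the Key Statistics URL for one ticker symbol."""
    return URL_TEMPLATE.format(ticker=ticker.upper())


def fetch_diluted_eps(ticker, session):
    """Fetch the page for `ticker` and return the EPS text, or None.

    `session` is any object with a requests-style get() method,
    for example requests.Session().
    """
    res = session.get(key_statistics_url(ticker), timeout=10)
    res.raise_for_status()  # surface HTTP errors instead of parsing an error page
    match = EPS_PATTERN.search(res.text)
    return match.group(1) if match else None
```

With requests installed, a call would look like fetch_diluted_eps('MMM', requests.Session()); looping over a list of symbols then only needs to reuse the same session.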
