Using Regular Expressions With Python to Get Value Buried in HTML5

Question

I'm trying to use BeautifulSoup and RE to get a specific value from Yahoo Finance. I can't figure out exactly how to get it. I'll paste some code I have along with the HTML and unique selector I got.

I just want the number "7.58" from the markup below. The problem is that the class of this cell is the same as that of many other cells in the same element.

<tr><td class="yfnc_tablehead1" width="74%">Diluted EPS (ttm):</td><td class="yfnc_tabledata1">7.58</td>

Here is the selector Google gave me...

yfncsumtab > tbody > tr:nth-child(2) > td.yfnc_modtitlew1 > table:nth-child(10) > tbody > tr > td > table > tbody > tr:nth-child(8) > td.yfnc_tabledata1

Here is some template code I'm using to test different things, but I'm very new to regular expressions and can't find a way to extract the number that follows "Diluted EPS (ttm):".

from bs4 import BeautifulSoup
import requests
import re


sess = requests.Session()
res = sess.get('http://finance.yahoo.com/q/ks?s=MMM+Key+Statistics')

soup = BeautifulSoup(res.text, 'html.parser')

body = soup.findAll('td')


print (body)

Thanks!

Why are you using BS and regex? In fact I don't see any attempt to use either to do what you want in your code.
– jonrsharpe, Commented Apr 22, 2016 at 18:37
I don't know the BS command to get it to find the digits after the text phrase.
– Deep Value, Commented Apr 22, 2016 at 20:05

Fabricator · Accepted Answer · 2016-04-22 20:38:54Z

2

You could find by text Diluted EPS (ttm): first:

soup.find('td', text='Diluted EPS (ttm):').parent.find('td', attrs={'class': 'yfnc_tabledata1'})
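A minimal offline sketch of this lookup, run against the row from the question instead of the live page. It assumes the label cell's text matches exactly. The `diluted_eps` helper name is made up for illustration, `string=` is the newer spelling of the `text=` argument, and `find_next_sibling` is swapped in for `.parent.find(...)` as one alternative way to reach the neighbouring cell.

```python
from bs4 import BeautifulSoup

# The row quoted in the question, wrapped in a table, used in place of the live page.
html = ('<table><tr><td class="yfnc_tablehead1" width="74%">Diluted EPS (ttm):</td>'
        '<td class="yfnc_tabledata1">7.58</td></tr></table>')


def diluted_eps(markup):
    """Return the text of the cell next to the 'Diluted EPS (ttm):' label, or None."""
    soup = BeautifulSoup(markup, 'html.parser')
    # Find the label cell by its exact text.
    label = soup.find('td', string='Diluted EPS (ttm):')
    if label is None:
        return None
    # The value sits in the next <td> of the same row.
    value = label.find_next_sibling('td')
    return value.get_text(strip=True) if value is not None else None


print(diluted_eps(html))
```

Returning `None` when the label is missing avoids the `AttributeError` that a chained `.parent.find(...)` would raise if the page layout differs from what is shown in the question.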

edited Apr 22, 2016 at 20:38

answered Apr 22, 2016 at 18:39


4 Comments

Deep Value Over a year ago

Thanks, I'll try that out and see if I can find the number from there.

Deep Value Over a year ago

I got a syntax error when I put this line into my editor. Do you know why? eps = soup.find('td', text='Diluted EPS (ttm):).parent.find('td', attrs={'class': 'yfnc_tabledata1'})

Fabricator Over a year ago

Sorry, it was missing a closing single quote after (ttm):

Deep Value Over a year ago

Thanks, just the output I was looking for! Really appreciate your assistance. I've spent at least 6 hours on this and suffered a migraine over it, believe it or not. Maybe I should quit programming?

Quinn · Accepted Answer · 2016-04-22 21:09:51Z

1

If using regex, please try:

>>> import re
>>> text = '<tr><td class="yfnc_tablehead1" width="74%">Diluted EPS (ttm):</td><td class="yfnc_tabledata1">7.58</td>"'
>>> re.findall(r'Diluted\s+EPS\s+\(ttm\).*?>([\d.]+)<', text)
['7.58']

UPDATE Here is the sample code using requests and re:

import requests
import re

sess = requests.Session()
res = sess.get('http://finance.yahoo.com/q/ks?s=MMM+Key+Statistics')
print(re.findall(r'Diluted\s+EPS\s+\(ttm\).*?>([\d.]+)<', res.text))

Output:

['7.58']
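To make the pattern easier to read, here is a sketch of the same expression written with `re.VERBOSE` so each piece can carry a comment. The names `EPS_PATTERN` and `find_eps` are hypothetical, and `re.DOTALL` is an addition of mine (not part of the answer's pattern) on the assumption that the real page may put the two cells on separate lines.

```python
import re

# Same pattern as above, spread over several lines; in VERBOSE mode the
# whitespace and the trailing comments are ignored by the regex engine.
EPS_PATTERN = re.compile(r"""
    Diluted\s+EPS\s+\(ttm\)   # the label; parentheses escaped so they match literally
    .*?                       # as little as possible of whatever follows (tags, attributes)
    >([\d.]+)<                # capture digits and dots sitting directly between > and <
""", re.VERBOSE | re.DOTALL)


def find_eps(html):
    """Return the first captured value as a string, or None if nothing matches."""
    match = EPS_PATTERN.search(html)
    return match.group(1) if match else None


sample = ('<tr><td class="yfnc_tablehead1" width="74%">Diluted EPS (ttm):</td>'
          '<td class="yfnc_tabledata1">7.58</td>')
print(find_eps(sample))
```

One caveat with this approach: if the cell after the label holds something non-numeric, the lazy `.*?` keeps scanning and can pick up a number from a later cell.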

edited Apr 22, 2016 at 21:09

answered Apr 22, 2016 at 18:46


6 Comments

Deep Value Over a year ago

Thanks! I'll give it a try and let you know how it goes.

Deep Value Over a year ago

How do I do this without putting the actual number in the "text =" line? I need this to dynamically update and it will be incorporated into something else that runs for different ticker symbols. I'm also getting an "EOL while scanning string literal" error trying to use this code.

Deep Value Over a year ago

Thanks, I'll play around with that version. I don't know where the 3.68 comes from. The correct result is 7.58. I was able to get it by fixing the solution above. My code now looks like this... sess = requests.Session() res = sess.get('finance.yahoo.com/q/ks?s=MMM+Key+Statistics') soup = BeautifulSoup(res.text, 'html.parser') eps = soup.find('td', text='Diluted EPS (ttm):').parent.find('td', attrs={'class': 'yfnc_tabledata1'}) for i in eps: print (i)

Quinn Over a year ago

Sorry about this. I updated the pattern. Please try: print re.findall('Diluted\s+EPS\s+\(ttm\).*?>([\d.]+)<', res.text).

Quinn Over a year ago

soup = BeautifulSoup(res.text, 'html.parser') is not required if using regex. Please see the updated code.


Deep Value · Accepted Answer · 2016-04-22 21:47:56Z

Thanks for answering my question. I was able to use two ways to get the desired value. The first way is this.

from bs4 import BeautifulSoup
import requests

sess = requests.Session()
res = sess.get('http://finance.yahoo.com/q/ks?s=MMM+Key+Statistics')

soup = BeautifulSoup(res.text, 'html.parser')

eps = soup.find('td', text='Diluted EPS (ttm):').parent.find('td', attrs={'class': 'yfnc_tabledata1'})

for i in eps:
    print (i)

Here is the second way...

import requests
import re

sess = requests.Session()
res = sess.get('http://finance.yahoo.com/q/ks?s=MMM+Key+Statistics')
print(re.findall(r'Diluted\s+EPS\s+\(ttm\).*?>([\d.]+)<', res.text.strip()))

I don't quite understand it all yet, but this is a great start with two different ways to understand it and move forward incorporating this aspect of the project. Really appreciate your assistance!
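Since this needs to run for different ticker symbols, one possible arrangement is to keep fetching and parsing separate, as sketched here. `key_stats_url`, `parse_eps`, and `fetch_eps` are hypothetical names, and the URL template simply assumes the address above generalises by swapping the ticker and that the page still serves the markup shown in the question.

```python
from bs4 import BeautifulSoup


def key_stats_url(ticker):
    # Assumption: the URL from the question works for any ticker in place of MMM.
    return 'http://finance.yahoo.com/q/ks?s={}+Key+Statistics'.format(ticker)


def parse_eps(html):
    """Return Diluted EPS as a float, or None if the label or a numeric value is missing."""
    soup = BeautifulSoup(html, 'html.parser')
    label = soup.find('td', string='Diluted EPS (ttm):')
    if label is None:
        return None
    # Same row lookup as the first way: go up to the row, then find the data cell.
    value = label.parent.find('td', attrs={'class': 'yfnc_tabledata1'})
    if value is None:
        return None
    try:
        return float(value.get_text(strip=True))
    except ValueError:
        return None  # the cell holds something non-numeric, such as "N/A"


def fetch_eps(ticker):
    # Imported here so parse_eps can be tried on saved HTML without network access.
    import requests
    res = requests.get(key_stats_url(ticker), timeout=10)
    res.raise_for_status()
    return parse_eps(res.text)
```

With this split, `parse_eps` can be checked against a saved copy of the page, and `fetch_eps('MMM')` is only needed when hitting the site.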
