
Is there a way to convert the data I have web-scraped into a pandas DataFrame?

The scraped data is stock fundamentals, e.g. abt: 2.71 6.00, where abt is the stock ticker, 2.71 is the price-to-book (P/B) ratio and 6.00 is the PEG ratio.

I tried declaring a variable holding an empty DataFrame and using the .append() function, but with no luck.

I'm guessing the data needs to be converted somehow before it can be passed to a DataFrame, but I'm not aware of how to do it.

I redid the code with the suggestion from the comments, but now the DataFrame is coming out empty. Why?

import time
import urllib.request
import urllib.parse
import pandas as pd

sp500short = ['a', 'aa', 'aapl', 'abbv', 'abc', 'abt', 'ace', 'aci', 'acn', 'act', 'adbe', 'adi', 'adm', 'adp']
#stock = 'a'

data = []

color_list = ['<span style="color:#aa0000;">', '<span style="color:#008800;">']
color_close = '</span>'


def finvizPBStats(stock):

    try:

        sourceCode = urllib.request.urlopen('http://finviz.com/quote.ashx?t='+stock).read()
        sourceCodeString = sourceCode.decode()
        pbr = sourceCodeString.split('P/B</td><td width="8%" class="snapshot-td2" align="left"><b>')[1].split('</b></td>')[0]

        for color in color_list:
            if color in pbr:
                pbr = pbr.split(color)[1].split(color_close)[0]
                pbr = float(pbr)

    except Exception as e:
        if Exception:
            pass 

    return        


def finvizPEGStats(stock):

    try: 

        sourceCode = urllib.request.urlopen('http://finviz.com/quote.ashx?t='+stock).read()
        sourceCodeString = sourceCode.decode()  
        PEG = sourceCodeString.split('PEG</td><td width="8%" class="snapshot-td2" align="left"><b>')[1].split('</b></td>')[0]
        for color in color_list:
            if color in PEG:
                PEG = PEG.split(color)[1].split(color_close)[0]
                PEG = float(PEG)

    except Exception as e:
        if Exception:
            pass

    return

for stock in sp500short:
    pbr = finvizPBStats(stock)
    PEG = finvizPEGStats(stock)
    data.append([pbr, PEG])

df = pd.DataFrame(index=sp500short, columns=['pbr', 'PEG'])

print(df)     
  • What kind of data structure holds your original data? If it's compatible, it could be as simple as pd.DataFrame(input_data). Commented Dec 19, 2016 at 18:41
  • You are not returning anything. Do this: return pbr, PEG. Also, with how your functions are structured, this may raise errors unless you initialize pbr and PEG. For example you could try adding pbr, PEG = 0, 0 in the functions before the try / except statements. Commented Dec 19, 2016 at 20:49
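Following the comments above, here is a minimal sketch of the question's parsing with a value that is always initialized and returned. It assumes the split markers copied from the question still match finviz's HTML (they may not today). The names extract_stat and finviz_key_stats are hypothetical. It also converts the value to float whether or not it is wrapped in a color span (in the question, float() only runs inside the color branch), and it fetches each page once instead of twice.

```python
import urllib.request

# Markers copied from the question; they depend on finviz's HTML at the time
PB_MARKER = 'P/B</td><td width="8%" class="snapshot-td2" align="left"><b>'
PEG_MARKER = 'PEG</td><td width="8%" class="snapshot-td2" align="left"><b>'
COLOR_LIST = ['<span style="color:#aa0000;">', '<span style="color:#008800;">']
COLOR_CLOSE = '</span>'


def extract_stat(html, marker):
    """Return the number that follows `marker` in `html`, or None."""
    value = None  # initialized so there is always something to return
    try:
        raw = html.split(marker)[1].split('</b></td>')[0]
        for color in COLOR_LIST:
            if color in raw:  # strip an optional colored <span> wrapper
                raw = raw.split(color)[1].split(COLOR_CLOSE)[0]
        value = float(raw)  # convert whether or not it was colored
    except (IndexError, ValueError):
        pass  # marker not found, or a non-numeric value such as '-'
    return value


def finviz_key_stats(stock):
    """Hypothetical combined helper: fetch the page once, return (pbr, PEG)."""
    url = 'http://finviz.com/quote.ashx?t=' + stock
    html = urllib.request.urlopen(url).read().decode()
    return extract_stat(html, PB_MARKER), extract_stat(html, PEG_MARKER)
```

Keeping the parsing separate from the download also lets you test it on a saved page without hitting the site.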

2 Answers


First of all, I would get your function to return its output data, pbr and PEG. Then you could do something like this:

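import time
import pandas as pd

data = []
for stock in sp500short:
    pbr, PEG = finvizKeyStats(stock)  # one function returning both values
    data.append([pbr, PEG])
    time.sleep(1)  # pause between requests so the site is not hammered

# Pass the collected rows to the constructor; index and columns only label them
df = pd.DataFrame(data, index=sp500short, columns=['pbr', 'PEG'])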
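The key point is that the rows must actually be passed to pd.DataFrame; calling it with only index= and columns= gives an all-NaN frame. Below is a small offline sketch of a variant that keys each row by its ticker, so skipping a failed ticker cannot misalign the index. fake_key_stats is a hypothetical stand-in for finvizKeyStats, and apart from abt's numbers from the question, its data is made up.

```python
import pandas as pd


def fake_key_stats(stock):
    """Hypothetical stand-in for finvizKeyStats: (pbr, PEG) or None."""
    sample = {'abt': (2.71, 6.00), 'aapl': (None, None)}  # made-up lookup
    return sample.get(stock)


stats = {}
for stock in ['abt', 'aapl', 'zzzz']:
    result = fake_key_stats(stock)
    if result is None:  # scrape failed entirely: skip this ticker
        continue
    stats[stock] = result  # ticker -> (pbr, PEG)

# orient='index' turns each dict key into a row label
df = pd.DataFrame.from_dict(stats, orient='index', columns=['pbr', 'PEG'])
print(df)
```

With a plain list plus index=sp500short, every ticker must produce exactly one row. The dict version only needs each row to carry its own label.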

1 Comment

I tried it this way since it seemed the simplest solution, but I got an empty DataFrame; for some reason the data list is not populating. I reposted the code.

I used BeautifulSoup and got the entire table of data:

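import urllib.request
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

sp500short = ['a', 'aa', 'aapl', 'abbv', 'abc', 'abt', 'ace', 'aci', 'acn', 'act', 'adbe', 'adi', 'adm', 'adp']

def get_fin(sym):
    """Return the finviz snapshot table for `sym` as a Series, or None on failure."""
    try:
        sourceCode = urllib.request.urlopen('http://finviz.com/quote.ashx?t=' + sym).read()
        soup = BeautifulSoup(sourceCode, 'lxml')
        # The fundamentals live in the table with class "snapshot-table2"
        table = soup.find("table", attrs={"class": "snapshot-table2"})
        tdf = pd.read_html(StringIO(table.__repr__()))
        # Cells alternate label, value, label, value ... so pair them up
        vals = tdf[0].values.reshape(-1, 2)
        return pd.Series(vals[:, 1], vals[:, 0]).rename(sym)
    except Exception:
        return None  # pd.concat silently drops None entries

# One column per ticker, one row per ratio
df = pd.concat([get_fin(sym) for sym in sp500short], axis=1)

df.head()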
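The reshape(-1, 2) step relies on the snapshot table alternating label and value cells across each row. A small offline illustration using a hypothetical block of cells follows. Only abt's P/B and PEG come from the question; the other values are made up. It shows how the pairs become a Series and how pd.concat(..., axis=1) lines tickers up as columns.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x4 block of table cells; each row alternates label, value
cells = np.array([['P/B', '2.71', 'PEG', '6.00'],
                  ['P/E', '20.10', 'EPS (ttm)', '1.50']], dtype=object)

# reshape(-1, 2) reads the cells row by row in groups of two,
# so column 0 holds the labels and column 1 the matching values
pairs = cells.reshape(-1, 2)
abt = pd.Series(pairs[:, 1], index=pairs[:, 0]).rename('abt')

# A second (made-up) ticker with fewer ratios; concat aligns on the labels
aapl = pd.Series(['5.00', '-'], index=['P/B', 'PEG']).rename('aapl')
df = pd.concat([abt, aapl], axis=1)
print(df)
```

Labels that are missing for a ticker come out as NaN, so tickers with slightly different tables can still be combined.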



Focus on specific ratios

With a list of ratio names, you can select the relevant rows easily:

ratios = ['P/E', 'PEG']
df.loc[ratios]
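The values in get_fin's frame are still the scraped strings, and tickers are columns rather than rows. Here is a sketch that turns a selection into the question's layout (tickers as rows, numeric columns). It assumes missing values appear as a non-numeric placeholder such as '-'. The frame is a hypothetical stand-in with made-up values, apart from abt's PEG from the question.

```python
import pandas as pd

# Hypothetical stand-in for get_fin's output: ratio names as the index,
# one column per ticker, values still strings as scraped
df = pd.DataFrame({'abt': ['20.10', '6.00', '2.71'],
                   'aapl': ['14.00', '-', '5.00']},
                  index=['P/E', 'PEG', 'P/B'])

ratios = ['P/E', 'PEG']
# Select the ratios, then transpose so each ticker becomes a row
subset = df.loc[ratios].T
# Convert strings to floats; anything non-numeric such as '-' becomes NaN
subset = subset.apply(pd.to_numeric, errors='coerce')
print(subset)
```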



Note: I'm not sure about my use of __repr__ to get the HTML string.

4 Comments

I don't think rewriting the web-scraping part of a question asking "how do I post-process my web-scraping code" is useful. It doesn't really help OP (well, not trivially), and it definitely doesn't help future readers. Now, you can claim that this is better this way (and I could even agree, especially since I don't know scraping), but I've already seen you reason today that your low-quality answer is only low-quality because it's what OP asked for. Please pick one rationalization (or even better, don't give off-topic answers or answers to off-topic questions).
@AndrasDeak It's clear that you don't agree with my style. And it's your call to vote as you please (I don't have to tell you that. I'm only stating it so that you are aware that I understand). However, it is my opinion that this is helpful and can in fact be very useful to future readers. In fact, I spent a bit of time figuring out how to do it because prior to this, I didn't know how. It is my opinion that a downvote for honest and potentially useful work is punitive.
@AndrasDeak, I wouldn't consider this a low-quality answer; it simply gives OP a chance to look at their problem from a different viewpoint. Please check the well-known "XY problem": it happens all the time when people ask the "wrong" questions. In the end, OP wanted to convert data to a pandas DataFrame, and this answer did exactly that. Just my $0.02...
@MaxU and I perfectly agree :) It's a high-quality answer, and possibly an improvement over OP's original approach. The perspective that you might be missing is also in my comment: that this might be unhelpful (the weaker motive), and that piRSquared sometimes does the opposite: gives a low-effort answer to a clear XY case. I appreciate piRSquared's understanding, and we'll have to agree to disagree (which is why I didn't respond to the previous comment; I don't have much to add, and don't want to turn this into a pointless argument).
