I have written the following code to request pages and use regex to look for strings that resemble interest rates. The overall code works; however, it creates multiple empty DataFrames, and I can't get the code to drop them to clean up my output. I have tried .dropna, .drop, and .empty to get rid of them, but the output is unchanged: the empty DataFrames are still printed alongside the results I want. Is there a method I am not aware of that would get rid of these empty frames? Code and output below:

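# imports are not shown in the question; the aliases r and bs are assumed from usage
import datetime
import re

import pandas as pd
import requests as r
from bs4 import BeautifulSoup as bs

plcompetitors = ['https://www.lendingclub.com/loans/personal-loans',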
                'https://www.marcus.com/us/en/personal-loans',
                'https://www.discover.com/personal-loans/']

#cycle through links in array until it finds APR rates/fixed or variable using regex
for link in plcompetitors:
    cdate = datetime.date.today()
    l = r.get(link)
    l.encoding = 'utf-8'
    data = l.text
    soup = bs(data, 'html.parser')
    paragraph = soup.find_all(text=re.compile('[0-9]%'))
    for n in paragraph:
        matches = []
        matches.extend(re.findall(r'(?i)\d+(?:\.\d+)?%\s*(?:to|-)\s*\d+(?:\.\d+)?%', n.string))
        sint = pd.Series(matches)
        qdate = pd.Series([datetime.datetime.now()]*len(sint))
        slink = pd.Series([link]*len(sint))
        df = pd.concat([qdate,sint,slink],axis=1)
        df.columns = ['Date','Interest Rate', 'URL']
        print(df)

Output:

  ...
0 ...
1 ...

[2 rows x 3 columns]
 ...
0 ...

[1 rows x 3 columns]
 ...
0 ...
1 ...
2 ...
3 ...

[4 rows x 3 columns]
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []
  ...
0 ...

[1 rows x 3 columns]
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []

2 Answers

6

How about you just don't print/use the empty ones?

if df.empty:
  continue

Or

if not df.empty:
  print(df)
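
To show where that check could sit in the question's loop, here is a minimal sketch. It assumes the page text has already been fetched, so the `sample` strings and the `collect_rates` helper are made up for illustration; only the regex comes from the question. It skips empty frames and concatenates the rest into one result:

```python
import datetime
import re

import pandas as pd

# Same pattern as in the question: matches ranges like "5.99% to 24.99%".
RATE_RANGE = re.compile(r'(?i)\d+(?:\.\d+)?%\s*(?:to|-)\s*\d+(?:\.\d+)?%')


def collect_rates(snippets, link):
    """Hypothetical helper: one DataFrame of all rate ranges found in snippets."""
    frames = []
    for text in snippets:
        matches = RATE_RANGE.findall(text)
        df = pd.DataFrame({
            'Date': [datetime.datetime.now()] * len(matches),
            'Interest Rate': matches,
            'URL': [link] * len(matches),
        })
        if df.empty:
            continue  # nothing matched in this snippet, so skip it
        frames.append(df)
    if not frames:
        # No matches anywhere: return an empty frame with the expected columns.
        return pd.DataFrame(columns=['Date', 'Interest Rate', 'URL'])
    return pd.concat(frames, ignore_index=True)


# Made-up strings standing in for the results of soup.find_all(...).
sample = ['APR from 5.99% to 24.99%', 'Save 10% today', 'Rates 6.95% - 35.89% APR']
result = collect_rates(sample, 'https://example.com/loans')
print(result)
```

Collecting into a list and calling `pd.concat` once also gives you a single frame per link instead of many small ones.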

2 Comments

Of course it is that simple. Thank you. Wow I feel stupid. Appreciate that
@dtrinh simple but not well known. I don’t think 🤔
0
if df.dropna(how='all').empty:
    continue

As per https://pandas.pydata.org/pandas-docs/version/0.18/generated/pandas.Series.empty.html, a DataFrame that contains only NaNs still returns False for .empty, so if that matters, call dropna first. Use how='any' to drop a row (or column, with axis=1) as soon as it contains any NaN, or how='all' to drop it only when every value in it is NaN (probably what you want).
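
A small check of that behaviour, using made-up frames (`no_rows` and `only_nan` are illustrative names, not from the question):

```python
import numpy as np
import pandas as pd

# A frame with columns but no rows, like the empty ones in the question's output.
no_rows = pd.DataFrame(columns=['Date', 'Interest Rate', 'URL'])
# A frame that has rows, but every value is NaN.
only_nan = pd.DataFrame({'Interest Rate': [np.nan, np.nan]})

print(no_rows.empty)                      # True
print(only_nan.empty)                     # False: NaN rows still count as rows
print(only_nan.dropna(how='all').empty)   # True once the all-NaN rows are dropped
```

In the question's code the frames are built from regex matches, so they have no rows at all rather than NaN rows, and a plain .empty check is likely enough there.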
