I have written the following code to request pages and use regex to look for strings that resemble interest rates. The overall code works; however, it creates multiple empty DataFrames, and I can't get it to drop the empty ones to clean up my output. I have tried using .dropna, .drop, and .empty to get rid of the empty DataFrames, but the output remains unchanged and keeps printing them alongside the data I already have. Is there a method I am not aware of that could get rid of these empty frames? Code and output below (with a rough sketch of what I've tried at the end):
import datetime
import re

import pandas as pd
import requests as r
from bs4 import BeautifulSoup as bs

plcompetitors = ['https://www.lendingclub.com/loans/personal-loans',
                 'https://www.marcus.com/us/en/personal-loans',
                 'https://www.discover.com/personal-loans/']

# cycle through links in the list until it finds APR rates (fixed or variable) using regex
for link in plcompetitors:
    cdate = datetime.date.today()
    l = r.get(link)
    l.encoding = 'utf-8'
    data = l.text
    soup = bs(data, 'html.parser')
    # every text node containing something like "5%"
    paragraph = soup.find_all(text=re.compile('[0-9]%'))
    for n in paragraph:
        matches = []
        # capture ranges such as "6.99% to 24.99%" or "6.99% - 24.99%"
        matches.extend(re.findall(r'(?i)\d+(?:\.\d+)?%\s*(?:to|-)\s*\d+(?:\.\d+)?%', n.string))
        sint = pd.Series(matches)
        qdate = pd.Series([datetime.datetime.now()] * len(sint))
        slink = pd.Series([link] * len(sint))
        df = pd.concat([qdate, sint, slink], axis=1)
        df.columns = ['Date', 'Interest Rate', 'URL']
        print(df)
Output:
...
0 ...
1 ...
[2 rows x 3 columns]
...
0 ...
[1 rows x 3 columns]
...
0 ...
1 ...
2 ...
3 ...
[4 rows x 3 columns]
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []
...
0 ...
[1 rows x 3 columns]
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []
Empty DataFrame
Columns: [Date, Interest Rate, URL]
Index: []
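For reference, my attempts to suppress the empty frames looked roughly like this, placed just before the print(df) line (a simplified sketch, not my exact code):

df = df.dropna()        # no effect - the empty frames have no NaN rows to drop
df = df.drop(df.index)  # also does not stop the empty frames from printing
df.empty                # returns True for the empty frames, but I haven't found how to use it to skip them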