Here's the complete HTML code of the page I'm trying to scrape, so please take a look at it first: https://codepen.io/bendaggers/pen/LYpZMNv
As you can see, this is the page source of mbasic.facebook.com.
What I'm trying to do is scrape all the anchor tags that have a pattern like this:
Example
<a class="cf" href="/profile.php?id=100044454444312&fref=fr_tab">
Example with a wildcard
<a class="cf" href="*">
Since the href values are dynamic, I decided to put a wildcard (*) inside the href value, as in href="*".
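In a regular expression, `*` is not a wildcard the way it is in a shell glob. It is a quantifier that repeats whatever comes right before it. Here is a short sketch contrasting the two readings, using one anchor from the sample HTML in the question as input (the variable names are illustrative only):

```python
import re

# One anchor copied from the sample HTML in the question.
tag = '<a class="cf" href="/john.vinas?fref=fr_tab">'

# In a regex, * repeats the token before it, so href="*" means
# href= + "zero or more double quotes" + one more quote, not "any text".
# It can only match when href= is followed directly by quotes and then >.
wildcard_attempt = r'<a class="cf" href="*">'

# [^"]* means "any run of characters that are not a double quote",
# which is the usual regex stand-in for an unknown attribute value.
attribute_value = r'<a class="cf" href="[^"]*">'

print(re.search(wildcard_attempt, tag))            # prints None
print(re.search(attribute_value, tag).group(0))    # prints the whole tag
```

So the literal `*` from the question only matches empty or quote-only href values such as `href="">`.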
Here's my (non-working) Python code:
driver.get('https://mbasic.facebook.com/cheska.cabral.796/friends')
pagex = re.compile(driver.page_source)
pattern = "<a class=\"cf\" href=\"*\">"
print(pagex.findall(pattern))
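One likely problem with the code above is that the arguments are swapped. `re.compile()` expects a pattern, so `re.compile(driver.page_source)` treats the whole page as a regex, and `pagex.findall(pattern)` then searches the short pattern string instead of the page. A minimal sketch of the other way round, assuming the `class` and `href` attributes appear in exactly this order and with double quotes, as in the sample HTML. `CF_ANCHOR`, `find_cf_anchors` and `sample` are hypothetical names, and `sample` only stands in for `driver.page_source`:

```python
import re

# Compile the *pattern* once; the page source is the text being searched.
# [^"]* matches any attribute value up to its closing quote.
CF_ANCHOR = re.compile(r'<a class="cf" href="[^"]*">')


def find_cf_anchors(page_source):
    """Return every opening <a class="cf" href="..."> tag found in page_source."""
    return CF_ANCHOR.findall(page_source)


# Stand-in for driver.page_source, built from two lines of the sample HTML.
sample = (
    '<td class="w t" style="vertical-align: middle">'
    '<a class="cf" href="/profile.php?id=100044454444312&fref=fr_tab">Natividad Cruz</a>\n'
    '<td class="w t" style="vertical-align: middle">'
    '<a class="cf" href="/john.vinas?fref=fr_tab">John Vinas</a>\n'
)

for anchor in find_cf_anchors(sample):
    print(anchor)
```

With Selenium, the call would be `find_cf_anchors(driver.page_source)` after `driver.get(...)`.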
Note that the page contains several anchors like these, so I need to capture all of them and print them:
<td class="w n" style="vertical-align: middle"><img src="https://scontent.fceb2-1.fna.fbcdn.net/v/t1.0-1/cp0/e15/q65/p50x50/79342209_112439723581175_5245034566049071104_o.jpg?_nc_cat=108&_nc_sid=dbb9e7&efg=eyJpIjoiYiJ9&_nc_ohc=lADKURnNsk4AX8WTS1F&_nc_ht=scontent.fceb2-1.fna&_nc_tp=3&oh=96f40cb2f95acbcfe9f6e4dc6cb31161&oe=5EC27AEB" class="bo s" alt="Natividad Cruz, profile picture" /></td>
<td class="w t" style="vertical-align: middle"><a class="cf" href="/profile.php?id=100044454444312&fref=fr_tab">Natividad Cruz</a>
<td class="w n" style="vertical-align: middle"><img src="https://scontent.fceb2-1.fna.fbcdn.net/v/t1.0-1/cp0/e15/q65/p50x50/10306248_10201945477974508_4213924286888352892_n.jpg?_nc_cat=109&_nc_sid=dbb9e7&efg=eyJpIjoiYiJ9&_nc_ohc=Z2daQ-qGgpsAX8BmLKr&_nc_ht=scontent.fceb2-1.fna&_nc_tp=3&oh=22f2b487166a7cd06e4ff650af4f7a7b&oe=5EC34325" class="bo s" alt="John Vinas, profile picture" /></td>
<td class="w t" style="vertical-align: middle"><a class="cf" href="/john.vinas?fref=fr_tab">John Vinas</a>
My goal is to find all of these anchor tags and print them in the terminal. I'd appreciate your help with this. Thank you!
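Since the goal is to list every friend link, an HTML parser may be sturdier than a regex here, because it does not depend on attribute order, quoting style, or line breaks. This swaps in a different technique: a minimal sketch using Python's standard-library `html.parser`, where `CfLinkCollector` and `sample` are hypothetical names and `sample` stands in for `driver.page_source`:

```python
from html.parser import HTMLParser


class CfLinkCollector(HTMLParser):
    """Collect (href, link text) pairs for every <a> whose class list contains "cf"."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None  # href of the <a class="cf"> we are inside, if any
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if "cf" in classes:
            self._current_href = attributes.get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that sits inside a matching anchor.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            self.links.append((self._current_href, "".join(self._text_parts).strip()))
            self._current_href = None


# Stand-in for driver.page_source, built from two lines of the sample HTML.
sample = (
    '<td class="w t" style="vertical-align: middle">'
    '<a class="cf" href="/profile.php?id=100044454444312&fref=fr_tab">Natividad Cruz</a>\n'
    '<td class="w t" style="vertical-align: middle">'
    '<a class="cf" href="/john.vinas?fref=fr_tab">John Vinas</a>\n'
)

collector = CfLinkCollector()
collector.feed(sample)
collector.close()
for href, name in collector.links:
    print(name, href)
```

If BeautifulSoup is installed, `BeautifulSoup(html, "html.parser").select("a.cf")` is a shorter route to the same anchors; the version above just avoids extra dependencies.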
I tried another version of the code, but still no luck :)
driver.get('https://mbasic.facebook.com/cheska.cabral.796/friends')
pagex = driver.page_source
pattern = "<td class=\".*\" style=\"vertical-align: middle\"><a class=\".*\">"
x = re.findall(pattern, pagex)
print(x)
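In the second attempt, `.*` is greedy: it grabs as much text as it can and then backtracks. If several friends end up on the same line of the page source (an assumption here; the real markup may or may not be on one line), the first `.*` can run from one entry into the next, giving one long match instead of one per friend. A sketch showing the difference, with a hypothetical single-line `one_line` string built from the sample HTML:

```python
import re

# Two entries from the sample HTML, joined onto ONE line to mimic a page
# source without line breaks (an assumption about the real page).
one_line = (
    '<td class="w t" style="vertical-align: middle">'
    '<a class="cf" href="/profile.php?id=100044454444312&fref=fr_tab">Natividad Cruz</a>'
    '<td class="w t" style="vertical-align: middle">'
    '<a class="cf" href="/john.vinas?fref=fr_tab">John Vinas</a>'
)

# Greedy .* runs as far as it can, so both entries collapse into one match.
greedy = r'<td class=".*" style="vertical-align: middle"><a class=".*">'
print(len(re.findall(greedy, one_line)))  # prints 1

# [^"]* cannot cross a quote, and the (...) group makes findall return only the href.
precise = r'<td class="[^"]*" style="vertical-align: middle"><a class="cf" href="([^"]*)">'
print(re.findall(precise, one_line))
```

Using `[^"]*` (or the non-greedy `.*?`) keeps each match inside a single attribute value, and because the pattern has one capture group, `re.findall` returns just the href strings.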