I have some code that uses a while loop to print whatever is between the <a href> and </a> tags of a webpage. I can find the required indexes, extract whatever is written between them, and print it. The program is supposed to print each URL only once, then increase the index until it finds the next <a href> and </a> pair, print whatever is between them, and continue this way until the end of the string, printing every new URL on a separate line. Here's the code:
text = """ohsfhskfheifhsefis <a href = "fdnsfjsnfsnfns snkfsndfskj"</a>
<a href = "snfksnfsdf"</a>"""
index = 0
a = 0
b = 0
while index < len(text):
    a = text.find('href', index)
    b = text.find('/a', index)
    print(text[a:b])
    index = index + 2
    if index >= len(text):
        print("End")
        break
However, when I run the program, it malfunctions as shown in the images.
Clearly the logic I'm using is wrong here. I know there are other, easier ways to accomplish this task, but I haven't gotten to the more complex stuff yet, as I only recently started learning Python, and I would like to do it this way for now.
On the left is the first part of the program's output. On the right is the second.
You can also clearly see the blank spaces left in the output, because the program prints at every increment of the index instead of once per URL.
Any kind of help would be greatly appreciated.
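One possible way to get the behavior described above while keeping the same find-and-while-loop approach is sketched below. It rests on a few assumptions: the search for '/a' should start at the 'href' that was just found, the index should jump to just past that '/a' rather than advance by 2, and the loop should stop as soon as find returns -1. The function name extract_links is hypothetical and only used for this example.

```python
def extract_links(text):
    """Collect the text between each 'href' and the next '/a' (hypothetical helper)."""
    links = []
    index = 0
    while index < len(text):
        a = text.find('href', index)   # next 'href' at or after index
        if a == -1:                    # find returns -1 when nothing is left
            break
        b = text.find('/a', a)         # closing marker that follows this 'href'
        if b == -1:
            break
        links.append(text[a:b])        # same slice as in the original code
        index = b + len('/a')          # resume just past the closing marker
    return links


text = """ohsfhskfheifhsefis <a href = "fdnsfjsnfsnfns snkfsndfskj"</a>
<a href = "snfksnfsdf"</a>"""

for link in extract_links(text):
    print(link)
print("End")
```

Because the slice still runs from 'href' up to '/a', each printed line keeps the 'href = ' prefix and the '<' that precedes '/a'. Trimming those would be a separate step.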