
I have some code that uses a while loop to print whatever appears between the <a href> and </a> tags of a webpage. I can find the relevant indexes and print the text between them. The program is supposed to print each URL only once, then advance the index to the next <a href> and </a> pair, print whatever is between them, and continue until the end of the string, printing every new URL on a separate line. Here's the code:

text = """ohsfhskfheifhsefis <a href = "fdnsfjsnfsnfns snkfsndfskj"</a>
<a href = "snfksnfsdf"</a>"""

index = 0
a = 0
b = 0

while index < len(text):
    a = text.find('href', index)
    b = text.find('/a', index)
    print(text[a:b])
    index = index + 2
    if index >= len(text):
        print("End")
        break

However, when I run the program, it malfunctions as shown in the images.

Clearly the logic I'm using is wrong here. I know there are other easier ways to accomplish this task but I haven't got to the more complex stuff as I only recently started learning Python and would like to do it this way for now.

On the left is the first part of the Program. On the right is the second.

You can also clearly see the blank spaces being left out because the Program prints the url at every increment of the index.

Any kind of help would be greatly appreciated.

2 Answers

Your search starts with index set to 0, then finds the href text at position 22. You then increment the index to 2, search again, and again find the text at position 22.

If you want the search to continue past the last match, you need to set index to a position after the last match instead:

index = a + 1

Now the next text.find() call starts searching at index 23 instead.

You'll also need to test for the case where the text is not found, since str.find() returns -1 then:

if a < 0 or b < 0:
    break
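Putting the two fixes together, a minimal sketch of the whole loop might look like the following. It goes slightly beyond the changes above, and these parts are assumptions rather than part of the original answer: the closing marker is searched for starting at a (so the end always lies after the start), the marker is '</a' rather than '/a' (so the printed slice does not end in a stray '<'), and the next search resumes just past the closing marker. The helper name find_links is hypothetical.

```python
text = """ohsfhskfheifhsefis <a href = "fdnsfjsnfsnfns snkfsndfskj"</a>
<a href = "snfksnfsdf"</a>"""


def find_links(text):
    """Return each slice running from 'href' up to the next '</a'."""
    found = []
    index = 0
    while True:
        a = text.find('href', index)
        if a < 0:
            break  # no further 'href' in the remaining text
        # Assumption: look for the closing marker after the match itself,
        # not after the previous search position.
        b = text.find('</a', a)
        if b < 0:
            break  # opening found, but no closing marker follows it
        found.append(text[a:b])
        index = b + 1  # resume the search past this closing marker
    return found


for link in find_links(text):
    print(link)
print("End")
```

With the sample text this prints one line per link followed by End, and the loop stops as soon as either find() call returns -1.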

Rather than manually searching through text like this, consider using an HTML parser. Your search would be trivial with BeautifulSoup, for example.
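To give a rough idea of the parser approach without any third-party install, here is a sketch that uses the standard library's html.parser module in place of BeautifulSoup. The class name LinkCollector, the helper extract_links, and the example.com URLs are all made up for illustration. The sample markup is deliberately well-formed, unlike the text in the question.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> start tag it is fed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Hypothetical, well-formed sample markup
sample = ('<p>intro <a href="https://example.com/one">One</a>\n'
          '<a href="https://example.com/two">Two</a></p>')

for url in extract_links(sample):
    print(url)
```

The parser takes care of quoting and tag boundaries, so no manual index bookkeeping is needed.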


1 Comment

@Catastrophe: break when a == -1 or b == -1.

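An alternative suggestion (written for Python 3; it assumes the attribute appears exactly as href=" with no spaces around the equals sign):

for token in text.split('href="')[1:]:
    print(token.split('"')[0])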
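The sample text in the question writes the attribute as href = "..." with spaces around the equals sign, so a split on href=" would find nothing there. As a sketch of one way around that, the variant below splits on the bare word href and then takes the first quoted string that follows. The function name hrefs_by_split is hypothetical, and the approach assumes every href is followed by a double-quoted value.

```python
text = """ohsfhskfheifhsefis <a href = "fdnsfjsnfsnfns snkfsndfskj"</a>
<a href = "snfksnfsdf"</a>"""


def hrefs_by_split(text):
    """Return the first double-quoted string after each 'href'."""
    values = []
    for token in text.split('href')[1:]:
        parts = token.split('"')
        # parts[1] is the text between the first pair of double quotes;
        # skip tokens that do not contain a complete quoted value.
        if len(parts) >= 3:
            values.append(parts[1])
    return values


for value in hrefs_by_split(text):
    print(value)
```

Unlike the index-based loop, this prints only the quoted values, without the leading href = part.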
