Python program printing result multiple times

Question

My code uses a while loop to print whatever sits between the <a href> and </a> tags of a webpage. I can find the required indexes, extract what is written between them, and print it. The program is supposed to print each URL only once, then advance the index until it finds the next <a href> and </a> pair, print what is between them, and continue like this to the end of the string, printing every new URL on a separate line. Here's the code:

text = """ohsfhskfheifhsefis <a href = "fdnsfjsnfsnfns snkfsndfskj"</a>
<a href = "snfksnfsdf"</a>"""

index = 0
a = 0
b = 0

while index < len(text):
    a = text.find('href', index)
    b = text.find('/a', index)
    print(text[a:b])
    index = index + 2
    if index >= len(text):
        print("End")
        break

However, when I run the program, it malfunctions as shown in the images.

Clearly the logic I'm using is wrong here. I know there are other easier ways to accomplish this task but I haven't got to the more complex stuff as I only recently started learning Python and would like to do it this way for now.

On the left is the first part of the Program. On the right is the second.

You can also clearly see the blank spaces being left out because the Program prints the url at every increment of the index.

Any kind of help would be greatly appreciated.

Martijn Pieters · Accepted Answer · 2016-10-22 13:24:46Z

1

Your search starts with index set to 0, then finds the href text at position 22. You then increment the index to 2, search again, and again find the text at position 22.
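A quick check with the sample text from the question illustrates this. It is only a sketch: the names first_hits and next_hit are made up for the demonstration. str.find(sub, start) returns the first match at or after start, so every start position up to the match gives the same answer.

```python
text = """ohsfhskfheifhsefis <a href = "fdnsfjsnfsnfns snkfsndfskj"</a>
<a href = "snfksnfsdf"</a>"""

# str.find(sub, start) returns the first position >= start where sub occurs,
# so small increments of the start position keep hitting the same match.
first_hits = [text.find('href', start) for start in (0, 2, 4, 22)]
print(first_hits)

# Starting one position past the match is what moves the search forward.
next_hit = text.find('href', first_hits[0] + 1)
print(next_hit)
```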

If you want the search to continue past the last match, you need to set index to a position after the last match instead:

index = a + 1

Now the next text.find() call starts searching at index 23 instead.

You'll also need to test if the text is not found:

if a < 0 or b < 0:
    break
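Putting both changes together, the loop might look like the sketch below. The function name find_links is hypothetical, and there is one assumption beyond the advice above: the search for '/a' starts at a rather than at index, so that the closing marker found always comes after the href it is paired with.

```python
text = """ohsfhskfheifhsefis <a href = "fdnsfjsnfsnfns snkfsndfskj"</a>
<a href = "snfksnfsdf"</a>"""


def find_links(text):
    """Collect the text between each 'href' and the next '/a' (hypothetical helper)."""
    found = []
    index = 0
    while True:
        a = text.find('href', index)
        if a < 0:
            break  # no further 'href' in the text
        # Assumption: search for the closing marker from a, not from index,
        # so b can never point at a '/a' that precedes this href.
        b = text.find('/a', a)
        if b < 0:
            break  # an href without a closing '/a'
        found.append(text[a:b])
        index = a + 1  # continue past the match just handled
    return found


links = find_links(text)
for link in links:
    print(link)
print("End")
```

Each slice still contains the leading href = and the trailing <, because the slice runs from the start of 'href' to the start of '/a', exactly as in the original code.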

Rather than manually searching through text like this, consider using an HTML parser. Your search would be trivial with BeautifulSoup, for example.
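To give a rough idea of the parser approach without installing anything, here is a sketch that uses the standard library's html.parser module in place of BeautifulSoup (Python 3). The class name LinkCollector and the sample_html string are made up for illustration, and the sample is well-formed HTML, unlike the text in the question.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: records the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.links.append(value)


# Assumed, well-formed sample input (not the asker's text).
sample_html = (
    '<p>intro <a href="http://example.com/one">one</a> '
    'and <a href="http://example.com/two">two</a></p>'
)

collector = LinkCollector()
collector.feed(sample_html)
collector.close()

for link in collector.links:
    print(link)
```

The parser hands over the attribute value directly, so there is no index bookkeeping and no stray quote or bracket characters to trim.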

edited Oct 22, 2016 at 13:24

answered Oct 22, 2016 at 12:56

Martijn Pieters


1 Comment

Martijn Pieters Over a year ago

@Catastrophe: break when a == -1 or b == -1.

barak manos · Accepted Answer · 2016-10-22 13:00:52Z

0

An alternative suggestion:

# The sample text in the question writes 'href = "' with spaces around the equals sign.
for token in text.split('href = "')[1:]:
    print(token.split('"')[0])

answered Oct 22, 2016 at 13:00

barak manos
