Python find matching URL for Text String

Question

Alright, I have the following code working:

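import re

with open('html.txt') as f:
    html_text = f.read()

# A single capture group makes findall return plain strings instead of tuples
links = re.findall(r'"(https?://.*?)"', html_text)
for url in links:
    # "&#038;" is the HTML entity for "&"; dropping "#038;" leaves a working URL
    print(url.replace("#038;", ""))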

Sample from the HTML text file:

<td class="download-file" data-title="Download">
      <a href="https://URL.com/?download_file=259&#038;order=wc_order_xBxDxBxD&#038;emailtestmail%40gmail.com&#038;key=1234-1234-1234-1234-12345678" class="woocommerce-MyAccount-downloads-file button alt">
    INSTRUCTION</a>                 

</td>

Problem:

There are a couple of these links in the html.txt file I created.

I also have a list of strings that match the link text, for example: [Instruction, File2, File3, etc...]

Now I would like to match the strings in the list with the matching URLs in my .txt file.

Basically, I want to create a second list that holds the URLs of the matching strings.

The order of that list is not important; I just want to make sure each string in my list [Instruction, File2, File3, etc...] finds its matching URL from the text file.

I have really struggled with this and can't find a solution, so I would appreciate your help.

My list looks like this: ['Instruction', 'File2', 'File3', ...]
– PRR, Commented May 8, 2020 at 15:32

John S · Accepted Answer · 2020-05-08 15:27:08Z


You may want to consider using the BeautifulSoup library to parse HTML files. (I would also point out that it looks like you are parsing an .html file, not a .txt file. Unfortunately, I do not have enough reputation to comment.)
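A minimal sketch of the parser-based idea, using the standard library's html.parser in place of BeautifulSoup so that it runs without third-party packages (this is a swap, not the library the answer names). The names LinkCollector and match_links and the shortened sample markup are made up for illustration. Matching ignores case because the sample link text is INSTRUCTION while the list holds Instruction.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (link text, href) pairs from every <a> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.links = []      # finished (text, href) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; entities such as
            # &#038; in attribute values are already decoded by the parser.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def match_links(html, names):
    """Return {name: url} for each name equal to a link text (ignoring case)."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    by_text = {text.lower(): href for text, href in collector.links}
    return {n: by_text[n.lower()] for n in names if n.lower() in by_text}


# Shortened, hypothetical sample in the shape of the question's markup.
sample = '''<td class="download-file" data-title="Download">
      <a href="https://URL.com/?download_file=259&#038;order=wc_order_xBxDxBxD" class="button alt">
    INSTRUCTION</a>
</td>'''

matched = match_links(sample, ["Instruction", "File2"])
print(matched)
```

To work on the saved page, the sample string could be replaced by the contents of html.txt, and list(matched.values()) would give the second list of URLs. Names with no matching link (File2 here) are simply left out.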


1 Comment

PRR Over a year ago

Hey there, I was initially using BeautifulSoup, but I couldn't get the HTML source after login. I really tried, and it didn't work out. That's why I just copy-pasted the source into the .txt file. It's intentional that I'm parsing a .txt file in this case.
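Since the source is pasted into a plain text file anyway, the regex from the question could also be extended to capture the link text next to each href. A rough sketch, assuming every link has the shape shown in the sample (double-quoted href, text directly inside the a element, no nested tags); pair_pattern, urls_for and the shortened sample are illustrative names, and a regex like this is brittle on markup that deviates from that shape.

```python
import html
import re

# Captures the href value and the text between <a ...> and </a>.
# Assumes a double-quoted href and no nested tags inside the link.
pair_pattern = re.compile(
    r'<a\s[^>]*?href="(https?://[^"]*)"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def urls_for(names, text):
    """Return the URLs whose link text matches one of names (ignoring case)."""
    wanted = {name.lower() for name in names}
    urls = []
    for href, label in pair_pattern.findall(text):
        if label.strip().lower() in wanted:
            # html.unescape turns &#038; back into a plain &.
            urls.append(html.unescape(href))
    return urls


# Shortened, hypothetical sample with two links.
sample = '''<a href="https://URL.com/?download_file=259&#038;key=1234" class="button alt">
    INSTRUCTION</a>
<a href="https://URL.com/?download_file=260&#038;key=1234" class="button alt">
    Other</a>'''

result = urls_for(["Instruction", "File2"], sample)
print(result)
```

For the real file, the text argument would be the string read from html.txt, as in the question's own code.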
