
I have a list of links (stored in a links.txt file).

This code can save the result of one link, but I do not know how to make it download the source code of ALL the links inside links.txt and SAVE THEM AS ONE SINGLE text file for the next step of processing:

import urllib.request

# downloads the page source of one link and saves it to result.txt
urllib.request.urlretrieve("https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=1", "result.txt")

Example links from links.txt:

https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=1
https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=2
https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=3
https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=4
https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=5
https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=6
https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=7
....
  • Assuming you aren't trying to save as *.html, you can use a dict and serialize it using the json module (a sketch follows after these comments).
  • Thank you for the prompt reply dear @DelphiX, sadly I do not have much knowledge of Python.
  • Did you try writing a for loop?
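
A minimal sketch of the dict-plus-json idea from the first comment, using urllib from the question; the output file name pages.json is an assumption:

import json
import urllib.request

pages = {}
with open('links.txt', 'r') as f:
    for link in f:
        link = link.strip()  # drop the trailing newline
        if not link:
            continue         # skip blank lines
        with urllib.request.urlopen(link) as resp:
            pages[link] = resp.read().decode('utf-8')

# serialize the {link: html} dict into one JSON file
with open('pages.json', 'w') as out:
    json.dump(pages, out)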

2 Answers


urllib

import urllib.request

with open('links.txt', 'r') as f:
    links = f.readlines()

for link in links:
    with urllib.request.urlopen(link.strip()) as page:  # strip the trailing newline
        # get html text
        html = page.read().decode('utf-8')

    # append html to file ('a' adds to it; 'w+' would overwrite it each time)
    with open('result.txt', 'a') as out:
        out.write(html)

requests

You could also use the requests library, which I find much more readable:

pip install requests

import requests

with open('links.txt', 'r') as f:
    links = f.readlines()

for link in links:
    response = requests.get(link.strip())
    html = response.text

    # append html to file ('a' adds to it; 'w+' would overwrite it each time)
    with open('result.txt', 'a') as out:
        out.write(html)

Use a loop for page navigation

Use a for loop to generate the page links, since the only thing that changes is the page number.

links = [
  f'https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn={n}'
  for n in range(1, 10) # [1, 2, 3, 4, 5, 6, 7, 8, 9]
]

or generate them as you go along:

for n in range(1, 10):
  link = f'https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn={n}'

  [...]
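
Putting the two pieces together, a minimal sketch that generates the links and writes every page into one file (the page range 1-9 is taken from the example links, and requests is used as above):

import requests

with open('result.txt', 'w') as out:
    for n in range(1, 10):
        link = f'https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn={n}'
        response = requests.get(link)
        out.write(response.text)  # each page is appended to the open file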

Comments

looking good and import urllib.request works without error, but it still gives the result of 1 single link instead of all the links inside links.txt
import requests gives an error, maybe it is safer to use urllib.request. The error with import requests is: line 8, in <module> f.write(html) TypeError: write() argument must be str, not bytes (see the note after these comments)
requests is a module, you have to install it using pip
requests is good in that we won't have to worry about converting from utf-8, as the module does it for you
yes, the requests module is already installed: Requirement already satisfied: certifi>=2017.4.17 in c:\users\a-data\appdata\local\programs\python\python38-32\lib\site-packages (from requests) (2020.4.5.1)
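
That TypeError happens when bytes are written to a file opened in text mode. With requests, response.text is a str while response.content is raw bytes, so writing response.text avoids the error. A minimal illustration (the file name page.html is an assumption):

import requests

response = requests.get('https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=1')

with open('page.html', 'w') as f:
    f.write(response.text)        # str: fine in text mode
    # f.write(response.content)   # bytes: raises TypeError in text mode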

Actually, it's usually better to use the requests lib, so you should start by installing it:

pip install requests

Then I'd propose reading links.txt line by line, downloading all the data you need, and storing it in the file output.txt:

import requests

data = []
# collect the html from every link in the file
with open('links.txt', 'r') as links:
    for link in links:
        response = requests.get(link.strip())  # strip the trailing newline
        data.append(response.text)

# put everything collected into a single file
with open('output.txt', 'w+') as output:
    for chunk in data:
        print(chunk, file=output)
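
If the pages are large, a variant that writes each page as soon as it is fetched avoids holding all the html in memory at once (same file names assumed):

import requests

with open('links.txt', 'r') as links, open('output.txt', 'w') as output:
    for link in links:
        response = requests.get(link.strip())
        print(response.text, file=output)  # write each page immediately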

Comments

sadly it gives an error: line 8, in <module> data.appent(response.text) AttributeError: 'list' object has no attribute 'appent'
It should be data.append(response.text)
@user13602012 I fixed the typo
