
I am new to Python and could use a little help. I am trying to write a script that will go to a specific web site and download multiple .gif images from different spots on that site. Can anyone point me in the right direction? This is the first script I have tried to write.

Here is what I have so far.

from bs4 import BeautifulSoup as bs
import urlparse
from urllib2 import urlopen
from urllib import urlretrieve
import os
import sys

def main(url, out_folder=r"C:\Users\jerry\Desktop\Heli"):
    """Downloads all the images at 'url' to out_folder"""
    soup = bs(urlopen(url))
    parsed = list(urlparse.urlparse(url))

    # Walk every <img> tag on the page and fetch the file it points to
    for image in soup.findAll("img"):
        print "Image: %(src)s" % image
        filename = image["src"].split("/")[-1]
        parsed[2] = image["src"]
        outpath = os.path.join(out_folder, filename)
        # Absolute URLs can be fetched as-is; relative ones are rebuilt
        # against the page's own scheme and host
        if image["src"].lower().startswith("http"):
            urlretrieve(image["src"], outpath)
        else:
            urlretrieve(urlparse.urlunparse(parsed), outpath)

def _usage():
    print "usage: python dumpimages.py http://folkworm.ceri.memphis.edu/heli/heli_bb_ag/ [outpath]"

if __name__ == "__main__":
    # The output folder is optional; the URL must be the last http argument
    url = sys.argv[-1]
    out_folder = "/test/"
    if not url.lower().startswith("http"):
        out_folder = sys.argv[-1]
        url = sys.argv[-2]
        if not url.lower().startswith("http"):
            _usage()
            sys.exit(-1)
    main(url, out_folder)
  • What's wrong with the code that you posted? Commented Jan 27, 2017 at 16:07
  • What is your question? This is far too broad as it stands. Commented Jan 27, 2017 at 16:14
  • Read this and reframe your question; we can't tell exactly what it is you are struggling with in the code. Commented Jan 27, 2017 at 16:27
  • Good for you for trying out something before posting the question here. Some libraries that could help are "requests", "scrapy", or "beautiful soup" Commented Jan 27, 2017 at 16:30

1 Answer


Here is the basic idea.

>>> import requests
>>> from bs4 import BeautifulSoup
>>> item = requests.get('http://folkworm.ceri.memphis.edu/heli/heli_bb_ag/')
>>> page = item.text
>>> soup = BeautifulSoup(page, 'lxml')
>>> links = soup.findAll('a')
>>> for link in links:
...     if '.gif' in link.attrs['href']:
...         print (link.attrs['href'])
...         break
...     
CCAR_HHZ_AG_00.2017012700.gif?v=1485534942

The break statement is there just to interrupt the loop so that it doesn't print the names of all the GIFs. The next step would be to add code to that loop that concatenates the URL passed to requests.get with the name of each GIF and issues another requests.get for it. This time, though, you would use, say, image = pic.content to get the image as bytes, which you could write to a file of your choice.

EDIT: Fleshed out. Note that you still need to arrange to provide a distinct file name for each output file; as written, every image would be written to pic.gif.

>>> import requests
>>> from bs4 import BeautifulSoup
>>> URL = 'http://folkworm.ceri.memphis.edu/heli/heli_bb_ag/'
>>> item = requests.get(URL)
>>> page = item.text
>>> soup = BeautifulSoup(page, 'lxml')
>>> links = soup.findAll('a')
>>> for link in links:
...     if '.gif' in link.attrs['href']:
...         print (link.attrs['href'])
...         pic = requests.get(URL + link.attrs['href'])
...         image = pic.content
...         open('pic.gif', 'wb').write(image)
...         break
...     
CCAR_HHZ_AG_00.2017012700.gif?v=1485535857
100846
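
For completeness, here is a minimal Python 3 sketch of that last step without the break, deriving a distinct file name for each GIF from its own link (stripping the ?v=... query string). It assumes the same requests/BeautifulSoup setup as above; the output directory name heli_gifs is just a placeholder, not anything the site requires:

import os
import requests
from bs4 import BeautifulSoup

URL = 'http://folkworm.ceri.memphis.edu/heli/heli_bb_ag/'
OUT_DIR = 'heli_gifs'  # placeholder; use any folder you like

os.makedirs(OUT_DIR, exist_ok=True)
soup = BeautifulSoup(requests.get(URL).text, 'lxml')

for link in soup.findAll('a'):
    href = link.attrs.get('href', '')
    if '.gif' not in href:
        continue
    # Name each file after the link target, minus the cache-busting query string
    filename = href.split('?')[0]
    pic = requests.get(URL + href)
    with open(os.path.join(OUT_DIR, filename), 'wb') as f:
        f.write(pic.content)

Opening each file in a with block guarantees it is closed before the next request, and os.makedirs with exist_ok=True makes the script safe to re-run.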