
I am new to Python and could use a little help. I am trying to write a script that will go to a specific web site and download multiple .gif images from different spots on that site. Can anyone point me in the right direction? This is the first script I have tried to write.

Here is what I have so far.

from bs4 import BeautifulSoup as bs
import urlparse
from urllib2 import urlopen
from urllib import urlretrieve
import os
import sys

def main(url, out_folder=r"C:\Users\jerry\Desktop\Heli"):
    """Downloads all the images at 'url' to out_folder"""
    soup = bs(urlopen(url))
    parsed = list(urlparse.urlparse(url))

    # Walk every <img> tag on the page and fetch the file it points to
    for image in soup.findAll("img"):
        print "Image: %(src)s" % image
        filename = image["src"].split("/")[-1]
        parsed[2] = image["src"]
        outpath = os.path.join(out_folder, filename)
        # Absolute URLs can be fetched as-is; relative ones are rebuilt
        # against the page's own scheme and host
        if image["src"].lower().startswith("http"):
            urlretrieve(image["src"], outpath)
        else:
            urlretrieve(urlparse.urlunparse(parsed), outpath)

def _usage():
    print "usage: python dumpimages.py http://folkworm.ceri.memphis.edu/heli/heli_bb_ag/ [outpath]"

if __name__ == "__main__":
    # The output folder is optional; the URL must be the last http argument
    url = sys.argv[-1]
    out_folder = "/test/"
    if not url.lower().startswith("http"):
        out_folder = sys.argv[-1]
        url = sys.argv[-2]
        if not url.lower().startswith("http"):
            _usage()
            sys.exit(-1)
    main(url, out_folder)
  • What's wrong with the code that you posted? Commented Jan 27, 2017 at 16:07
  • What is your question? This is far too broad as it stands. Commented Jan 27, 2017 at 16:14
  • Read this and reframe your question; we can't tell exactly what it is you are struggling with in the code. Commented Jan 27, 2017 at 16:27
  • Good for you for trying out something before posting the question here. Some libraries that could help are "requests", "scrapy", or "beautiful soup" Commented Jan 27, 2017 at 16:30

1 Answer


Here is the basic idea.

>>> import requests
>>> from bs4 import BeautifulSoup
>>> item = requests.get('http://folkworm.ceri.memphis.edu/heli/heli_bb_ag/')
>>> page = item.text
>>> soup = BeautifulSoup(page, 'lxml')
>>> links = soup.findAll('a')
>>> for link in links:
...     if '.gif' in link.attrs['href']:
...         print (link.attrs['href'])
...         break
...     
CCAR_HHZ_AG_00.2017012700.gif?v=1485534942

The break statement is there just to interrupt the loop so that it doesn't print the names of all the GIFs. The next step would be to add code to that loop that concatenates the URL passed to requests.get with the name of each GIF and issues another requests.get for it. This time, though, you would use, say, image = pic.content to get the image as bytes, which you could write to a file of your choice.

EDIT: Fleshed out. Note that you still need to arrange to provide a distinct file name for each output file; as written, every image would be written to pic.gif.

>>> import requests
>>> from bs4 import BeautifulSoup
>>> URL = 'http://folkworm.ceri.memphis.edu/heli/heli_bb_ag/'
>>> item = requests.get(URL)
>>> page = item.text
>>> soup = BeautifulSoup(page, 'lxml')
>>> links = soup.findAll('a')
>>> for link in links:
...     if '.gif' in link.attrs['href']:
...         print (link.attrs['href'])
...         pic = requests.get(URL + link.attrs['href'])
...         image = pic.content
...         open('pic.gif', 'wb').write(image)
...         break
...     
CCAR_HHZ_AG_00.2017012700.gif?v=1485535857
100846
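
For completeness, here is a minimal Python 3 sketch of that last step without the break, deriving a distinct file name for each GIF from its own link (stripping the ?v=... query string). It assumes the same requests/BeautifulSoup setup as above; the output directory name heli_gifs is just a placeholder, not anything the site requires:

import os
import requests
from bs4 import BeautifulSoup

URL = 'http://folkworm.ceri.memphis.edu/heli/heli_bb_ag/'
OUT_DIR = 'heli_gifs'  # placeholder; use any folder you like

os.makedirs(OUT_DIR, exist_ok=True)
soup = BeautifulSoup(requests.get(URL).text, 'lxml')

for link in soup.findAll('a'):
    href = link.attrs.get('href', '')
    if '.gif' not in href:
        continue
    # Name each file after the link target, minus the cache-busting query string
    filename = href.split('?')[0]
    pic = requests.get(URL + href)
    with open(os.path.join(OUT_DIR, filename), 'wb') as f:
        f.write(pic.content)

Opening each file in a with block guarantees it is closed before the next request, and os.makedirs with exist_ok=True makes the script safe to re-run.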