I'm attempting to scrape weather data from Weather Underground, using the multiprocessing.dummy library to run my requests through different threads. I'm getting an error when running the following code, and I was wondering whether someone could walk me through what's going on and suggest a possible solution. Note: my code could be wildly off.
from bs4 import BeautifulSoup # HTML text parsing package
from urllib2 import urlopen # package to read URLs
import requests # package to actually request URL
import nltk
import re
import itertools as ite
import pandas as pd

def scrape(urls):
    actual_temp = []
    string = requests.get(URL)
    soup = BeautifulSoup(string)
    actual_temp_tag = soup.find_all(class_ = "wx-value")[0]
    actual_temp.append(actual_temp_tag.string)
    return actual_temp

URLs = []
for j in range(1,2):
    for i in range(1,32):
        SUB_URL = 'http://www.wunderground.com/history/airport/KBOS/2014/' + str(j) + '/' + str(i) + '/' + '/DailyHistory.html'
        URLs.append(SUB_URL)

from multiprocessing.dummy import Pool as ThreadPool
pool = ThreadPool(8)
results = pool.map(scrape, URLs)
pool.close()
pool.join()
The following is the error message I'm getting:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\bwei\Downloads\WinPython-64bit-2.7.9.4\python-2.7.9.amd64\lib\multiprocessing\pool.py", line 251, in map
return self.map_async(func, iterable, chunksize).get()
File "C:\Users\bwei\Downloads\WinPython-64bit-2.7.9.4\python-2.7.9.amd64\lib\multiprocessing\pool.py", line 558, in get
raise self._value
TypeError: object of type 'Response' has no len()
In addition, once my program has executed, how do I close all of the threads? I noticed that my percentage of available memory goes up while running but doesn't come back down afterwards.
The traceback message points at the line string = requests.get(URL): requests.get returns a Response object, not a string, so BeautifulSoup can't parse it directly.
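A minimal sketch of a corrected version, assuming the page still exposes a "wx-value" class (that selector comes from your code, not something I've verified). The key fixes: pass response.text (a string) to BeautifulSoup instead of the Response object, use the function's own parameter rather than a global URL, and build each URL without the doubled slash. The third-party imports are kept inside scrape() so the URL-building part runs standalone:

```python
from multiprocessing.dummy import Pool as ThreadPool


def scrape(url):
    """Fetch one daily-history page and return its first wx-value string."""
    import requests  # third-party; imported here so the rest runs without it
    from bs4 import BeautifulSoup

    response = requests.get(url)
    # response.text is the decoded HTML string BeautifulSoup expects
    soup = BeautifulSoup(response.text, "html.parser")
    tag = soup.find(class_="wx-value")  # first matching tag, or None
    return tag.string if tag else None


def build_urls(year=2014, months=range(1, 2), days=range(1, 32)):
    """Build the Weather Underground daily-history URLs for KBOS."""
    return [
        "http://www.wunderground.com/history/airport/KBOS/%d/%d/%d/DailyHistory.html"
        % (year, m, d)
        for m in months
        for d in days
    ]


def run(urls, workers=8):
    """Map scrape over the URLs on a small thread pool."""
    pool = ThreadPool(workers)
    try:
        return pool.map(scrape, urls)
    finally:
        pool.close()  # stop accepting new work
        pool.join()   # wait for worker threads to exit
```

On the second question: pool.close() followed by pool.join(), as above, is the standard way to shut the worker threads down; memory held by the pool is only released once join() has let the workers exit. Calling run(build_urls()) would fetch all 31 January pages across 8 threads.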