Here is my code:

from download1 import download
import threading,lxml.html
def getInfo(initial,ending):
    for Number in range(initial,ending):
        Fields = ['country', 'area', 'population', 'iso', 'capital', 'continent', 'tld', 'currency_code',
                  'currency_name', 'phone',
                  'postal_code_format', 'postal_code_regex', 'languages', 'neighbours']
        url = 'http://example.webscraping.com/places/default/view/%d'%Number
        html=download(url)
        tree = lxml.html.fromstring(html)
        results=[]
        for field in Fields:
            x=tree.cssselect('table > tr#places_%s__row >td.w2p_fw' % field)[0].text_content()
            results.append(x)#should i start writing here?
downloadthreads=[]
for i in range(1,252,63): #create 4 threads
    #range() excludes its end value, so pass i+63; with i+62 the IDs 63, 126 and 189 are skipped
    downloadThread=threading.Thread(target=getInfo,args=(i,i+63))
    downloadthreads.append(downloadThread)
    downloadThread.start()

for threadobj in downloadthreads:
    threadobj.join() #end of each thread

print "Done"

So results will hold the values for Fields. I need to write Fields as the top row (only once), followed by the values in results, to a CSV file. I am not sure I can open the file inside the function, because the threads would then open the same file several times simultaneously.

Note: I know threading isn't desirable when crawling, but I am just testing.

  • Have you tried keeping the file open and just appending to it? Commented Feb 26, 2019 at 17:32

1 Answer

I think you should consider using some kind of queue or a thread pool. Thread pools are really useful when you have more tasks than you want threads running at once: you can submit many downloads, but only a fixed number of threads (say 4) run at a time.
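A minimal thread-pool sketch, assuming Python 3's `concurrent.futures` (the question's code is Python 2). `fetch_row` is a hypothetical stand-in for the download-and-parse step in `getInfo`, `scrape_with_pool` is an illustrative name, and `FIELDS` is shortened for brevity. Only the calling thread touches the CSV output, so the header is written once and no locking is needed:

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor

FIELDS = ['country', 'area', 'population']  # shortened; use the full list from the question


def fetch_row(number):
    # Hypothetical stand-in for download(url) + lxml parsing of one page.
    return ['country-%d' % number, str(number * 10), str(number * 1000)]


def scrape_with_pool(out, numbers, max_workers=4):
    """Fetch rows with at most max_workers threads; write them from the calling thread."""
    writer = csv.writer(out)
    writer.writerow(FIELDS)  # header row, written exactly once
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the input numbers.
        for row in pool.map(fetch_row, numbers):
            writer.writerow(row)


if __name__ == '__main__':
    buf = io.StringIO()  # for a real file: open('countries.csv', 'w', newline='')
    scrape_with_pool(buf, range(1, 6))
    print(buf.getvalue())
```

Because `map()` returns rows in input order, the CSV comes out sorted by page number even though the downloads overlap.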

An example of the queue technique can be found here.
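One way the queue technique could look for this question, as a sketch in Python 3 (in Python 2 the module is named `Queue`). The download threads only put rows on a `queue.Queue`; a single writer thread owns the CSV output. `fetch_row`, `write_rows` and `run` are hypothetical names, and `fetch_row` stands in for the real download and parsing:

```python
import csv
import io
import queue
import threading

FIELDS = ['country', 'area', 'population']  # shortened; use the full list from the question
DONE = None  # sentinel telling the writer thread to stop


def fetch_row(number):
    # Hypothetical stand-in for download(url) + lxml parsing of one page.
    return ['country-%d' % number, str(number * 10), str(number * 1000)]


def worker(initial, ending, rows):
    # Producer: never touches the file, only the thread-safe queue.
    for number in range(initial, ending):
        rows.put(fetch_row(number))


def write_rows(rows, out):
    # Consumer: the only thread that writes, so the header appears once.
    writer = csv.writer(out)
    writer.writerow(FIELDS)
    while True:
        row = rows.get()
        if row is DONE:
            break
        writer.writerow(row)


def run(out, chunks):
    rows = queue.Queue()
    writer_thread = threading.Thread(target=write_rows, args=(rows, out))
    writer_thread.start()
    workers = [threading.Thread(target=worker, args=(a, b, rows)) for a, b in chunks]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    rows.put(DONE)  # all producers finished, let the writer exit
    writer_thread.join()


if __name__ == '__main__':
    buf = io.StringIO()  # for a real file: open('countries.csv', 'w', newline='')
    run(buf, [(1, 4), (4, 7)])
    print(buf.getvalue())
```

Rows arrive in whatever order the threads finish, so the data lines are not guaranteed to be sorted.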

Alternatively, you can label each output file with its thread ID, for example "results_1.txt", "results_2.txt" and so on, and then merge the files after all threads have finished.
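A sketch of the one-file-per-thread idea in Python 3, with hypothetical names throughout (`fetch_row` replaces the real download and parsing, and the part files are named `results_<n>.csv` here). Each thread writes only its own file, and the header is added once during the merge:

```python
import csv
import os
import tempfile
import threading

FIELDS = ['country', 'area', 'population']  # shortened; use the full list from the question


def fetch_row(number):
    # Hypothetical stand-in for download(url) + lxml parsing of one page.
    return ['country-%d' % number, str(number * 10), str(number * 1000)]


def part_path(out_dir, index):
    return os.path.join(out_dir, 'results_%d.csv' % index)


def worker(index, initial, ending, out_dir):
    # Each thread owns exactly one file, so no synchronization is needed.
    with open(part_path(out_dir, index), 'w', newline='') as f:
        writer = csv.writer(f)
        for number in range(initial, ending):
            writer.writerow(fetch_row(number))


def merge(out_dir, n_parts, merged_path):
    # Runs in the main thread after every worker has been joined.
    with open(merged_path, 'w', newline='') as out:
        writer = csv.writer(out)
        writer.writerow(FIELDS)  # header row, written exactly once
        for index in range(1, n_parts + 1):
            with open(part_path(out_dir, index), newline='') as part:
                writer.writerows(csv.reader(part))


def run(chunks, out_dir):
    threads = [threading.Thread(target=worker, args=(i, a, b, out_dir))
               for i, (a, b) in enumerate(chunks, start=1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    merged_path = os.path.join(out_dir, 'merged.csv')
    merge(out_dir, len(chunks), merged_path)
    return merged_path


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        with open(run([(1, 4), (4, 7)], tmp), newline='') as f:
            print(f.read())
```

Merging the parts in index order keeps the final file sorted by page number.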

You can also use basic synchronization primitives such as Lock, Monitor, and so on; however, I am not the biggest fan of them. An example of locking can be found here.
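For comparison, a lock-based sketch in Python 3: the file is opened once by the main thread, the header is written before any thread starts, and every thread shares one `csv.writer` guarded by a `threading.Lock`. As in the other sketches, `fetch_row` and `run` are hypothetical names rather than anything from the question:

```python
import csv
import io
import threading

FIELDS = ['country', 'area', 'population']  # shortened; use the full list from the question


def fetch_row(number):
    # Hypothetical stand-in for download(url) + lxml parsing of one page.
    return ['country-%d' % number, str(number * 10), str(number * 1000)]


def worker(initial, ending, writer, lock):
    for number in range(initial, ending):
        row = fetch_row(number)  # slow work happens outside the lock
        with lock:               # only one thread writes at a time
            writer.writerow(row)


def run(out, chunks):
    lock = threading.Lock()
    writer = csv.writer(out)
    writer.writerow(FIELDS)  # header written once, before any thread starts
    threads = [threading.Thread(target=worker, args=(a, b, writer, lock))
               for a, b in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == '__main__':
    buf = io.StringIO()  # for a real file: open('countries.csv', 'w', newline='')
    run(buf, [(1, 4), (4, 7)])
    print(buf.getvalue())
```

Holding the lock only around `writerow` keeps the downloads running in parallel; the row order in the output still depends on thread timing.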


2 Comments

Added some examples to my original answer.
If you could edit my code to do it, that would be great; these examples seem pretty hard to understand and then adapt to my actual code.
