Need to write scraped data into csv file (threading)

Question

Here is my code:

from download1 import download
import threading
import lxml.html

# Column names; these are also meant to be the header row of the CSV file.
Fields = ['country', 'area', 'population', 'iso', 'capital', 'continent', 'tld', 'currency_code',
          'currency_name', 'phone',
          'postal_code_format', 'postal_code_regex', 'languages', 'neighbours']


def getInfo(initial, ending):
    for Number in range(initial, ending):
        url = 'http://example.webscraping.com/places/default/view/%d' % Number
        html = download(url)
        tree = lxml.html.fromstring(html)
        results = []
        for field in Fields:
            x = tree.cssselect('table > tr#places_%s__row > td.w2p_fw' % field)[0].text_content()
            results.append(x)  # should i start writing here?


downloadthreads = []
for i in range(1, 252, 63):  # create 4 threads
    # range() excludes its end value, so i + 63 covers pages i..i+62
    downloadThread = threading.Thread(target=getInfo, args=(i, i + 63))
    downloadthreads.append(downloadThread)
    downloadThread.start()

for threadobj in downloadthreads:
    threadobj.join()  # wait for each thread to finish

print("Done")

So results holds the values for Fields. I need to write Fields as the top row (only once), followed by the values in results, into a CSV file. I am not sure I can open the file inside the function, because the threads would open it several times simultaneously.

Note: I know threading isn't desirable when crawling, but I am just testing.

Have you tried keeping the file open and then just appending to it?
– wishmaster, Commented Feb 26, 2019 at 17:32

Janekx · Accepted Answer · 2019-02-26 17:53:41Z

1

I think you should consider using some kind of queue or a thread pool. Thread pools are really useful if you want to create many threads (I think you would use more than 4) but run only 4 of them at a time.
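A minimal sketch of the thread-pool idea, written for Python 3 (the question's code is Python 2, where `concurrent.futures` needs the `futures` backport). Here `fetch_row` is a hypothetical stand-in for the `download()` plus `cssselect()` step, the field list is shortened, and an in-memory buffer replaces the real file so the sketch runs offline. The workers only return rows; the main thread is the only one that touches the CSV writer, so the header is written once and no locking is needed.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor

FIELDS = ['country', 'area', 'population']  # shortened for the sketch


def fetch_row(number):
    """Hypothetical stand-in for download() + cssselect(); returns one row."""
    return ['%s-%d' % (field, number) for field in FIELDS]


def scrape_to_csv(numbers, out_file, max_workers=4):
    """Fetch rows in a pool of threads; only the calling thread writes."""
    writer = csv.writer(out_file)
    writer.writerow(FIELDS)  # header row, written exactly once
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so rows stay sorted by page.
        for row in pool.map(fetch_row, numbers):
            writer.writerow(row)


# For a real file, use open('countries.csv', 'w', newline='') instead.
buffer = io.StringIO()
scrape_to_csv(range(1, 6), buffer)
print(buffer.getvalue())
```

With a pool, the number of pages and the number of threads are independent: all pages can be passed in at once and `max_workers` caps how many run at the same time.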

An example of the Queue technique can be found here.
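As a rough illustration of the queue approach (this is not the linked example, just a sketch under assumptions): the download threads put finished rows on a `queue.Queue`, and a single writer thread takes them off and writes the CSV. `fetch_row`, the shortened field list, and the `_DONE` sentinel are made up for the illustration, and the module is named `Queue` in Python 2.

```python
import csv
import io
import queue  # this module is called Queue in Python 2
import threading

FIELDS = ['country', 'area', 'population']  # shortened for the sketch
_DONE = object()  # sentinel telling the writer thread to stop


def fetch_row(number):
    """Hypothetical stand-in for download() + cssselect(); returns one row."""
    return ['%s-%d' % (field, number) for field in FIELDS]


def producer(start, stop, rows):
    """Scrape pages start..stop-1 and hand each row to the writer."""
    for number in range(start, stop):
        rows.put(fetch_row(number))


def writer_loop(rows, out_file):
    """The only thread that touches the file: header once, then rows."""
    writer = csv.writer(out_file)
    writer.writerow(FIELDS)
    while True:
        row = rows.get()
        if row is _DONE:
            break
        writer.writerow(row)


def run_with_queue(ranges, out_file):
    rows = queue.Queue()
    consumer = threading.Thread(target=writer_loop, args=(rows, out_file))
    consumer.start()
    producers = [threading.Thread(target=producer, args=(a, b, rows))
                 for a, b in ranges]
    for thread in producers:
        thread.start()
    for thread in producers:
        thread.join()
    rows.put(_DONE)  # all rows are queued; let the writer finish
    consumer.join()


buffer = io.StringIO()  # swap for open('countries.csv', 'w', newline='')
run_with_queue([(1, 4), (4, 7)], buffer)
print(buffer.getvalue())
```

Rows arrive in whatever order the threads finish, so the output is not sorted by page number.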

Of course, you can also give each thread its own file, labelled with the thread's id, for example "results_1.txt", "results_2.txt" and so on. Then you can merge them after all threads have finished.
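A sketch of the one-file-per-thread idea, again in Python 3 and with a hypothetical `fetch_row` in place of the real scraping. The part files are named `results_1.csv`, `results_2.csv`, ... here (an assumption; any naming scheme works), and the header is written only to the merged file.

```python
import csv
import os
import tempfile
import threading

FIELDS = ['country', 'area', 'population']  # shortened for the sketch


def fetch_row(number):
    """Hypothetical stand-in for download() + cssselect(); returns one row."""
    return ['%s-%d' % (field, number) for field in FIELDS]


def part_path(directory, index):
    return os.path.join(directory, 'results_%d.csv' % index)


def write_part(index, start, stop, directory):
    """Each thread writes only to its own file, so nothing is shared."""
    with open(part_path(directory, index), 'w', newline='') as part:
        writer = csv.writer(part)
        for number in range(start, stop):
            writer.writerow(fetch_row(number))


def merge_parts(count, directory, merged_path):
    """Write the header once, then append every part file in order."""
    with open(merged_path, 'w', newline='') as merged:
        csv.writer(merged).writerow(FIELDS)
        for index in range(1, count + 1):
            with open(part_path(directory, index), newline='') as part:
                merged.write(part.read())


def scrape_to_parts(ranges, directory, merged_path):
    threads = []
    for index, (start, stop) in enumerate(ranges, 1):
        thread = threading.Thread(target=write_part,
                                  args=(index, start, stop, directory))
        threads.append(thread)
        thread.start()
    for thread in threads:
        thread.join()  # merge only after every part is complete
    merge_parts(len(ranges), directory, merged_path)


with tempfile.TemporaryDirectory() as workdir:
    merged = os.path.join(workdir, 'results.csv')
    scrape_to_parts([(1, 4), (4, 7)], workdir, merged)
    with open(merged) as handle:
        print(handle.read())
```

Because the parts are merged in index order, the final file ends up sorted by page even though the threads run concurrently.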

You can also use basic synchronization primitives such as Lock, Monitor, and so on; however, I am not the biggest fan of them. An example of locking can be found here.
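For completeness, a sketch of the locking variant (not the linked example; `fetch_row` and the shortened field list are assumptions for illustration). The file is opened and the header written before any thread starts, all threads share one `csv.writer`, and a `threading.Lock` makes sure only one thread writes a row at a time.

```python
import csv
import io
import threading

FIELDS = ['country', 'area', 'population']  # shortened for the sketch
write_lock = threading.Lock()  # guards the shared csv writer


def fetch_row(number):
    """Hypothetical stand-in for download() + cssselect(); returns one row."""
    return ['%s-%d' % (field, number) for field in FIELDS]


def worker(start, stop, writer):
    for number in range(start, stop):
        row = fetch_row(number)  # slow work happens outside the lock
        with write_lock:         # only the write itself is serialized
            writer.writerow(row)


def run_with_lock(ranges, out_file):
    writer = csv.writer(out_file)
    writer.writerow(FIELDS)  # header written before any thread starts
    threads = [threading.Thread(target=worker, args=(a, b, writer))
               for a, b in ranges]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


buffer = io.StringIO()  # swap for open('countries.csv', 'w', newline='')
run_with_lock([(1, 4), (4, 7)], buffer)
print(buffer.getvalue())
```

Holding the lock only around `writerow()` keeps the downloads running in parallel; as with the queue version, the row order depends on thread timing.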

edited Feb 26, 2019 at 17:53

answered Feb 26, 2019 at 17:37


2 Comments

Janekx Over a year ago

Added some examples to my original answer.

timmy turner Over a year ago

If you can edit my code to do it, that would be great. These examples seem pretty hard to understand and then adapt to my actual code.
