
I have a Python script, A.py, which takes as an argument a target file containing a list of IPs and writes out a CSV file with the information found about those IPs from various sources. (Run method: python A.py Input.txt -c Output.csv)

It took ages to get the work done. Later, I split the input file (split -l 1000 Input.txt), created 10 directories, and ran the script on the split pieces in parallel, one screen session per directory, roughly as sketched below.
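
For concreteness, the manual version looked roughly like this (a sketch; the chunk and session names are illustrative):

split -l 1000 Input.txt chunk_
screen -dmS job1 python A.py chunk_aa -c out1.csv
screen -dmS job2 python A.py chunk_ab -c out2.csv
# ... one detached screen session per chunk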

How can this kind of job be done efficiently? Any suggestions, please?

  • You can use Python threads for your task (see the sketch after these comments). Commented Dec 1, 2015 at 6:17
  • docs.python.org/2/library/thread.html Commented Dec 1, 2015 at 6:17
  • Thanks Alex, I will have a look into it! Commented Dec 1, 2015 at 8:17
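
As a rough illustration of the threading suggestion in the comments above: a minimal sketch using a thread pool (multiprocessing.dummy, which works on both Python 2 and 3 and suits I/O-bound lookups). It assumes the per-IP work in A.py can be factored into a single function; lookup() below is a hypothetical stand-in, not the actual code from A.py:

import csv
import sys
from multiprocessing.dummy import Pool  # a pool of threads, not processes

def lookup(ip):
    # hypothetical stand-in for the real per-IP queries A.py performs
    return [ip, "info"]

def main(infile, outfile, workers=10):
    with open(infile) as f:
        ips = [line.strip() for line in f if line.strip()]
    pool = Pool(workers)
    rows = pool.map(lookup, ips)  # run the lookups concurrently
    pool.close()
    pool.join()
    with open(outfile, "w") as f:
        csv.writer(f).writerows(rows)

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])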

1 Answer


Try this:

parallel --roundrobin --pipepart -a Input.txt --cat python A.py {} -c {#}.csv

If A.py can read from a FIFO, then this is more efficient:

parallel --roundrobin --pipepart -a Input.txt --fifo python A.py {} -c {#}.csv

If your disk has long seek times then it might be faster to use --pipe instead of --pipepart.
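
For example, the --pipe form could look like this (a sketch following the same pattern; --cat again hands each chunk to A.py as a temporary file):

cat Input.txt | parallel --roundrobin --pipe --cat python A.py {} -c {#}.csv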


Comments

@Ole Thanks for the reply. I cannot see any process running; I get a prompt on the screen: "parallel: Warning: Input is read from the terminal. Only experts do this on purpose. Press CTRL-D to exit". Also, what does {#}.csv mean? Does it mean any CSV file in the directory?
{#} is the substitution string for the job number. So it will pass 1.csv to the first job, 2.csv to the next and so on.
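
Once all the jobs finish, the numbered outputs can be concatenated into one file, e.g. with ten jobs (this assumes A.py writes no header row; if it does, strip the duplicate headers first):

cat {1..10}.csv > Output.csv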
