
I have a Python script, A.py, which takes as an argument a target file containing a list of IPs and writes out a CSV file with the information found about those IPs from various sources. (Run method: python A.py Input.txt -c Output.csv)

It took ages to get the work done. Later, I split the input file (split -l 1000 Input.txt), created 10 directories, and ran the script on the split pieces in parallel, one screen session per directory, roughly as sketched below.
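
For concreteness, the manual version looked roughly like this (a sketch; the chunk and session names are illustrative):

split -l 1000 Input.txt chunk_
screen -dmS job1 python A.py chunk_aa -c out1.csv
screen -dmS job2 python A.py chunk_ab -c out2.csv
# ... one detached screen session per chunk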

How can this kind of job be done efficiently? Any suggestions, please?

  • You can use Python threads for your task (see the sketch after these comments). Commented Dec 1, 2015 at 6:17
  • docs.python.org/2/library/thread.html Commented Dec 1, 2015 at 6:17
  • Thanks Alex, I will have a look into it! Commented Dec 1, 2015 at 8:17
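
As a rough illustration of the threading suggestion in the comments above: a minimal sketch using a thread pool (multiprocessing.dummy, which works on both Python 2 and 3 and suits I/O-bound lookups). It assumes the per-IP work in A.py can be factored into a single function; lookup() below is a hypothetical stand-in, not the actual code from A.py:

import csv
import sys
from multiprocessing.dummy import Pool  # a pool of threads, not processes

def lookup(ip):
    # hypothetical stand-in for the real per-IP queries A.py performs
    return [ip, "info"]

def main(infile, outfile, workers=10):
    with open(infile) as f:
        ips = [line.strip() for line in f if line.strip()]
    pool = Pool(workers)
    rows = pool.map(lookup, ips)  # run the lookups concurrently
    pool.close()
    pool.join()
    with open(outfile, "w") as f:
        csv.writer(f).writerows(rows)

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])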

1 Answer


Try this:

parallel --roundrobin --pipepart -a Input.txt --cat python A.py {} -c {#}.csv

If A.py can read from a FIFO, then this is more efficient:

parallel --roundrobin --pipepart -a Input.txt --fifo python A.py {} -c {#}.csv

If your disk has long seek times then it might be faster to use --pipe instead of --pipepart.
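
For example, the --pipe form could look like this (a sketch following the same pattern; --cat again hands each chunk to A.py as a temporary file):

cat Input.txt | parallel --roundrobin --pipe --cat python A.py {} -c {#}.csv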


Comments

@Ole Thanks for the reply. I cannot see any process running; I get a prompt on the screen: "parallel: Warning: Input is read from the terminal. Only experts do this on purpose. Press CTRL-D to exit". Also, what does {#}.csv mean? Does it mean any CSV file in the directory?
{#} is the substitution string for the job number. So it will pass 1.csv to the first job, 2.csv to the next and so on.
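
Once all the jobs finish, the numbered outputs can be concatenated into one file, e.g. with ten jobs (this assumes A.py writes no header row; if it does, strip the duplicate headers first):

cat {1..10}.csv > Output.csv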
