2

I am running an os.system(cmd) in a for-loop. Since sometimes it hangs, I am trying to use process=subprocess.pOpen(cmd) in a for-loop. But I want to know the following:

  • If I do sleep(60) and then check if the process is still running by using process.poll(), how do I differentiate between process actually running even after 1 minute and process that hung?

  • If I kill the process which hung, will the for-loop still continue or will it exit?

Thanks!

2 Answers 2

4

I don't know of any general way to tell whether a process is hung or working. If a process hangs due to a locking issue, then it might consume 0% CPU and you might be able to guess that it is hung and not working; but if it hangs with an infinite loop, the process might make the CPU 100% busy but not accomplish any useful work. And you might have a process communicating on the network, talking to a really slow host with long timeouts; that would not be hung but would consume 0% CPU while waiting.

I think that, in general, the only hope you have is to set up some sort of "watchdog" system, where your sub-process uses inter-process communication to periodically send a signal that means "I'm still alive".

If you can't modify the program you are running as a sub-process, then at least try to figure out why it hangs, and see if you can then figure out a way to guess that it has hung. Maybe it normally has a balanced mix of CPU and I/O, but when it hangs it goes in a tight infinite loop and the CPU usage goes to 100%; that would be your clue that it is time to kill it and restart. Or, maybe it writes to a log file every 30 seconds, and you can monitor the size of the file and restart it if the file doesn't grow. Or, maybe you can put the program in a "verbose" mode where it prints messages as it works (either to stdout or stderr) and you can watch those. Or, if the program works as a daemon, maybe you can actively query it and see if it is alive; for example, if it is a database, send a simple query and see if it succeeds.

So I can't give you a general answer, but I have some hope that you should be able to figure out a way to detect when your specific program hangs.

Finally, the best possible solution would be to figure out why it hangs, and fix the problem so it doesn't happen anymore. This may not be possible, but at least keep it in mind. You don't need to detect the program hanging if the program never hangs anymore!

P.S. I suggest you do a Google search for "how to monitor a process" and see if you get any useful ideas from that.

Sign up to request clarification or add additional context in comments.

Comments

0

A common way to detect things that have stopped working is to have them emit a signal at roughly regular intervals and have another process monitor the signal. If the monitor sees that no signal has arrived after, say, twice the interval it can take action such as killing and restarting the process.

This general idea can be used not only for software but also for hardware. I have used it to restart embedded controllers by simply charging a capacitor from an a.c. coupled signal from an output bit. A simple detector monitors the capacitor and if the voltage ever falls below a threshold it just pulls the reset line low and at the same time holds the capacitor charged for long enough for the controller to restart.

The principle for software is similar; one way is for the process to simply touch a file at intervals. The monitor checks the file modification time at intervals and if it is too old kills and restarts the process.

In OP's case the subprocess could write a status code to a file to say how far it has got in its work.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.