
I'm trying to implement a multiprocessing version of object detection (the video source can be either a camera or a video file) with the Ultralytics YOLO model.

I implemented a Queue to which frames are added, and a process pool with 4 workers: 1 displays the images and the other 3 process the frames.

Now, I have an issue:

When I start the program, the object detection works, but the video is not smooth and appears "delayed": compared to the original video source, it plays back slower, as if there were high latency between frames. I'd expect the output video to be as smooth as the input source.

Any suggestion?

I've already tried varying the number of workers and the maxsize of the Queue, but it doesn't seem to help.

from multiprocessing import Pool, Queue, Process, Lock
import cv2
from ultralytics import YOLO

stop_flag = False


def init_pool(d_b, selected_classes):
    global detection_buffer, yolo, selected_classes_set
    detection_buffer = d_b
    yolo = YOLO('yolov8n.pt')
    selected_classes_set = set(selected_classes)


def detect_object(frame, frame_id):
    global yolo, selected_classes_set
    results = yolo.track(frame, stream=False)
    for result in results:
        classes_names = result.names
        for box in result.boxes:
            if box.conf[0] > 0.4:
                x1, y1, x2, y2 = map(int, box.xyxy[0])
                cls = int(box.cls[0])
                class_name = classes_names[cls]

                if class_name in selected_classes_set:
                    colour = (0, 255, 0)
                    cv2.rectangle(frame, (x1, y1), (x2, y2), colour, 2)
                    cv2.putText(frame, f'{class_name} {box.conf[0]:.2f}', (x1, y1),
                                cv2.FONT_HERSHEY_SIMPLEX, 1, colour, 2)
    detection_buffer.put((frame_id, frame))


def show(detection_buffer):
    global stop_flag
    next_frame_id = 0
    frames_buffer = {}
    while not stop_flag:
        data = detection_buffer.get()
        if data is None:
            break
        frame_id, frame = data
        frames_buffer[frame_id] = frame

        while next_frame_id in frames_buffer:
            cv2.imshow("Video", frames_buffer.pop(next_frame_id))
            next_frame_id += 1

            if cv2.waitKey(1) & 0xFF == ord('q'):
                stop_flag = True
                break

    cv2.destroyAllWindows()
    return


# Required for Windows:
if __name__ == "__main__":

    video_path = "path_to_video"
    detection_buffer = Queue(maxsize=3)

    selected_classes = ['car']

    detect_pool = Pool(3, initializer=init_pool, initargs=(detection_buffer, selected_classes))

    num_show_processes = 1
    show_process = Process(target=show, args=(detection_buffer,))
    show_process.start()

    if not video_path:
        cap = cv2.VideoCapture(0)
    else:
        cap = cv2.VideoCapture(video_path)

    frame_id = 0
    futures = []
    while not stop_flag:
        ret, frame = cap.read()
        if ret:
            f = detect_pool.apply_async(detect_object, args=(frame, frame_id))
            futures.append(f)
            frame_id += 1
        else:
            break

    for f in futures:
        f.get()

    for _ in range(num_show_processes):
        detection_buffer.put(None)

    show_process.join()

    detect_pool.close()
    detect_pool.join()

    cv2.destroyAllWindows()

1 Answer
The processing speed depends on many factors that we can't easily replicate. Among them:

  • CPU / GPU speed
  • Available RAM
  • Operating system

Having said that, the most limiting factors here are your input video and the model you're using for object detection. For a typical video at 24 frames per second, you have 1/24 of a second to do whatever you're doing to each frame if you want them to display smoothly, without delays or lags. That's about 42 ms per frame. For 60 FPS videos, it's about 16 ms per frame.
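To make that budget concrete, the arithmetic is just the frame interval in milliseconds:

```python
def frame_budget_ms(fps: float) -> float:
    """Milliseconds available per frame before playback falls behind."""
    return 1000.0 / fps

print(frame_budget_ms(24))  # ~41.7 ms per frame at 24 FPS
print(frame_budget_ms(60))  # ~16.7 ms per frame at 60 FPS
```

If your end-to-end per-frame processing takes longer than this, the display will inevitably lag behind the source.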

Although it seems tempting to just offload the work to multiprocessing, doing so means you're using the CPU for it instead of the GPU, which is usually much faster for such tasks. See Use GPU with opencv-python.

Not to mention that multiprocessing is not "free": three processes do not necessarily mean 3x speed, since there is a lot of overhead in coordinating these tasks.
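One concrete overhead worth knowing about: multiprocessing pickles every frame you pass to a worker, and every annotated frame coming back through the Queue. A rough sketch of what just the serialization costs for a Full HD frame:

```python
import pickle
import time

import numpy as np

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)  # one Full HD BGR frame (~6 MB)

t0 = time.perf_counter()
runs = 50
for _ in range(runs):
    blob = pickle.dumps(frame)  # roughly what multiprocessing does per frame sent
per_call_ms = (time.perf_counter() - t0) / runs * 1000
print(f"~{per_call_ms:.1f} ms to serialize one frame")
```

Each frame is serialized at least twice in your setup (into the pool, then back through the Queue), so this cost eats directly into the per-frame budget before any detection has happened.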

My advice would be to take the following steps:

  1. First thing - measure. How long does it actually take to process a single frame of your typical video? How long does it take with one process vs. 3? Does it really make a difference?
  2. Process an "easier" video - lower FPS, lower resolution. Does that work better? If it does, can you first convert your "better" video to that format?
  3. Try offloading as much as you can to the GPU instead of the CPU, as mentioned above.
  4. Try different / better hardware. Do the lags exist because your computer is a potato, or do they exist even when running on a high-end PC?
  5. Try a different object detection model. Not all of them are designed to be fast. Some are designed for high precision and take a long time no matter what - which is usually fine for offline processing of a video, but not for a live display of a streaming video. Which of these scenarios do you need / prefer?
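For step 1, here's a small, generic timing harness (the names are my own, just a sketch) that you can point at your `detect_object`-style function and a list of captured frames:

```python
import time

def time_per_frame(process, frames, warmup=3):
    """Average wall-clock milliseconds `process` spends on each frame."""
    for f in frames[:warmup]:
        process(f)  # warm-up runs: first calls often pay one-time setup costs
    t0 = time.perf_counter()
    for f in frames:
        process(f)
    return (time.perf_counter() - t0) / len(frames) * 1000

# Example with a trivial stand-in for the detector:
avg_ms = time_per_frame(lambda f: f * 2, list(range(100)))
print(f"{avg_ms:.3f} ms per frame")
```

Compare the number you get for something like `lambda f: yolo.track(f, stream=False)` against the per-frame budget above; if a single inference already exceeds it, no amount of worker tuning will make the display smooth.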

1 Comment

Did you get any solution for that?
