
I'm trying to implement a multiprocessing version of object detection (the video source can be either a camera or a video file) with the Ultralytics YOLO model.

I implemented a Queue to which frames are added, and a process pool with 4 workers: 1 displays the images and the other 3 process the frames.

Now, I have an issue:

When I start the program, the object detection works, but the video is not smooth and appears "delayed": compared to the original video source, it plays back slower, as if there were high latency between frames. I'd expect the output video to be as smooth as the input source.

Any suggestion?

I've already tried varying the number of workers and the maxsize of the Queue, but it doesn't seem to help.

from multiprocessing import Pool, Queue, Process, Lock
import cv2
from ultralytics import YOLO

stop_flag = False


def init_pool(d_b, selected_classes):
    global detection_buffer, yolo, selected_classes_set
    detection_buffer = d_b
    yolo = YOLO('yolov8n.pt')
    selected_classes_set = set(selected_classes)


def detect_object(frame, frame_id):
    global yolo, selected_classes_set
    results = yolo.track(frame, stream=False)
    for result in results:
        classes_names = result.names
        for box in result.boxes:
            if box.conf[0] > 0.4:
                x1, y1, x2, y2 = map(int, box.xyxy[0])
                cls = int(box.cls[0])
                class_name = classes_names[cls]

                if class_name in selected_classes_set:
                    colour = (0, 255, 0)
                    cv2.rectangle(frame, (x1, y1), (x2, y2), colour, 2)
                    cv2.putText(frame, f'{class_name} {box.conf[0]:.2f}', (x1, y1),
                                cv2.FONT_HERSHEY_SIMPLEX, 1, colour, 2)
    detection_buffer.put((frame_id, frame))


def show(detection_buffer):
    global stop_flag
    next_frame_id = 0
    frames_buffer = {}
    while not stop_flag:
        data = detection_buffer.get()
        if data is None:
            break
        frame_id, frame = data
        frames_buffer[frame_id] = frame

        while next_frame_id in frames_buffer:
            cv2.imshow("Video", frames_buffer.pop(next_frame_id))
            next_frame_id += 1

            if cv2.waitKey(1) & 0xFF == ord('q'):
                stop_flag = True
                break

    cv2.destroyAllWindows()
    return


# Required for Windows:
if __name__ == "__main__":

    video_path = "path_to_video"
    detection_buffer = Queue(maxsize=3)

    selected_classes = ['car']

    detect_pool = Pool(3, initializer=init_pool, initargs=(detection_buffer, selected_classes))

    num_show_processes = 1
    show_process = Process(target=show, args=(detection_buffer,))
    show_process.start()

    if not video_path:
        cap = cv2.VideoCapture(0)
    else:
        cap = cv2.VideoCapture(video_path)

    frame_id = 0
    futures = []
    while not stop_flag:
        ret, frame = cap.read()
        if ret:
            f = detect_pool.apply_async(detect_object, args=(frame, frame_id))
            futures.append(f)
            frame_id += 1
        else:
            break

    for f in futures:
        f.get()

    for _ in range(num_show_processes):
        detection_buffer.put(None)

    show_process.join()

    detect_pool.close()
    detect_pool.join()

    cv2.destroyAllWindows()

1 Answer
The processing speed depends on many factors that we can't easily replicate. Among them:

  • CPU / GPU speed
  • Available RAM
  • Operating system

Having said that, the most limiting factors here are your input video and the model you're using for object detection. For a typical video at 24 frames per second, you have 1/24 of a second to do whatever you're doing to each frame if you want them to display smoothly, without delays or lags. That's about 42 ms per frame. For 60 FPS videos, it's about 16 ms per frame.
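To make that budget concrete, the arithmetic is just the frame interval in milliseconds:

```python
def frame_budget_ms(fps: float) -> float:
    """Milliseconds available per frame before playback falls behind."""
    return 1000.0 / fps

print(frame_budget_ms(24))  # ~41.7 ms per frame at 24 FPS
print(frame_budget_ms(60))  # ~16.7 ms per frame at 60 FPS
```

If your end-to-end per-frame processing takes longer than this, the display will inevitably lag behind the source.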

Although it seems tempting to just offload the work to multiprocessing, doing so means you're using the CPU for it instead of the GPU, which is usually much faster for such tasks. See Use GPU with opencv-python.

Not to mention that multiprocessing is not "free": three processes do not necessarily mean 3x speed, since there is a lot of overhead in coordinating these tasks.
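One concrete overhead worth knowing about: multiprocessing pickles every frame you pass to a worker, and every annotated frame coming back through the Queue. A rough sketch of what just the serialization costs for a Full HD frame:

```python
import pickle
import time

import numpy as np

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)  # one Full HD BGR frame (~6 MB)

t0 = time.perf_counter()
runs = 50
for _ in range(runs):
    blob = pickle.dumps(frame)  # roughly what multiprocessing does per frame sent
per_call_ms = (time.perf_counter() - t0) / runs * 1000
print(f"~{per_call_ms:.1f} ms to serialize one frame")
```

Each frame is serialized at least twice in your setup (into the pool, then back through the Queue), so this cost eats directly into the per-frame budget before any detection has happened.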

My advice would be to take the following steps:

  1. First thing - measure. How long does it actually take to process a single frame of your typical video? How long does it take with one process vs. 3? Does it really make a difference?
  2. Process an "easier" video - lower FPS, lower resolution. Does that work better? If it does, can you first convert your "better" video to that format?
  3. Try offloading as much as you can to the GPU instead of the CPU, as mentioned above.
  4. Try different / better hardware. Do the lags exist because your computer is a potato, or do they exist even when running on a high-end PC?
  5. Try a different object detection model. Not all of them are designed to be fast. Some are designed for high precision and take a long time no matter what - which is usually fine for offline processing of a video, but not for a live display of a streaming video. Which of these scenarios do you need / prefer?
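For step 1, here's a small, generic timing harness (the names are my own, just a sketch) that you can point at your `detect_object`-style function and a list of captured frames:

```python
import time

def time_per_frame(process, frames, warmup=3):
    """Average wall-clock milliseconds `process` spends on each frame."""
    for f in frames[:warmup]:
        process(f)  # warm-up runs: first calls often pay one-time setup costs
    t0 = time.perf_counter()
    for f in frames:
        process(f)
    return (time.perf_counter() - t0) / len(frames) * 1000

# Example with a trivial stand-in for the detector:
avg_ms = time_per_frame(lambda f: f * 2, list(range(100)))
print(f"{avg_ms:.3f} ms per frame")
```

Compare the number you get for something like `lambda f: yolo.track(f, stream=False)` against the per-frame budget above; if a single inference already exceeds it, no amount of worker tuning will make the display smooth.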

1 Comment

Did you get any solution for that?
