I’m working on a real-time system that identifies objects, and I’m facing a challenge: new objects can look extremely similar to known ones (sometimes differing by as little as 0.1 mm), and I need the system to detect when an object is truly new. Objects may appear at different positions and orientations, and a single image can contain multiple objects.
Currently, I’m using YOLOv8, which works well for detecting and identifying pieces. However, YOLO is built for detection and localization over a fixed set of trained classes, so I’m considering running each piece that YOLO detects through a model like ResNet or VGG16 to extract visual features that can be compared against the known objects.
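For reference, here is a minimal sketch of the two-stage idea, assuming the detector returns pixel boxes in (x1, y1, x2, y2) format (in Ultralytics YOLOv8 these are exposed as `results[0].boxes.xyxy`). Each box is cropped, passed to an embedding function, and compared by cosine similarity against a gallery of embeddings of known objects; if the best match falls below a threshold, the crop is flagged as new. The names `embed`, `gallery`, `labels`, and `identify` are hypothetical, and the 0.9 threshold is an arbitrary placeholder that would need calibrating on held-out images of known objects. In a real system `embed` would wrap something like a ResNet backbone with its classification head removed; here it is left pluggable so the sketch runs with only NumPy.

```python
import numpy as np
from typing import Callable, List, Sequence, Tuple


def crop_detections(image: np.ndarray, boxes_xyxy: np.ndarray) -> List[np.ndarray]:
    """Crop each (x1, y1, x2, y2) pixel box out of an H x W x C image."""
    h, w = image.shape[:2]
    crops = []
    for x1, y1, x2, y2 in np.asarray(boxes_xyxy).astype(int):
        # Clamp to image bounds so boxes touching the frame edge still yield a crop.
        x1, y1 = max(x1, 0), max(y1, 0)
        x2, y2 = min(x2, w), min(y2, h)
        crops.append(image[y1:y2, x1:x2])
    return crops


def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length along the last axis (epsilon avoids divide-by-zero)."""
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + 1e-12)


def match_or_flag_new(
    embedding: np.ndarray,
    gallery: np.ndarray,
    labels: Sequence[str],
    threshold: float,
) -> Tuple[str, float]:
    """Return the nearest known label and its cosine similarity,
    or ("NEW", similarity) when even the best match is below the threshold."""
    sims = l2_normalize(gallery) @ l2_normalize(embedding)
    best = int(np.argmax(sims))
    if sims[best] < threshold:
        return "NEW", float(sims[best])
    return labels[best], float(sims[best])


def identify(
    image: np.ndarray,
    boxes_xyxy: np.ndarray,
    embed: Callable[[np.ndarray], np.ndarray],  # hypothetical: e.g. a ResNet feature extractor
    gallery: np.ndarray,                        # one row per known object's reference embedding
    labels: Sequence[str],
    threshold: float = 0.9,                     # placeholder; must be calibrated on real data
) -> List[Tuple[str, float]]:
    """Handle every detection in one image: crop, embed, then match or flag as new."""
    return [
        match_or_flag_new(embed(crop), gallery, labels, threshold)
        for crop in crop_detections(image, boxes_xyxy)
    ]
```

As a toy check, with mean colour standing in for a learned embedding, a crop whose colour matches no gallery entry comes back as "NEW". Whether a pretrained ResNet or VGG16 embedding is sensitive enough to separate objects differing by 0.1 mm is not guaranteed; that depends on image resolution and on how (or whether) the backbone is fine-tuned for this task.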
I’d love advice on whether this approach is considered good practice, or whether there are better architectures or strategies for distinguishing very similar objects while reliably detecting unknown ones.