feat: ✨ scale_detections function added to adjust bbox,masks,obb for scaled images #1711

Open. Wants to merge 2 commits into base: develop

onuralpszr (Collaborator) commented Dec 5, 2024

Description

Instead of handling only xyxy values, as in the previous PR #1708, this PR operates on the whole Detections object and also transforms masks and oriented bounding boxes (OBB), so that it works for all cases.

Test Cases

Test Seg and Bbox Case

import numpy as np
from ultralytics import YOLO
from PIL import Image
import cv2
import supervision as sv


model = YOLO("yolo11n-seg.pt")

image = Image.open("bus.jpg")


def detect_objects(image):
    results = model([image])[0]
    detections = sv.Detections.from_ultralytics(results)
    return detections


def annotate_img(image, dets):
    box_annotator = sv.BoxAnnotator()
    ann_img = box_annotator.annotate(image.copy(), dets)

    return ann_img


wo_letterbox_detections = detect_objects(image)
box_annotator = sv.BoxAnnotator()
seg_annotator = sv.MaskAnnotator()
wo_ann_img = box_annotator.annotate(image.copy(), wo_letterbox_detections)
wo_ann_img = seg_annotator.annotate(wo_ann_img, wo_letterbox_detections)

# Process with letterbox, then map the detections back to the original resolution
letterbox_resolution = (640, 640)
resolution_wh = (image.width, image.height)
letterboxed_image = sv.letterbox_image(image, letterbox_resolution)
letterbox_detections = detect_objects(letterboxed_image)

scaled_detections = sv.scale_detections(letterbox_detections, letterbox_resolution, resolution_wh)

box_annotator = sv.BoxAnnotator()
seg_annotator = sv.MaskAnnotator()
with_ann_img = box_annotator.annotate(image.copy(), scaled_detections)
with_ann_img = seg_annotator.annotate(with_ann_img, scaled_detections)
with_ann_img = sv.pillow_to_cv2(with_ann_img)


wo_ann_img = sv.pillow_to_cv2(wo_ann_img)
viz_img = np.hstack([wo_ann_img, with_ann_img])
cv2.imshow("Detection", viz_img)
cv2.waitKey(0)

image

Test OBB Case

import numpy as np
from ultralytics import YOLO
from PIL import Image
import cv2
import supervision as sv


model = YOLO("yolo11m-obb.pt")

image = Image.open("boats.jpg")


def detect_objects(image):
    results = model([image])[0]
    detections = sv.Detections.from_ultralytics(results)
    return detections


def annotate_img(image, dets):
    box_annotator = sv.BoxAnnotator()
    ann_img = box_annotator.annotate(image.copy(), dets)

    return ann_img

box_annotator = sv.BoxAnnotator()
seg_annotator = sv.MaskAnnotator()
obb_annotator = sv.OrientedBoxAnnotator(color=sv.ColorPalette.from_hex(["#00FF00"]))


wo_letterbox_detections = detect_objects(image)

wo_ann_img = box_annotator.annotate(image.copy(), wo_letterbox_detections)
wo_ann_img = seg_annotator.annotate(wo_ann_img, wo_letterbox_detections)
wo_ann_img = obb_annotator.annotate(wo_ann_img, wo_letterbox_detections)

# Process with letterbox, then map the detections back to the original resolution
letterbox_resolution = (400, 720)
resolution_wh = (image.width, image.height)
letterboxed_image = sv.letterbox_image(image, letterbox_resolution)
letterbox_detections = detect_objects(letterboxed_image)

scaled_detections = sv.scale_detections(letterbox_detections, letterbox_resolution, resolution_wh)

obb_annotator = sv.OrientedBoxAnnotator(color=sv.ColorPalette.from_hex(["#FFFFFF"]))
with_ann_img = box_annotator.annotate(image.copy(), scaled_detections)
with_ann_img = seg_annotator.annotate(with_ann_img, scaled_detections)
with_ann_img = obb_annotator.annotate(with_ann_img, scaled_detections)
with_ann_img = sv.pillow_to_cv2(with_ann_img)


wo_ann_img = sv.pillow_to_cv2(wo_ann_img)
viz_img = np.hstack([wo_ann_img, with_ann_img])
cv2.imshow("Detection", viz_img)
cv2.waitKey(0)

image

    bounding box coordinates and masks and obb for scaled images

Signed-off-by: Onuralp SEZER <[email protected]>
from supervision.detection.core import ORIENTED_BOX_COORDINATES, Detections


def scale_detections(
Collaborator

@onuralpszr can we change the function name to something specific to letterboxing? Otherwise it will confuse users.

Collaborator Author

@hardikdava maybe scale_letterbox_detections? What do you think, @LinasKo?

Contributor

Un-letterboxing feels like a very narrow use case.

We should provide general functions which also enable letterbox reversal, and highlight that case in the examples & docs.

This might be frustrating, but let me take the time to review this in depth over the weekend and the start of next week, since reversing API decisions is hard.

Collaborator Author

@LinasKo I can add an enum parameter and rename the other two parameters to src_res and target_res. Depending on the enum value, the function would apply either letterbox-dependent scaling or normal scaling. Should I also add extra cases to handle normal scaling without padding?

hardikdava (Collaborator) previously approved these changes Dec 5, 2024

@onuralpszr a minor change is required. Everything else looks good.

LinasKo (Contributor) left a comment

I took a deeper look at the PR.

@SkalskiP, I need your thoughts on this.

  1. This PR submits the first function to detection_utils.py - utils for operations on entire detections objects, rather than their components. We'll move more utils here soon.
  2. The PR makes a common operation simpler:
     1. Letterbox an image
     2. Send to a model
     3. (new) un-letterbox detections, to fit original image
  3. I'm suggesting we reuse some utils we had (e.g. move_boxes), but cannot use others. For example, scale_boxes cannot be used, as it anchors to detection centers and scales them in-place, whereas here we assume the image changed size, so gaps between detections need to increase as well.

Because of the last point, splitting this entire function into multiple more general methods is tricky.
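
To illustrate the distinction drawn in the last point, here is a minimal NumPy sketch. Both helper names are hypothetical and are not supervision APIs: the first mimics center-anchored scaling as the comment describes it, and the second scales coordinates as if the whole image had been resized.

```python
import numpy as np


def scale_about_centers(xyxy: np.ndarray, factor: float) -> np.ndarray:
    """Hypothetical helper: resize each box around its own center."""
    centers = (xyxy[:, :2] + xyxy[:, 2:]) / 2
    half_sizes = (xyxy[:, 2:] - xyxy[:, :2]) / 2 * factor
    return np.concatenate([centers - half_sizes, centers + half_sizes], axis=1)


def scale_with_image(xyxy: np.ndarray, factor: float) -> np.ndarray:
    """Hypothetical helper: scale boxes as if the image itself were resized."""
    return xyxy * factor


# Two 10x10 boxes whose centers are 20 px apart.
boxes = np.array([[0.0, 0.0, 10.0, 10.0], [20.0, 0.0, 30.0, 10.0]])

centered = scale_about_centers(boxes, 2.0)  # centers stay 20 px apart
resized = scale_with_image(boxes, 2.0)      # centers are now 40 px apart
```

With center-anchored scaling the boxes grow but stay where they were, so they start to overlap. Undoing a letterbox needs the second behavior, where the gaps between detections grow together with the boxes.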

Ultimately, this is helpful for other projects, but very low value for us right now. Yet if we merge the wrong API, we'll struggle to remove it later.


boxes = detections.xyxy.copy()
boxes[:, [0, 2]] -= padding_left
boxes[:, [1, 3]] -= padding_top
Contributor

We should use move_boxes from utils
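
As a rough illustration of the box step under review, the sketch below shifts the boxes by the padding and then rescales them, using plain NumPy in place of the util suggested above. The helper names are hypothetical, and the geometry is an assumption: an aspect-preserving resize followed by centered padding.

```python
import numpy as np


def letterbox_geometry(source_wh, letterbox_wh):
    """Hypothetical helper: return (scale, pad_left, pad_top).

    Assumes the image was resized with its aspect ratio preserved and then
    centered on the letterbox canvas.
    """
    src_w, src_h = source_wh
    box_w, box_h = letterbox_wh
    scale = min(box_w / src_w, box_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    return scale, (box_w - new_w) // 2, (box_h - new_h) // 2


def unletterbox_xyxy(xyxy, source_wh, letterbox_wh):
    """Hypothetical helper: map xyxy boxes from letterbox to source coordinates."""
    scale, pad_left, pad_top = letterbox_geometry(source_wh, letterbox_wh)
    offset = np.array([pad_left, pad_top, pad_left, pad_top])
    # Remove the padding first, then undo the resize.
    return (xyxy - offset) / scale


# A 1000x500 image letterboxed onto a 500x500 canvas: scale 0.5, 125 px top padding.
letterboxed_boxes = np.array([[100.0, 175.0, 200.0, 225.0]])
restored = unletterbox_xyxy(letterboxed_boxes, (1000, 500), (500, 500))
```

The order matters: padding is expressed in letterbox pixels, so it has to be subtracted before dividing by the scale.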

mask = mask[
    padding_top : padding_top + height_new,
    padding_left : padding_left + width_new,
]
Contributor

We can use move_masks despite #1715 - we know that padding will be positive.
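
For masks, the same idea can be sketched as a crop followed by a resize. This is only an illustration for a single 2D boolean mask: the helper name is hypothetical, and the nearest-neighbor resize via index arrays is a stand-in chosen to keep the example dependency-free, not the PR's implementation.

```python
import numpy as np


def unletterbox_mask(mask, source_wh, content_wh, pad_left, pad_top):
    """Hypothetical helper: crop the padding, then resize to the source size.

    `content_wh` is the size of the resized image inside the letterbox canvas.
    """
    content_w, content_h = content_wh
    src_w, src_h = source_wh
    cropped = mask[pad_top : pad_top + content_h, pad_left : pad_left + content_w]
    # Nearest-neighbor resize: pick a cropped row/column for every source pixel.
    rows = np.arange(src_h) * content_h // src_h
    cols = np.arange(src_w) * content_w // src_w
    return cropped[rows[:, None], cols[None, :]]


# 4x4 canvas holding a 4x2 (width x height) image with 1 px of padding on top.
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 0:2] = True  # left half of the content area
restored = unletterbox_mask(
    mask, source_wh=(8, 4), content_wh=(4, 2), pad_left=0, pad_top=1
)
```

Here the restored mask has the source shape (4, 8), and its left half is set, matching the left half of the content area in the letterboxed mask.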

if ORIENTED_BOX_COORDINATES in detections.data:
    obbs = np.array(detections.data[ORIENTED_BOX_COORDINATES]).copy()
    obbs[:, :, 0] -= padding_left
    obbs[:, :, 1] -= padding_top
Contributor

We can use move_oriented_boxes from utils.
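
Oriented boxes follow the same shift-then-rescale pattern, only on corner points. The sketch below assumes an array of shape (N, 4, 2) holding (x, y) corners, which matches the indexing in the snippet above; the helper name is hypothetical.

```python
import numpy as np


def unletterbox_obb(obb, pad_left, pad_top, scale):
    """Hypothetical helper: map (N, 4, 2) corner points to source coordinates."""
    # The (2,) offset broadcasts over every corner of every box.
    return (obb - np.array([pad_left, pad_top])) / scale


# One box in letterbox coordinates, with 125 px of top padding and scale 0.5.
obb = np.array(
    [[[100.0, 175.0], [200.0, 175.0], [200.0, 225.0], [100.0, 225.0]]]
)
restored = unletterbox_obb(obb, pad_left=0, pad_top=125, scale=0.5)
```

Because the transform is a uniform scale plus a translation, the box keeps its rotation angle; only its position and size change.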

from supervision.detection.core import ORIENTED_BOX_COORDINATES, Detections


def scale_detections(
Contributor

I suggest renaming to either scale_detections_to_fill or undo_letterboxing_detections.

I'd like some advice from @SkalskiP here - I've explained more in the global review comment.


    Returns:
        Detections: A new Detections object with scaled to target resolution.
    """
Contributor

The doc should come with an example showing how to use this. It should make it evident that letterboxing can be undone using this operation.

Contributor

We should also find a place in the docs where we could showcase the scenario of:

  1. Letterbox an image
  2. Send the appropriately sized image to the model
  3. Undo the letterboxing

Without this, discovery of the method will be very low.

LinasKo (Contributor) commented Dec 6, 2024

Hey @hardikdava,

I took a deeper look. I know I asked for the PR, and it solves your problem, but we need time to think.

Most importantly, if we just add it as is, the discovery for it will be very low. I need to speak with @SkalskiP about this first.

hardikdava (Collaborator)

@LinasKo Thanks for taking a look. I have already duplicated the function in the other project that needs it, so you can decide with the future in mind and based only on the supervision project.


3 participants