I have a torchvision MobileNet model with the classification head removed, which I use for similarity search. I saved it as a TorchScript module. When I run the similarity search in Python it returns the right results, but in Kotlin it does not: I have checked with the same images, and the two produce different outputs. My guess is that the problem is in my preprocessing, but I have tried everything I can find and nothing changes the result. Here is my Python code:
import torch
import numpy as np
from io import BytesIO
from PIL import Image
from fastapi import FastAPI, File, HTTPException, UploadFile
from torchvision import transforms

# Model and transform setup (MODEL_FILE, app, and model are defined elsewhere)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

image_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def load_trained_model():
    model = torch.jit.load(MODEL_FILE, map_location=device)
    model.eval()
    return model

def extract_features(pil_img, model):
    with torch.no_grad():
        tensor = image_transform(pil_img).unsqueeze(0).to(device)
        features = model(tensor)
        if len(features.shape) > 2:
            features = features.view(features.size(0), -1)
        return features.cpu().numpy().astype(np.float32)
@app.post("/extract_features")
async def extract_image_features(image: UploadFile = File(...)):
    try:
        image_bytes = await image.read()
        with Image.open(BytesIO(image_bytes)) as img:
            processed_img = img.convert("RGB")
            raw_features = extract_features(processed_img, model)
        # Minimal response and error handling to complete the snippet
        # (assumed shape; the handler body was cut off here)
        return {"features": raw_features.tolist()}
    except Exception as exc:
        raise HTTPException(status_code=400, detail=str(exc))
And here is my Kotlin (PyTorch Android) code:
import android.graphics.Bitmap
import org.pytorch.IValue
import org.pytorch.Tensor
import org.pytorch.torchvision.TensorImageUtils
import kotlin.math.roundToInt

fun preprocessImage(bitmap: Bitmap): Tensor {
    val rgbBitmap = if (bitmap.config != Bitmap.Config.ARGB_8888) {
        bitmap.copy(Bitmap.Config.ARGB_8888, true)
    } else {
        bitmap
    }
    val resizedBitmap = resizeWithAspectRatio(rgbBitmap, 256)
    val croppedBitmap = centerCrop(resizedBitmap, 224, 224)
    val mean = floatArrayOf(0.485f, 0.456f, 0.406f)
    val std = floatArrayOf(0.229f, 0.224f, 0.225f)
    return TensorImageUtils.bitmapToFloat32Tensor(croppedBitmap, mean, std)
}
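To rule out TensorImageUtils itself, here is a minimal hand-rolled equivalent of bitmapToFloat32Tensor (a sketch, assuming the standard 1x3xHxW, RGB-ordered, (x/255 - mean)/std layout; bitmapToNormalizedTensor is my own name). If this and the library call disagree for the same bitmap, the channel order or normalization is off:

// Hand-rolled CHW/RGB tensor conversion with per-channel normalization.
fun bitmapToNormalizedTensor(bitmap: Bitmap, mean: FloatArray, std: FloatArray): Tensor {
    val w = bitmap.width
    val h = bitmap.height
    val pixels = IntArray(w * h)
    bitmap.getPixels(pixels, 0, w, 0, 0, w, h)
    val data = FloatArray(3 * w * h)
    for (i in pixels.indices) {
        val p = pixels[i]
        val r = ((p shr 16) and 0xFF) / 255f
        val g = ((p shr 8) and 0xFF) / 255f
        val b = (p and 0xFF) / 255f
        data[i] = (r - mean[0]) / std[0]             // R plane
        data[w * h + i] = (g - mean[1]) / std[1]     // G plane
        data[2 * w * h + i] = (b - mean[2]) / std[2] // B plane
    }
    return Tensor.fromBlob(data, longArrayOf(1, 3, h.toLong(), w.toLong()))
}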
private fun resizeWithAspectRatio(bitmap: Bitmap, targetSize: Int): Bitmap {
    val width = bitmap.width
    val height = bitmap.height
    val scale = if (width < height) {
        targetSize.toFloat() / width.toFloat()
    } else {
        targetSize.toFloat() / height.toFloat()
    }
    val newWidth = (width * scale).roundToInt()
    val newHeight = (height * scale).roundToInt()
    // Bilinear scaling to the aspect-preserving size, matching torchvision's
    // Resize(256) on the shorter side; scaling to a fixed 256x256 square here
    // would squash the image and change every downstream pixel.
    return Bitmap.createScaledBitmap(bitmap, newWidth, newHeight, true)
}
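Even with the sizes correct, a single bilinear pass on Android is not pixel-identical to PIL: PIL's bilinear downscale is antialiased (the filter widens with the shrink factor), while Bitmap scaling always samples a fixed 2x2 neighbourhood, so large downscales diverge noticeably. Progressive halving before the final resize is a common approximation (a sketch, not PIL-exact; downscaleSmooth is my own helper):

// Halve repeatedly until one more halving would undershoot the target, then
// do a final bilinear pass; this approximates an antialiased downscale.
private fun downscaleSmooth(src: Bitmap, targetWidth: Int, targetHeight: Int): Bitmap {
    var current = src
    while (current.width / 2 >= targetWidth && current.height / 2 >= targetHeight) {
        current = Bitmap.createScaledBitmap(current, current.width / 2, current.height / 2, true)
    }
    return Bitmap.createScaledBitmap(current, targetWidth, targetHeight, true)
}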
private fun centerCrop(bitmap: Bitmap, targetWidth: Int, targetHeight: Int): Bitmap {
    val width = bitmap.width
    val height = bitmap.height
    val validStartX = maxOf(0, (width - targetWidth) / 2)
    val validStartY = maxOf(0, (height - targetHeight) / 2)
    val validTargetWidth = minOf(targetWidth, width - validStartX)
    val validTargetHeight = minOf(targetHeight, height - validStartY)
    return Bitmap.createBitmap(bitmap, validStartX, validStartY, validTargetWidth, validTargetHeight)
}
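One caveat with the clamping above: if the source bitmap is smaller than the crop window, it silently returns a crop smaller than 224x224 and the model sees a different input shape. A stricter variant (centerCropStrict is a hypothetical helper) fails fast instead:

// Fails fast instead of clamping, so shape mismatches surface immediately.
private fun centerCropStrict(bitmap: Bitmap, targetWidth: Int, targetHeight: Int): Bitmap {
    require(bitmap.width >= targetWidth && bitmap.height >= targetHeight) {
        "Bitmap ${bitmap.width}x${bitmap.height} is smaller than crop ${targetWidth}x${targetHeight}"
    }
    val x = (bitmap.width - targetWidth) / 2
    val y = (bitmap.height - targetHeight) / 2
    return Bitmap.createBitmap(bitmap, x, y, targetWidth, targetHeight)
}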
fun extractFeatures(bitmap: Bitmap): FloatArray {
    if (model == null) {
        throw IllegalStateException("Model not loaded. Call loadModel() first.")
    }
    val inputTensor = preprocessImage(bitmap)
    val output = model!!.forward(IValue.from(inputTensor))
    return output.toTensor().dataAsFloatArray
}
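To pin down whether the mismatch comes from preprocessing or from the model, compare the input tensors rather than the outputs: log the first values of the Kotlin input tensor and print image_transform(processed_img).flatten()[:5] for the same image on the Python side. A small sketch (debugInputTensor is my own helper):

import android.util.Log

// Logs the shape and first five values of the preprocessed input tensor. If
// these already disagree with the Python transform's output for the same
// image, the model is innocent and the preprocessing is the problem.
fun debugInputTensor(bitmap: Bitmap) {
    val tensor = preprocessImage(bitmap)
    val values = tensor.dataAsFloatArray
    Log.d("Preproc", "shape=${tensor.shape().joinToString()} first5=${values.take(5).joinToString()}")
}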
First five outputs from Kotlin:
[0.7993497, 0.30109355, 0.32214138, 0.47712356, 0.5185487]
Python:
[ 1.2595854 -0.07939269 -0.3717999 0.22528967 0.12919804]
Why are the outputs so different?