
I need to reimplement the following Java function in Python so that it can be used in a Python pipeline. It converts a list of 8 integers into a list of 32 floats via ByteBuffer. As a data scientist, I have been scratching my head over how to do it right. I could look for a way to call the Java code from Python as a workaround, but a pure Python implementation would be more elegant if it is achievable. Any help is appreciated!

static public List<Float> decodeEmbedding(List<Integer> embedding8d) {
    List<Float> embedding32d = new ArrayList<>();
    for (int index = 0; index < embedding8d.size(); ++index) {
        Integer x = embedding8d.get(index);
        byte[] bytes = ByteBuffer.allocate(4).putInt(x).array();
        for (int byteIndex = 0; byteIndex < 4; ++byteIndex) {
            embedding32d.add((float) bytes[byteIndex]);
        }
    }
    return embedding32d;
}

//------------ main ----------
List<Integer> input = Arrays.asList(
    -839660988,
    826572561,
    1885995405,
    819888698,
    -2010625315,
    -1614561409,
    -1220275962,
    -2135440498
);

System.out.println(decodeEmbedding(input));
//[-51.0, -13.0, -54.0, 68.0, 49.0, 68.0, 127.0, 17.0, 112.0, 106.0, 1.0, -115.0, 48.0, -34.0, -126.0, 58.0, -120.0, 40.0, 74.0, -35.0, -97.0, -61.0, -65.0, 127.0, -73.0, 68.0, 17.0, 6.0, -128.0, -73.0, -61.0, -114.0]

1 Answer


Here is one approach:

  1. Pack the integer into a bytes object in big-endian order, which is ByteBuffer's default byte order:
import struct
bstring = struct.pack('>i', intvalue)
  2. Iterating over a bytes object yields integers in the range [0, 256), but the Java output uses signed bytes in the range [-128, 128), so shift the upper half down by 256:
for byte in bstring:
    if byte >= 128:
        byte -= 256
    embedding32d.append(float(byte))
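
As an alternative to the manual adjustment in step 2, the struct module can reinterpret the packed bytes as signed 8-bit values directly with the 'b' format code. This is a minimal sketch for a single integer; the helper name int_to_signed_bytes is hypothetical and not part of the original answer.

```python
import struct

def int_to_signed_bytes(intvalue):
    # Hypothetical helper for illustration only.
    # '>i' packs a signed 32-bit integer in big-endian order,
    # matching ByteBuffer's default byte order.
    packed = struct.pack('>i', intvalue)
    # '>4b' reads the same four bytes back as signed 8-bit values,
    # so no "subtract 256" step is needed.
    return list(struct.unpack('>4b', packed))

print(int_to_signed_bytes(-839660988))  # [-51, -13, -54, 68]
```

The printed values match the first four entries of the Java output in the question.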

To put the whole thing together:


import struct

def decodeEmbedding(embedding8d):
    embedding32d = []
    for intvalue in embedding8d:
        bstring = struct.pack('>i', intvalue)
        signed = (byte - 256 if byte >= 128 else byte for byte in bstring)
        embedding32d.extend(float(byte) for byte in signed)
    return embedding32d

There is also a one-liner version, which packs all the integers in a single call:

import struct

def decodeEmbedding(embedding8d):
    return [float(byte - 256 if byte > 127 else byte) for byte in
            struct.pack('>' + 'i' * len(embedding8d), *embedding8d)]
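
To check a Python port against the Java reference, one option is to run it on the input from the question and compare with the printed Java output. The sketch below is self-contained and uses a variant that unpacks signed bytes with the 'b' format code; the names decode_embedding, java_input and java_output are assumptions made for this example.

```python
import struct

def decode_embedding(embedding8d):
    # Hypothetical variant for illustration: pack every integer as a
    # big-endian signed 32-bit value, then read the buffer back as
    # signed 8-bit values.
    n = len(embedding8d)
    packed = struct.pack(f'>{n}i', *embedding8d)
    return [float(b) for b in struct.unpack(f'>{4 * n}b', packed)]

# Input and expected output copied from the Java example in the question.
java_input = [
    -839660988, 826572561, 1885995405, 819888698,
    -2010625315, -1614561409, -1220275962, -2135440498,
]
java_output = [
    -51.0, -13.0, -54.0, 68.0, 49.0, 68.0, 127.0, 17.0,
    112.0, 106.0, 1.0, -115.0, 48.0, -34.0, -126.0, 58.0,
    -120.0, 40.0, 74.0, -35.0, -97.0, -61.0, -65.0, 127.0,
    -73.0, 68.0, 17.0, 6.0, -128.0, -73.0, -61.0, -114.0,
]

result = decode_embedding(java_input)
assert result == java_output
print(len(result))  # 32
```

The same comparison can be applied to either decodeEmbedding version above by swapping in that function.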

1 Comment

That's a good solution. I tested your code and it works.
