6

How can I convert a non-numeric String to an Integer?

For instance, I have:

String unique = "FUBAR";

What's a good way to represent the String as an Integer with no collisions? That is, "FUBAR" should always be represented by the same number and must not collide with any other String. For instance, String a = "A"; should be represented as the Integer 1, and so on. Is there a method that does this (preferably for all Unicode strings, although in my case ASCII values would be sufficient)?

7
  • 1
    er. this is what character encodings do. Get the bytes of a String, you have a number. Commented Nov 1, 2013 at 10:22
  • 1
    What's the goal here? There are any number of ways to convert a string to a number and maintain uniqueness. Since any data is, after all, stored as a series of bits, it's more of a reinterpretation than a conversion. But if you want the result for any string of any length to fit in a single Java int value, then you are looking for a hash function, of which there are many. However, there can never be a perfect one guaranteeing no collisions, since there are more possible strings than ints (pigeonhole principle). Commented Nov 1, 2013 at 10:24
  • 1
    I cannot think of a way that would work for all unicode strings, no matter how long, and convert them to a single int. But if you find a reliable way, come back and name your price: data compression companies are going to love you ;-) Commented Nov 1, 2013 at 10:25
  • 2
    Are you looking for stackoverflow.com/questions/2624192/…? Commented Nov 1, 2013 at 10:27
  • 1
    By "integer" do you mean a java int or do you mean "a whole number of arbitrary length"? Commented Nov 1, 2013 at 11:46

6 Answers

9

This is impossible. An Integer has only 32 bits, so it can represent at most 2^32 distinct values, while there is no limit on the number of possible strings. By the pigeonhole principle, at least two strings must therefore share the same Integer value no matter what technique you use for the conversion. In fact, there are infinitely many such collisions.

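If you're just looking for an efficient mapping, then I suggest that you use the int returned by hashCode(). Note that String.hashCode() can return any 32-bit int, including negative values; the 31-bit figure that is sometimes quoted applies to the default identity hash behind Object.hashCode() on common JVMs, not to String.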
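
To make the pigeonhole argument concrete, here is a minimal sketch of two different strings that share a hash code. The class name HashCollisionDemo and the helper collide are made up for this illustration; the pair "Aa" / "BB" follows from the documented String.hashCode() formula s[0]*31^(n-1) + ... + s[n-1].

```java
final class HashCollisionDemo {
    /** Returns true when two different strings share the same hash code. */
    static boolean collide(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // 'A' * 31 + 'a' = 65 * 31 + 97 = 2112
        System.out.println("Aa".hashCode());
        // 'B' * 31 + 'B' = 66 * 31 + 66 = 2112
        System.out.println("BB".hashCode());
        // Different strings, same int: prints true
        System.out.println(collide("Aa", "BB"));
    }
}
```

So hashCode() is stable for a given string, but it cannot serve as a collision-free ID.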

14 Comments

Downvoted because it is possible. Hexadecimal numbers contain letters, and they can easily be converted to base 10 without any collisions.
@909Niklas what?? int idValue = (this.getClass().getName() + id).hashCode()
@Torben the question specifies "with no collisions". That's impossible.
@Torben There is no possible way to guarantee no collisions. If you find a way, please tell me (and no one else).
BTW Object.hashCode() is 31-bit.
3

You can map Strings to unique IDs using a table. There is no way to do this generically.

final Map<String, Integer> map = new HashMap<>();
public int idFor(String s) {
    Integer id = map.get(s);
    if (id == null)
       map.put(s, id = map.size());
    return id;
}

Note: having unique IDs doesn't guarantee that there are no collisions in a hash collection, because the collection reduces each hash code to a much smaller bucket index.

http://vanillajava.blogspot.co.uk/2013/10/unique-hashcodes-is-not-enough-to-avoid.html
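
As a sketch of how the table idea might be packaged, the class below wraps the map and adds a reverse lookup. The names StringIdTable and stringFor are assumptions made for this example, not part of the answer above. IDs are only unique within one table instance, they depend on insertion order, and the class is not thread-safe.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class StringIdTable {
    private final Map<String, Integer> ids = new HashMap<>();
    private final List<String> strings = new ArrayList<>();

    /** Returns the existing ID for s, or assigns the next free one. */
    int idFor(String s) {
        Integer id = ids.get(s);
        if (id == null) {
            id = strings.size();
            ids.put(s, id);
            strings.add(s);
        }
        return id;
    }

    /** Reverse lookup: the string that was assigned this ID. */
    String stringFor(int id) {
        return strings.get(id);
    }

    public static void main(String[] args) {
        StringIdTable table = new StringIdTable();
        System.out.println(table.idFor("FUBAR")); // first string seen gets 0
        System.out.println(table.idFor("A"));     // second string gets 1
        System.out.println(table.idFor("FUBAR")); // same string, same ID: 0
        System.out.println(table.stringFor(1));   // A
    }
}
```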

2

If you know the character set used in your strings, then you can think of the string as a number in a base other than 10. For example, hexadecimal numbers contain the letters A to F.

Therefore, if you know that your strings only contain characters from an 8-bit character set, you can treat the string as a base-256 number. In pseudocode this would be:

number n = 0;
for each letter in string
    n = 256 * n + (letter's position in character set)

If your character set contains 65535 characters, then just multiply n by that number at each step instead. But beware: a 32-bit integer overflows very quickly (four 8-bit characters already use all 32 bits), so you probably need a type that can hold larger numbers.
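
A minimal Java sketch of the pseudocode, assuming the characters are 8-bit (char values 0 to 255) and using BigInteger to sidestep the overflow. The class name Base256 and the method toNumber are invented for this example.

```java
import java.math.BigInteger;

final class Base256 {
    private static final BigInteger BASE = BigInteger.valueOf(256);

    /** Interprets s as a base-256 number; every char must be in 0..255. */
    static BigInteger toNumber(String s) {
        BigInteger n = BigInteger.ZERO;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 255) {
                throw new IllegalArgumentException("not an 8-bit character at index " + i);
            }
            // n = 256 * n + (character's position in the character set)
            n = n.multiply(BASE).add(BigInteger.valueOf(c));
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(toNumber("A"));     // 65
        System.out.println(toNumber("AB"));    // 65 * 256 + 66 = 16706
        System.out.println(toNumber("FUBAR")); // five digits, too wide for a 32-bit int
    }
}
```

One caveat with this scheme: a leading character at position 0 (NUL in ASCII) does not change the value, just as a leading zero does not change a decimal number, so strings that differ only in leading NUL characters would collide.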

1
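import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;

private BigDecimal createBigDecimalFromString(String data)
{
    BigDecimal value = BigDecimal.ZERO;
    BigDecimal base = new BigDecimal(256);

    // StandardCharsets.UTF_8 avoids the checked UnsupportedEncodingException.
    byte[] tmp = data.getBytes(StandardCharsets.UTF_8);
    for (int i = tmp.length - 1; i >= 0; i--)
    {
        // Byte i is weighted by 256^i, so the first byte is the least significant digit.
        // Mask with 0xFF so that bytes >= 0x80 count as unsigned digits (0..255).
        BigDecimal digit = new BigDecimal(tmp[i] & 0xFF);
        value = value.add(base.pow(i).multiply(digit));
    }
    return value;
}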

1

Maybe a little late, but I'll add my 10 cents to simplify it (internally this is similar to the BigDecimal approach suggested by @Romain Hippeau):

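import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
public static BigInteger getNumberId(final String value) {
    // Signum 1 keeps the result non-negative and also accepts the empty string.
    return new BigInteger(1, value.getBytes(StandardCharsets.UTF_8));
}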
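
One way to check that such a mapping loses no information is to convert the number back. The sketch below is an assumption-laden illustration: the class Utf8NumberCodec and its methods are invented here, and it uses the two-argument BigInteger(int signum, byte[] magnitude) constructor so the value is always non-negative. The round trip holds for strings that do not start with U+0000, since a leading zero byte does not change the number.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

final class Utf8NumberCodec {
    /** Encodes the UTF-8 bytes of value as a non-negative big-endian number. */
    static BigInteger encode(String value) {
        return new BigInteger(1, value.getBytes(StandardCharsets.UTF_8));
    }

    /** Inverse of encode for strings that do not start with U+0000. */
    static String decode(BigInteger number) {
        byte[] bytes = number.toByteArray();
        // toByteArray() may prepend a 0x00 sign byte; skip it.
        // For zero the array is {0}, which decodes to the empty string.
        int start = (bytes.length > 0 && bytes[0] == 0) ? 1 : 0;
        return new String(bytes, start, bytes.length - start, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        BigInteger id = encode("FUBAR");
        System.out.println(id);
        System.out.println(decode(id)); // FUBAR
    }
}
```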

1

Regardless of the accepted answer, it is possible to represent any String as an integer by computing that String's Gödel number, which is a unique product of prime powers for every possible String. That said, it is impractical and slow to compute, and for most Strings you would need a BigInteger rather than a normal Integer. To decode a Gödel number into its corresponding String, you also need a defined Charset.
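
A rough sketch of one such encoding, under stated assumptions: the i-th prime is raised to the power (char value + 1), where the +1 is a choice made here so that U+0000 still contributes a factor. The class GoedelNumber and its methods are hypothetical names, and the trial-division prime search is only meant for short strings.

```java
import java.math.BigInteger;

final class GoedelNumber {
    /** Returns the smallest prime greater than p, using trial division. */
    static int nextPrime(int p) {
        for (int candidate = p + 1; ; candidate++) {
            boolean prime = candidate >= 2;
            for (int d = 2; (long) d * d <= candidate; d++) {
                if (candidate % d == 0) { prime = false; break; }
            }
            if (prime) return candidate;
        }
    }

    /** Encodes s as 2^(c0+1) * 3^(c1+1) * 5^(c2+1) * ... over its char values. */
    static BigInteger encode(String s) {
        BigInteger result = BigInteger.ONE;
        int prime = 1;
        for (int i = 0; i < s.length(); i++) {
            prime = nextPrime(prime);
            result = result.multiply(BigInteger.valueOf(prime).pow(s.charAt(i) + 1));
        }
        return result;
    }

    /** Recovers the string by reading off the exponent of each successive prime. */
    static String decode(BigInteger number) {
        if (number.signum() <= 0) throw new IllegalArgumentException("not a valid encoding");
        StringBuilder sb = new StringBuilder();
        int prime = 1;
        while (!number.equals(BigInteger.ONE)) {
            prime = nextPrime(prime);
            BigInteger p = BigInteger.valueOf(prime);
            int exponent = 0;
            while (number.mod(p).signum() == 0) {
                number = number.divide(p);
                exponent++;
            }
            // A valid encoding uses every prime up to the last one at least once.
            if (exponent == 0) throw new IllegalArgumentException("not a valid encoding");
            sb.append((char) (exponent - 1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("A"));               // 2^66, already beyond a long
        System.out.println(decode(encode("FUBAR")));   // FUBAR
    }
}
```

Even the single character "A" needs 67 bits here, which illustrates why the approach is impractical compared with the byte-based encodings.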
