0

If I use a single int to represent an ascii character set, how is using it reducing the storage space by a factor of 8 ? as compared to an array of 256 boolean values ? The single int is also functioning like a bit vector.

A boolean in java will occupy 1 bit as it can represent only true or false values. So for example if i have an array of boolean values. boolean[] char_set = new boolean[256] This will occupy 256 bits correct ? I'm reading that if i use a single int like a bit vector which means I can use 32 bits to cover 256 values. I guess that is reduction by a factor of 8. But why does the code below work ?

It is checking if there are any duplicates in the string. They are assuming an ascii char set. Str is some string.

int checker = 0;
for(int i=0;i<str.length();i++)
{
  int val = str.charAt(i) - 'a';
  if(checker& (1<<val)) > 0)
  {
     return false;
  }
  checker |= (1<<val);
}
  return true;
}

Can someone particularly explain how the bit vector logic works in this case. They are assuming the string contains lowercase characters.

3
  • 2
    Elaborate the question a little bit! It's not very clear what you're asking! Commented Sep 15, 2012 at 18:56
  • 1
    One word of advice is to use unicode to store strings and char data. Ascii is horribly outdated, and makes your code and programs less portable. Commented Sep 15, 2012 at 18:57
  • What do you mean by "represent a character set"? Commented Sep 15, 2012 at 19:06

3 Answers 3

2

An int is 32 bits, not 256 bits. It alone would not be enough to represent a set of 256 possible items. You need 8 of them. I am not sure what you mean that you can use just 32 bits then.

It's not clear what you're looping over -- what is str? All 256 values from 0 to 255? I'm suspicious because you are subtracting 'a'. Is your universe of values only 32 possible chars? Then sure you can use 32 bits. But where did 256 ever come from then?

Your mask condition needs to be != 0 to work for the highest bit set.

(A boolean's "real" size is opaque to the Java programmer. In reality, you will find that it is not 1 bit (machines aren't bit-addressable), nor even 1 byte. Java actually uses a whole 32-bit word. But this is not really related to your question.)

Sign up to request clarification or add additional context in comments.

4 Comments

You can represent 2^32 possible values with an int. And str is a string which can only contain values from the ascii char set.
Of course. But you seem to be using it as a bit set, to indicate the presence/absence of things. There are indeed 2^32 possible states of such a set, but, the universe of objects it can represent in the set has 32 elements. Your code isn't going to work because the shift can be larger than 32, which just overflows off the top of the int. Your mask sets nothing in that case.
Can you cite an example of it overflowing. It will be easier to understand then
What if the string contains 'A', which has ASCII value less than 'a'? The shift will be negative and throw an exception. ASCII technically doesn't incorporate values beyond 127, but I assume you are counting higher code points from something like ISO-8859-1. What if it contains 'ë'? The shift amount will be about 38. 1 << 38 ends up being 0. The mask would set no bits.
1

What the piece of code does is just "mark" a bit to signify the presence of a character.
In your case: int val = str.charAt(i) - 'a';. If the current character is a then val is equal to 0 so this line checker& (1<<val) checks if the zero bit is set (LSB). If it is then a has been seen before. Otherwise it sets it. If the current character is b then val will be equal to 1 so the next higher bit is set (first bit) and so on.
Essentially on the ascii charset just using a single int this way saves space as opossed to the boolean[256] array but this code can only handle the alphabet a-z while boolean[256] handles all of ASCII and the code will be clearer

2 Comments

Because you are using a 32 bit primitive as opposed to an array of 256 booleans which size is not 1 bit as you assume.They capture a 1 bit information but its real size is opaque to the program and implementation detail.I doubt that it is 1 bit under the hood.From docs.oracle.com/javase/tutorial/java/nutsandbolts/…: but its "size" isn't something that's precisely defined
@Phoenix Because it is only checking for lower case letters (which can be done in 32 bits (i.e. 26 < 32). It's not checking for the whole 256 characters in the ASCII character set.
0

A boolean in java will occupy 1 bit as it can represent only true or false values. So for example if i have an array of boolean values. boolean[] char_set = new boolean[256] This will occupy 256 bits correct ?

This is incorrect. Mondern computers cannot address a single bit.

Also, to represent ASCII characters, you only need 8 bits since 2^8 = 256 (where ^ means exponentiation).

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.