4

When I tried to convert a very long integer to Int, I was surprised that no error was thrown:

Prelude> read "123456789012345678901234567890" :: Int
-4362896299872285998

readMaybe from Text.Read module gives the same result.

Two questions:

  • Which function should I call to perform a safe conversion?
  • How can the most type safe language on Earth allow such unsafe things?

Update 1:

This is my attempt to write a version of read that checks bounds:

{-# LANGUAGE ScopedTypeVariables #-}

parseIntegral :: forall a . (Integral a, Bounded a) => String -> Maybe a
parseIntegral s = integerToIntegral (read s :: Integer) where
  integerToIntegral n | n < fromIntegral (minBound :: a) = Nothing
  integerToIntegral n | n > fromIntegral (maxBound :: a) = Nothing
  integerToIntegral n = Just $ fromInteger n

Is it the best I can do?

2 Answers 2

3

Background: why unchecked overflow is actually wonderful

Haskell 98 leaves overflow behavior explicitly unspecified, which is good for implementers and bad for everyone else. Haskell 2010 discusses it in two sections—in the section inherited from Haskell 98, it's left explicitly unspecified, whereas in the sections on Data.Int and Data.Word, it is specified. This inconsistency will hopefully be resolved eventually.

GHC is kind enough to specify it explicitly:

All arithmetic is performed modulo 2^n, where n is the number of bits in the type.

This is an extremely useful specification. In particular, it guarantees that Int, Word, Int64, Word32, etc., form rings, and even principal ideal rings, under addition and multiplication. This means that arithmetic will always work right—you can transform expressions and equations in lots of different ways without breaking things. Throwing exceptions on overflow would break all these properties, making it much more difficult to write and reason about programs. The only times you really need to be careful are when you use comparison operators like < and compare—fixed width integers do not form ordered groups, so these operators are a bit touchy.

Why it makes sense not to check reads

Reading an integer involves many multiplications and additions. It also needs to be fast. Checking to make sure the read is "valid" is not so easy to do quickly. In particular, while it's easy to find out whether an addition has overflowed, it is not easy to find out whether a multiplication has. The only sensible ways I can think of to perform a checked read for Int are

  1. Read as an Integer, check, then convert. Integer arithmetic is significantly more expensive than Int arithmetic. For smaller things, like Int16, the read can be done with Int, checking for Int16 overflow, then narrowed. This is cheaper, but still not free.

  2. Compare the number in decimal to maxBound (or, for a negative number, minBound) while reading. This seems more likely to be reasonably efficient, but there will still be some cost. As the first section of this answer explains, there is nothing inherently wrong with overflow, so it's not clear that throwing an error is actually better than giving an answer modulo 2^n.

Sign up to request clarification or add additional context in comments.

11 Comments

Some minor comments: saying read needs to be fast is a bit odd, as read is generally considered quite slow. It is not easy to check whether an arbitrary multiplication has overflowed from within Haskell, but a multiplication by ten ought to be easy to check (just see if the sign changed).
@DanielWagner, you're right; read is considered slow. Perhaps we should fix that. And I guess you're probably right about the multiplication by ten, in this context, but I need to think about that a bit more.
It's true that Int is a modular arithmetic type but usually people choose it as a more efficient Integer, or because it is the type used by List functions, not because they actually want modular arithmetic behavior.
@ReidBarton, they want cheaper integers, yes, but I imagine they don't want to have to reassociate or distribute operations to avoid overflow exceptions!
They probably do want exceptions on overflow.
|
2

If isn't "unsafe", in that the behaviour of the problem isn't undefined. (It's perfectly defined, just probably not what you wanted.) For example, unsafeWriteAray is unsafe, in that if you make a mistake with it, it writes data into arbitrary memory locations, either causing your application to segfault, or merely making corrupt its own memory, causing it behave in arbitrary undefined ways.

So much for splitting hairs. If you want to deal with such huge numbers, Integer is the only way to go. But you probably knew that already.

As for why there's no overflow check... Sometimes you actually want a number to overflow. (E.g., you might convert to Word8 without explicitly ANDing out the bottom 8 bits.) At any rate, every possible arithmetic operation can potentially overflow (e.g., maxBound + 1 = minBound, and that's just normal addition.) Do you really want every single arithmetic operation to have an overflow check, slowing your program down at every step?

You get the exact same behaviour in C, C++ or C#. I guess the difference is, in C# we have the checked keyword, which allows you to automatically check for overflow. Maybe somebody has a Haskell package for doing checked arithmetic… For now, it's probably simpler to just implement this one check yourself.

5 Comments

It is unsafe. Please, give me an example of a real world application, where "123456789012345678901234567890" should be parsed to -4362896299872285998? There is no such application. Exact same behavior in C#? This is absolutely not true. In C# Convert.ToInt32 throws OverflowException. In Java Integer.ParseInt throws NumberFormatException.
@ZhekaKozlov 1. Nobody uses "12345678901234567890" with limited types such as Int32 in C# or Int in Haskell (at least when dealing with code which requires security). 2. This behavior is absolutely and completely type safe because type safety does not automatically enforce semantic correctness. For example you can create instances of type classes which then type check but do not enforce the rules defined for instances of said class.
"Incorrect" and "unsafe" are not the same thing. Your code is incorrect if it gives you the wrong answer. It is only unsafe if it can actually crash the whole application, or corrupt main memory so that other unrelated parts of the application malfunction.
Incorrect is usually worse than error, since at least if you get an error you know that something went wrong. Real undefined behavior can in principle look like a successful execution if you are unlucky, but usually it just results in a crash.
@ZhekaKozlov In C#, calling Convert.ToInt32() throws an exception. However, (int)myLong does not throw any exception. (One is a method call, the other is a type conversion. The method does a range-check; the typecast does not.) You are right, however, that (for example) int.TryParse() does a range check. Perhaps it would be helpful for Haskell's read method (only) to add this check...

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.