Convert String to Int checking for overflow

Question

When I tried to convert a very long integer to Int, I was surprised that no error was thrown:

Prelude> read "123456789012345678901234567890" :: Int
-4362896299872285998

readMaybe from Text.Read module gives the same result.

Two questions:

Which function should I call to perform a safe conversion?
How can the most type safe language on Earth allow such unsafe things?

Update 1:

This is my attempt to write a version of read that checks bounds:

{-# LANGUAGE ScopedTypeVariables #-}

parseIntegral :: forall a . (Integral a, Bounded a) => String -> Maybe a
parseIntegral s = integerToIntegral (read s :: Integer) where
  integerToIntegral n | n < fromIntegral (minBound :: a) = Nothing
  integerToIntegral n | n > fromIntegral (maxBound :: a) = Nothing
  integerToIntegral n = Just $ fromInteger n

Is it the best I can do?

Community · Accepted Answer · 2020-06-20 09:12:55Z

3

Background: why unchecked overflow is actually wonderful

Haskell 98 leaves overflow behavior explicitly unspecified, which is good for implementers and bad for everyone else. Haskell 2010 discusses it in two sections—in the section inherited from Haskell 98, it's left explicitly unspecified, whereas in the sections on Data.Int and Data.Word, it is specified. This inconsistency will hopefully be resolved eventually.

GHC is kind enough to specify it explicitly:

All arithmetic is performed modulo 2^n, where n is the number of bits in the type.

This is an extremely useful specification. In particular, it guarantees that Int, Word, Int64, Word32, etc., form rings, and even principal ideal rings, under addition and multiplication. This means that arithmetic will always work right—you can transform expressions and equations in lots of different ways without breaking things. Throwing exceptions on overflow would break all these properties, making it much more difficult to write and reason about programs. The only times you really need to be careful are when you use comparison operators like < and compare—fixed width integers do not form ordered groups, so these operators are a bit touchy.

Why it makes sense not to check reads

Reading an integer involves many multiplications and additions. It also needs to be fast. Checking to make sure the read is "valid" is not so easy to do quickly. In particular, while it's easy to find out whether an addition has overflowed, it is not easy to find out whether a multiplication has. The only sensible ways I can think of to perform a checked read for Int are

Read as an Integer, check, then convert. Integer arithmetic is significantly more expensive than Int arithmetic. For smaller things, like Int16, the read can be done with Int, checking for Int16 overflow, then narrowed. This is cheaper, but still not free.
Compare the number in decimal to maxBound (or, for a negative number, minBound) while reading. This seems more likely to be reasonably efficient, but there will still be some cost. As the first section of this answer explains, there is nothing inherently wrong with overflow, so it's not clear that throwing an error is actually better than giving an answer modulo 2^n.

edited Jun 20, 2020 at 9:12

CommunityBot

11 silver badge

answered Apr 12, 2015 at 15:43

dfeuer

48.8k5 gold badges69 silver badges171 bronze badges

Sign up to request clarification or add additional context in comments.

11 Comments

Daniel Wagner Over a year ago

Some minor comments: saying read needs to be fast is a bit odd, as read is generally considered quite slow. It is not easy to check whether an arbitrary multiplication has overflowed from within Haskell, but a multiplication by ten ought to be easy to check (just see if the sign changed).

dfeuer Over a year ago

@DanielWagner, you're right; read is considered slow. Perhaps we should fix that. And I guess you're probably right about the multiplication by ten, in this context, but I need to think about that a bit more.

Reid Barton Over a year ago

It's true that Int is a modular arithmetic type but usually people choose it as a more efficient Integer, or because it is the type used by List functions, not because they actually want modular arithmetic behavior.

dfeuer Over a year ago

@ReidBarton, they want cheaper integers, yes, but I imagine they don't want to have to reassociate or distribute operations to avoid overflow exceptions!

Reid Barton Over a year ago

They probably do want exceptions on overflow.

|

MathematicalOrchid · Accepted Answer · 2015-04-12 10:01:15Z

2

If isn't "unsafe", in that the behaviour of the problem isn't undefined. (It's perfectly defined, just probably not what you wanted.) For example, unsafeWriteAray is unsafe, in that if you make a mistake with it, it writes data into arbitrary memory locations, either causing your application to segfault, or merely making corrupt its own memory, causing it behave in arbitrary undefined ways.

So much for splitting hairs. If you want to deal with such huge numbers, Integer is the only way to go. But you probably knew that already.

As for why there's no overflow check... Sometimes you actually want a number to overflow. (E.g., you might convert to Word8 without explicitly ANDing out the bottom 8 bits.) At any rate, every possible arithmetic operation can potentially overflow (e.g., maxBound + 1 = minBound, and that's just normal addition.) Do you really want every single arithmetic operation to have an overflow check, slowing your program down at every step?

You get the exact same behaviour in C, C++ or C#. I guess the difference is, in C# we have the checked keyword, which allows you to automatically check for overflow. Maybe somebody has a Haskell package for doing checked arithmetic… For now, it's probably simpler to just implement this one check yourself.

answered Apr 12, 2015 at 10:01

MathematicalOrchid

63.1k21 gold badges131 silver badges231 bronze badges

5 Comments

ZhekaKozlov Over a year ago

It is unsafe. Please, give me an example of a real world application, where "123456789012345678901234567890" should be parsed to -4362896299872285998? There is no such application. Exact same behavior in C#? This is absolutely not true. In C# Convert.ToInt32 throws OverflowException. In Java Integer.ParseInt throws NumberFormatException.

ThreeFx Over a year ago

@ZhekaKozlov 1. Nobody uses "12345678901234567890" with limited types such as Int32 in C# or Int in Haskell (at least when dealing with code which requires security). 2. This behavior is absolutely and completely type safe because type safety does not automatically enforce semantic correctness. For example you can create instances of type classes which then type check but do not enforce the rules defined for instances of said class.

MathematicalOrchid Over a year ago

"Incorrect" and "unsafe" are not the same thing. Your code is incorrect if it gives you the wrong answer. It is only unsafe if it can actually crash the whole application, or corrupt main memory so that other unrelated parts of the application malfunction.

Reid Barton Over a year ago

Incorrect is usually worse than error, since at least if you get an error you know that something went wrong. Real undefined behavior can in principle look like a successful execution if you are unlucky, but usually it just results in a crash.

MathematicalOrchid Over a year ago

@ZhekaKozlov In C#, calling Convert.ToInt32() throws an exception. However, (int)myLong does not throw any exception. (One is a method call, the other is a type conversion. The method does a range-check; the typecast does not.) You are right, however, that (for example) int.TryParse() does a range check. Perhaps it would be helpful for Haskell's read method (only) to add this check...

Collectives™ on Stack Overflow

Convert String to Int checking for overflow

2 Answers 2

Background: why unchecked overflow is actually wonderful

Why it makes sense not to check reads

11 Comments

5 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Background: why unchecked overflow is actually wonderful

Why it makes sense not to check reads

11 Comments

5 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related