Wrong output from custom big unsigned integer type when checking whether subtraction follow modular arithmetic

Question

Here I am tasked with creating an UnsignedBigInteger class that can handle numbers bigger than a long. Am allowed to use a byte array as the internal representation, uint8_t[].

What I came up with is a UnsignedBigInteger type storing the integer inside an array of uint8_t buckets. I work in 64bit architecture (-m64 flag to the compiler).

I ran into a problem while trying to code an operator- between two of my custom objects. We are dealing with unsigned integers, say a and b: when a < b, a - b should follow the rules of modular arithmetic (refer to this question for further info on modular arithmetic).

Here is my attempt:

#include <cstdio>
#include <cstdint>
#include <limits>
#include <cassert>

class UnsignedBigInteger {
public:
    constexpr static unsigned long long two_fifty_five{0xffULL}; // 0xff = 0d00000255 = 2^(8) - 1
    constexpr static unsigned long long max_ull{std::numeric_limits<unsigned long long int>::max()}; 

    UnsignedBigInteger() = default;
    UnsignedBigInteger(const unsigned long long value) {
        printf("value = %llu\n", value);
        unsigned long long shift{two_fifty_five};
        for (size_t i=0; i!=len; ++i) {
            _value[i] = (value & shift) >> (8 * i);
            printf("_value[%lu] in binary: %llb - shift: %llb\n", i, _value[i], shift);
            shift <<= 8;
        };
        // above is equivalent to:
        // _value[0] = value & 0xff;
        // _value[1] = value & (0xff << 8);
        // _value[2] = value & (0xff << 16); 
        // ... _value[6] = value & (0xff << 56);
    }

    unsigned long long int get() const {
        unsigned long long int result{};
        for (size_t i=0; i!=len; ++i) {
            result += (_value[i] << (8*i));
            printf("get result[%lu] = %llu, or in binary %llb\n", i, result, result);
        }

        return result;
    }

    UnsignedBigInteger operator-(const UnsignedBigInteger& addend) const {

        UnsignedBigInteger result{};
        
        for (size_t i=0; i!=len; ++i) {
            result._value[i] = (_value[i] - addend._value[i]) % two_fifty_five;
            (this->get() < addend.get()) && (result._value[i] == 0) ? 
                (result._value[i] = two_fifty_five) : true;
            printf("subtraction result, buckets: _value[%lu]: %llb\n", i, result._value[i]);
        }

        printf("subtraction result obtained with get(): %llu\n", result.get());

        return result; 
    }

private:
    constexpr static size_t len = 7;
    uint8_t _value[len];
};


int main() {
    // verify you are in 64-bit environment
    constexpr static unsigned long long max_ull{std::numeric_limits<unsigned long long int>::max()}; 
    assert(max_ull == 18'446'744'073'709'551'615ULL);  // = 2^64 - 1 = billions of billions
    printf("ULLONG_MAX is set for 64bit arithmetics: %llu - binary: %llb\n\n", max_ull, max_ull);

    printf("Define two UnsignedBigInteger objects: \n");
    UnsignedBigInteger a{13ULL}, b{27ULL};
    
    printf("\nSubtract them:\n");
    UnsignedBigInteger a_minus_b = a - b;
    printf("\nSUBTRACTION RESULT, FROM MAIN() PROGRAM, IS CORRECT:\n");
    printf("a - b = %llu --- in binary %llb\n\n", a_minus_b, a_minus_b);
    
    // should behave like this:
    printf("\nAttempting to reproduce modular arithmetic for unsigned integer subtraction, as below:\n");
    unsigned int five = 5, seven = 7;
    unsigned int aa = five - seven; 
    printf("uint aa = 5 - 7 = %u\n", aa); 
}

Here is the runnable version of MWE above on godbolt link.

It can be seen from the output console that the result of the subtraction is given correctly from the main program (follows modular arithmetic), while the printf from the get() function I set up during debugging seems unable to handle numbers higher than 24-bits (see total break of pattern in chunk of output below, get result[3]).

get result[0] = 242, or in binary 11110010
get result[1] = 65522, or in binary 1111111111110010
get result[2] = 16777202, or in binary 111111111111111111110010
get result[3] = 18446744073709551602, or in binary 1111111111111111111111111111111111111111111111111111111111110010
get result[4] = 241, or in binary 11110001
get result[5] = 65521, or in binary 1111111111110001
get result[6] = 16777201, or in binary 111111111111111111110001
subtraction result obtained with get(): 16777201

SUBTRACTION RESULT, FROM MAIN() PROGRAM, IS CORRECT:
a - b = 72057594037927922

I am puzzled because get() should be the output interface for UnsignedBigInteger. What I print in main is a unsigned long long, which is converted from UnsignedBigInteger implicitly as I did not even set up a type conversion operator.

Since you are using C++, you should use std::cout instead of printf. One wrong move with the printf format specifier can cause output to display differently (or cause undefined behavior). The godbolt link shows warnings about this. — PaulMcKenzie
– PaulMcKenzie, Commented Dec 30, 2024 at 1:30
So you’ve made progress, because 255-0 didn’t work correctly before. — Pete Becker
– Pete Becker, Commented Dec 30, 2024 at 3:07
@Giogre -- The warning at line 76 says: warning: format specifies type 'unsigned long long' but the argument has type 'UnsignedBigInteger' [-Wformat] -- That is very suspicious, as printf has no idea what UnsignedBigInteger is. Unless there was some type of casting operator defined in UnsignedBigInteger which I am missing, this usage of printf is undefined behavior. — PaulMcKenzie
– PaulMcKenzie, Commented Dec 30, 2024 at 4:00
@Giogre result += (_value[i] << (8 * i)); -- What if 8 * i exceeds the number of bits represent by a uint8_t? Shifting a value greater than the number of bits is undefined behavior. — PaulMcKenzie
– PaulMcKenzie, Commented Dec 30, 2024 at 12:38
@Giogre: In _value[i] << (8*i) the left operand is implicitly promoted to int (and the right operand is already size_t) — Ben Voigt
– Ben Voigt, Commented Dec 30, 2024 at 15:27

Giogre · Accepted Answer · 2024-12-31 00:16:31Z

0

As suggested in the comments by @PaulMcKenzie and confirmed by @BenVoigt, the problem laid within UnsignedBigInteger member function get(): components of _values[] - an array of uint8_t which is UnsignedBigInteger data member - are bitwise-shifted leftwards in line

result += (_value[i] << (8*i));

Here, type promotion rules kick in: everything on the right hand side ends up being an int, leading the left hand side to overflow once the bit-shifting reaches the sign bit (24th).

A revised version of the MWE code with godbolt link is below.

All previously appearing compiler warnings were addressed by use of std::cout in place of printf. Debugging lines were also commented out in order to produce cleaner output. In addition the get() function is substituted by a conversion operator operator unsigned long long int() sporting a static_cast<unsigned long long> inside, handling the conversion.

#include <cstdio>
#include <cstdint>
#include <limits>
#include <cassert>
#include <iostream>
#include <format>

class UnsignedBigInteger {
public:
    constexpr static unsigned long long two_fifty_five{0xffULL}; // 0xff = 0d00000255 = 2^(8) - 1
    constexpr static unsigned long long max_ull{std::numeric_limits<unsigned long long int>::max()}; 
    constexpr static unsigned long long two_fifty_six{0x100ULL}; 

    UnsignedBigInteger() = default;
    UnsignedBigInteger(const unsigned long long value) {
        //printf("value = %llu\n", value);
        unsigned long long shift{two_fifty_five};
        for (size_t i=0; i!=len; ++i) {
            _value[i] = (value & shift) >> (8 * i);
            //printf("_value[%lu] in binary: %hhb - bitmask used here: %llb\n", i, _value[i], shift);
            shift <<= 8;
        };
        // above is equivalent to:
        // _value[0] = value & 0xff;
        // _value[1] = value & (0xff << 8);
        // _value[2] = value & (0xff << 16); 
        // ... _value[6] = value & (0xff << 56);
    }

    operator unsigned long long int() const {
        unsigned long long int result{};
        for (size_t i=0; i!=len; ++i) {
            result += static_cast<unsigned long long>(_value[i]) << (8*i);
            //printf("get result[%lu] = %llu, or in binary %llb\n", i, result, result);
        }

        return result;
    }

    UnsignedBigInteger operator-(const UnsignedBigInteger& addend) const {

        UnsignedBigInteger result{};
        
        for (size_t i=0; i!=len; ++i) {
            result._value[i] = (_value[i] - addend._value[i]) % two_fifty_six;

            // if subtraction in line above yields a negative value, switch empty buckets to 1s:
            if (*this < addend && result._value[i] == 0) 
                result._value[i] = two_fifty_five;

            //printf("subtraction result, buckets: _value[%lu]: %hhb\n", i, result._value[i]);
        }

        return result; 
    }

private:
    constexpr static size_t len = 7;
    uint8_t _value[len];
};


int main() {
    // verify you are in 64-bit environment
    constexpr static unsigned long long max_ull{std::numeric_limits<unsigned long long int>::max()}; 
    assert(max_ull == 18'446'744'073'709'551'615ULL);  // = 2^64 - 1 = billions of billions
    printf("ULLONG_MAX is set for 64bit arithmetics: %llu -\nbinary: %llb\n\n", max_ull, max_ull);

    printf("Define two UnsignedBigInteger objects:\n");
    UnsignedBigInteger a{13ULL}, b{27ULL};
    //UnsignedBigInteger a{0ULL}, b{1'000'000ULL};
    std::cout << "a = " << a << "\n" << "b = " << b << std::endl;
    
    printf("\nSubtract them\n(use modular arithmetic if a < b):\n");
    UnsignedBigInteger a_minus_b = a - b;
    std::cout << "a_minus_b = " << a << " - " << b << " = " << a_minus_b << std::endl;
    std::cout << "In binary notation:\n";
    std::cout << "a_minus_b = " << std::format("{:b}", static_cast<unsigned long long>(a)) << " - " << std::format("{:b}", static_cast<unsigned long long>(b)) << " = " << std::format("{:b}", static_cast<unsigned long long>(a_minus_b)) << std::endl;
}

This yields:

ULLONG_MAX is set for 64bit arithmetics: 18446744073709551615 -
binary: 1111111111111111111111111111111111111111111111111111111111111111

Define two UnsignedBigInteger objects:
a = 13
b = 27

Subtract them
(use modular arithmetic if a < b):
a_minus_b = 13 - 27 = 72057594037927922
In binary notation:
a_minus_b = 1101 - 11011 = 11111111111111111111111111111111111111111111111111110010

which is as intended.

edited Dec 31, 2024 at 0:16

answered Dec 30, 2024 at 15:25

Giogre

1,6649 silver badges19 bronze badges

Sign up to request clarification or add additional context in comments.

12 Comments

Ben Voigt Dec 30, 2024 at 15:31

This is wrong because you ignored "The shift operators << and >> group left-to-right. The operands shall be prvalues of integral or unscoped enumeration type and integral promotions are performed." eel.is/c++draft/expr.shift#1

Giogre Dec 30, 2024 at 15:34

@BenVoigt wrong because I am referring to narrow casts instead of implicit integral promotions?

Ben Voigt Dec 30, 2024 at 15:37

Wrong because "overflow for a uint8_t" doesn't matter. The shift is performed on int and the result is added (shouldn't this be bitwise OR?) into the appropriately-named result variable, which is even wider. There's nothing being stored to a uint8_t here, so it doesn't matter that it doesn't fit in uint8_t.

Giogre Dec 30, 2024 at 15:53

@BenVoigt Regarding the bitshift inside the conversion operator, result is not wider, as both it and static_cast<>(_value[]) are of the type largest unsigned int. Bitwise OR is not needed as I am adding up to result the internal buckets progressively shifted by 8bits at a time, I am not adding two things bitwise

Ben Voigt Jan 2 at 14:51

Oh I think I figured out the confusion. I'm using "lhs" and "rhs" as short-hand for "left operand of the << shift operator" and "right operand of the shift operator", while I think you're referring to "left operand of the += operator" and "right operand of the += operator". Yes, all this is happening inside the right-hand of +=, but it doesn't happen because the right-hand of += is int. Cause and effect are reversed. The cause of all of this is that the left-hand operand of << is promoted to int. The overflow happens long before += gets involved.

|

Collectives™ on Stack Overflow

Wrong output from custom big unsigned integer type when checking whether subtraction follow modular arithmetic

1 Answer 1

12 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

12 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related