`getSizeofMutableByteArray#` may return a negative number

The following test is done with `ghc-9.6.6`. ``` {-# LANGUAGE UnboxedTuples #-} {-# LANGUAGE MagicHash #-} import GHC.Base (IO(..)) import GHC.Exts import Prelude hiding (length) data MutByteArray = MutByteArray (MutableByteArray# RealWorld) {-# INLINE new #-} new :: Int -> IO MutByteArray new (I# nbytes) = IO $ \s -> case newByteArray# nbytes s of (# s', mbarr# #) -> let c = MutByteArray mbarr# in (# s', c #) {-# INLINE length #-} length :: MutByteArray -> IO Int length (MutByteArray arr) = IO $ \s -> case getSizeofMutableByteArray# arr s of (# s1, i #) -> (# s1, I# i #) test1 :: IO () test1 = do val <- new (-7) len <- length val print len test2 :: IO () test2 = do val <- new (-8) len <- length val print len ``` `test1` succeeds and prints `-7` The test prints the length for all `>= -7` `test2` fails with `Out of memory` If `length <= -8`, the test fails with `Out of memory` This is an interesting quirk. I expected `newByteArray#` to fail if the size given to it is `< 0` and I never expected `getSizeofMutableByteArray#` would return a negative number. Best, Adithya

On Wed, Oct 30, 2024 at 06:02:40PM +0530, Adithya Kumar wrote:
I expected `newByteArray#` to fail if the size given to it is `< 0` and I never expected `getSizeofMutableByteArray#` would return a negative number.
The allocation size is rounded up to the next multiple of a word size (typically 8). So (-7) becomes 0, while (-8) remains (-8), which as an unsigned quantity is too large to allocate. The primop definition is: primop NewByteArrayOp_Char "newByteArray#" GenPrimOp Int# -> State# s -> (# State# s, MutableByteArray# s #) {Create a new mutable byte array of specified size (in bytes), in the specified state thread. The size of the memory underlying the array will be rounded up to the platform's word size.} with out_of_line = True has_side_effects = True The implementation is then: NewByteArrayOp_Char -> \case [(CmmLit (CmmInt n w))] | asUnsigned w n <= max_inl_alloc_size -> opIntoRegs $ \ [res] -> doNewByteArrayOp res (fromInteger n) _ -> PrimopCmmEmit_External Both (-7) and (-8) are too large as unsigned numbers to be handled inline, taking the second branch, and then the RTS primop is called: stg_newByteArrayzh ( W_ n ) { W_ words, payload_words; gcptr p; MAYBE_GC_N(stg_newByteArrayzh, n); payload_words = ROUNDUP_BYTES_TO_WDS(n); words = BYTES_TO_WDS(SIZEOF_StgArrBytes) + payload_words; ("ptr" p) = ccall allocateMightFail(MyCapability() "ptr", words); if (p == NULL) { jump stg_raisezh(base_GHCziIOziException_heapOverflow_closure); } TICK_ALLOC_PRIM(SIZEOF_StgArrBytes,WDS(payload_words),0); SET_HDR(p, stg_ARR_WORDS_info, CCCS); StgArrBytes_bytes(p) = n; return (p); } This takes an *unsigned* argument `n`, rounding up (-7) to 0, while leaving (-8) alone as a very large positive value. The rest follows. Perhaps it would be reasonable to check that the `Int#` argument of `NewByteArrayOp_Char` is non-negative. The comments for `asUnsigned` notwithstanding: -- | Interpret the argument as an unsigned value, assuming the value -- is given in two-complement form in the given width. -- -- Example: @asUnsigned W64 (-1)@ is 18446744073709551615. -- -- This function is used to work around the fact that many array -- primops take Int# arguments, but we interpret them as unsigned -- quantities in the code gen. This means that we have to be careful -- every time we work on e.g. a CmmInt literal that corresponds to the -- array size, as it might contain a negative Integer value if the -- user passed a value larger than 2^(wORD_SIZE_IN_BITS-1) as the Int# -- literal. asUnsigned :: Width -> Integer -> Integer asUnsigned w n = n .&. (bit (widthInBits w) - 1) -- Viktor.
participants (2)
-
Adithya Kumar
-
Viktor Dukhovni