How to ensure optimization for large immutable vectors to be shared w.r.t. Referential Transparency

Hello Cafe and respected GHC Devs, I would like to ensure some immutable vectors (can be quite large) are always shared instead of copied, and I think that should be straight forward w.r.t. referential transparency we enjoy. In an attempt to determine whether two immutable vectors can be treated as the same one to enable specific optimizations for that case, I tried to use ST to determine their respective backing foreign ptrs for comparison. But appears it can be copied when wrapped in a newtype, I wonder why it is the case, and how to avoid the copy? Here's my minimum reproducible snippet: ```hs λ> :set -XBangPatterns λ> λ> :set -package vector package flags have changed, resetting and loading new packages... λ> λ> import Prelude λ> λ> import Control.Monad.ST λ> import qualified Data.Vector.Storable as VS λ> λ> :{ λ| λ| newtype SomeVector = SomeVector (VS.Vector Int) λ| λ| isSameVector :: SomeVector -> SomeVector -> Bool λ| isSameVector (SomeVector !x) (SomeVector !y) = runST $ do λ| mx@(VS.MVector !x'offset !x'fp) <- VS.unsafeThaw x λ| my@(VS.MVector !y'offset !y'fp) <- VS.unsafeThaw y λ| _ <- VS.unsafeFreeze mx λ| _ <- VS.unsafeFreeze my λ| return $ x'offset == y'offset && x'fp == y'fp λ| λ| :} λ> λ> let !v = VS.fromList [3,2,5] in isSameVector (SomeVector v) (SomeVector v) False λ> λ> let !v = SomeVector (VS.fromList [3,2,5]) in isSameVector v v True λ> ``` Thanks with best regards, Compl

On Tue, 6 Apr 2021, YueCompl via Haskell-Cafe wrote:
In an attempt to determine whether two immutable vectors can be treated as the same one to enable specific optimizations for that case, I tried to use ST to determine their respective backing foreign ptrs for comparison. But appears it can be copied when wrapped in a newtype, I wonder why it is the case, and how to avoid the copy?
You compare the ForeignPtrs of the mutable vectors. What about comparing the ForeignPtrs of the original immutable vectors?

Thanks, it's a good idea. Unfortunately the result remains, making me even more curious. ```hs λ> :set -XBangPatterns λ> λ> :set -package vector package flags have changed, resetting and loading new packages... λ> λ> import Prelude λ> λ> import Control.Monad.ST λ> import qualified Data.Vector.Storable as VS λ> λ> :{ λ| λ| newtype SomeVector = SomeVector (VS.Vector Int) λ| λ| isSameVector :: SomeVector -> SomeVector -> Bool λ| isSameVector (SomeVector !x) (SomeVector !y) = λ| x'offset == y'offset && x'fp == y'fp λ| where λ| (x'fp, x'offset, _x'len) = VS.unsafeToForeignPtr x λ| (y'fp, y'offset, _y'len) = VS.unsafeToForeignPtr y λ| :} λ> λ> let !v = VS.fromList [5..200000] in isSameVector (SomeVector v) (SomeVector v) False λ> λ> let !v = SomeVector (VS.fromList [3,2,5]) in isSameVector v v True λ> ```
On 2021-04-06, at 20:00, Henning Thielemann
wrote: On Tue, 6 Apr 2021, YueCompl via Haskell-Cafe wrote:
In an attempt to determine whether two immutable vectors can be treated as the same one to enable specific optimizations for that case, I tried to use ST to determine their respective backing foreign ptrs for comparison. But appears it can be copied when wrapped in a newtype, I wonder why it is the case, and how to avoid the copy?
You compare the ForeignPtrs of the mutable vectors. What about comparing the ForeignPtrs of the original immutable vectors?

On Tue, 6 Apr 2021, YueCompl wrote:
Thanks, it's a good idea. Unfortunately the result remains, making me even more curious. ```hs λ> :set -XBangPatterns λ> λ> :set -package vector package flags have changed, resetting and loading new packages... λ> λ> import Prelude λ> λ> import Control.Monad.ST λ> import qualified Data.Vector.Storable as VS λ> λ> :{ λ| λ| newtype SomeVector = SomeVector (VS.Vector Int) λ| λ| isSameVector :: SomeVector -> SomeVector -> Bool λ| isSameVector (SomeVector !x) (SomeVector !y) = λ| x'offset == y'offset && x'fp == y'fp λ| where λ| (x'fp, x'offset, _x'len) = VS.unsafeToForeignPtr x λ| (y'fp, y'offset, _y'len) = VS.unsafeToForeignPtr y λ| :} λ> λ> let !v = VS.fromList [5..200000] in isSameVector (SomeVector v) (SomeVector v) False
What happens for [3,2,5]?
λ> λ> let !v = SomeVector (VS.fromList [3,2,5]) in isSameVector v v True λ> ```

```hs λ> let !v = VS.fromList [3,2,5] in isSameVector (SomeVector v) (SomeVector v) False λ> λ> let !v = SomeVector (VS.fromList [3,2,5]) in isSameVector v v True λ> ```
On 2021-04-06, at 20:34, Henning Thielemann
wrote: On Tue, 6 Apr 2021, YueCompl wrote:
Thanks, it's a good idea. Unfortunately the result remains, making me even more curious. ```hs λ> :set -XBangPatterns λ> λ> :set -package vector package flags have changed, resetting and loading new packages... λ> λ> import Prelude λ> λ> import Control.Monad.ST λ> import qualified Data.Vector.Storable as VS λ> λ> :{ λ| λ| newtype SomeVector = SomeVector (VS.Vector Int) λ| λ| isSameVector :: SomeVector -> SomeVector -> Bool λ| isSameVector (SomeVector !x) (SomeVector !y) = λ| x'offset == y'offset && x'fp == y'fp λ| where λ| (x'fp, x'offset, _x'len) = VS.unsafeToForeignPtr x λ| (y'fp, y'offset, _y'len) = VS.unsafeToForeignPtr y λ| :} λ> λ> let !v = VS.fromList [5..200000] in isSameVector (SomeVector v) (SomeVector v) False
What happens for [3,2,5]?
λ> λ> let !v = SomeVector (VS.fromList [3,2,5]) in isSameVector v v True λ> ```

On Tue, 6 Apr 2021, YueCompl wrote:
```hsλ> let !v = VS.fromList [3,2,5] in isSameVector (SomeVector v) (SomeVector v) False λ> λ> let !v = SomeVector (VS.fromList [3,2,5]) in isSameVector v v True λ> ```
Then I have no idea. Maybe ghc-heap-view/ghc-vis can reveal the mystery. https://github.com/nomeata/haskell-bytes-bobkonf2021

Appears it'd work as expected when the immutable vector is originally created from foreign ptr, I think it'll work for my cases. (Though it's still strangely unexpected for ad hoc immutable vectors unshared when wrapped.) ```hs λ> :set -XBangPatterns λ> λ> :set -package vector package flags have changed, resetting and loading new packages... λ> λ> import Prelude λ> λ> import Foreign.Marshal.Alloc λ> import Foreign.Storable λ> import Foreign.ForeignPtr λ> λ> import qualified Data.Vector.Storable as VS λ> λ> :{ λ| λ| data SomeVector = SomeVector (VS.Vector Int) λ| λ| isSameVector :: Storable a => VS.Vector a -> VS.Vector a -> Bool λ| isSameVector !x !y = λ| x'offset == y'offset && x'fp == y'fp λ| where λ| (x'fp, x'offset, _x'len) = VS.unsafeToForeignPtr x λ| (y'fp, y'offset, _y'len) = VS.unsafeToForeignPtr y λ| λ| isSame :: SomeVector -> SomeVector -> Bool λ| isSame (SomeVector !x) (SomeVector !y) = isSameVector x y λ| λ| :} λ> λ> let !v = VS.fromList [3,2,5] in isSame (SomeVector v) (SomeVector v) False λ> λ> let !v = SomeVector (VS.fromList [3,2,5]) in isSame v v True λ> λ> λ> (fp :: ForeignPtr Int) <- mallocBytes 256 >>= newForeignPtr_ λ> let !v = VS.unsafeFromForeignPtr fp 0 32 λ| λ> isSame (SomeVector v) (SomeVector v) True λ> ```
On 2021-04-06, at 20:50, Henning Thielemann
wrote: On Tue, 6 Apr 2021, YueCompl wrote:
```hsλ> let !v = VS.fromList [3,2,5] in isSameVector (SomeVector v) (SomeVector v) False λ> λ> let !v = SomeVector (VS.fromList [3,2,5]) in isSameVector v v True λ> ```
Then I have no idea.
Maybe ghc-heap-view/ghc-vis can reveal the mystery.

On Tue, Apr 06, 2021 at 07:12:51PM +0800, YueCompl via ghc-devs wrote:
λ> import Control.Monad.ST λ> import qualified Data.Vector.Storable as VS λ> λ> :{ λ| λ| newtype SomeVector = SomeVector (VS.Vector Int) λ| λ| isSameVector :: SomeVector -> SomeVector -> Bool λ| isSameVector (SomeVector !x) (SomeVector !y) = runST $ do λ| mx@(VS.MVector !x'offset !x'fp) <- VS.unsafeThaw x λ| my@(VS.MVector !y'offset !y'fp) <- VS.unsafeThaw y λ| _ <- VS.unsafeFreeze mx λ| _ <- VS.unsafeFreeze my λ| return $ x'offset == y'offset && x'fp == y'fp λ| λ| :} λ> λ> let !v = VS.fromList [3,2,5] in isSameVector (SomeVector v) (SomeVector v) False λ> λ> let !v = SomeVector (VS.fromList [3,2,5]) in isSameVector v v True
In GHCi, but not in compiled programs, by default the `NoMonomorphismRestriction` extension is enabled. If I compile your code with that restriction, I can reproduce your results (the values are not shared). If I either skip the extension, or add an explicit type annotation to for the vector, then the values are shared. {-# LANGUAGE BangPatterns #-} {-# LANGUAGE NoMonomorphismRestriction #-} import Control.Monad.ST import qualified Data.Vector.Storable as VS newtype SomeVector = SomeVector (VS.Vector Int) isSameVector :: SomeVector -> SomeVector -> Bool isSameVector (SomeVector !x) (SomeVector !y) = runST $ do mx@(VS.MVector !x'offset !x'fp) <- VS.unsafeThaw x my@(VS.MVector !y'offset !y'fp) <- VS.unsafeThaw y _ <- VS.unsafeFreeze mx _ <- VS.unsafeFreeze my return $ x'offset == y'offset && x'fp == y'fp main :: IO () main = let !v = VS.fromList [0..1023] -- :: VS.Vector Int in print $ isSameVector (SomeVector v) (SomeVector v) Since newtypes are always strict in their argument, I don't think the BangPattern does what you'd like it to do, it just makes "main" strict in v. As defined with `NoMonomorphismRestriction` v is a polymorphic function, and I guess it is specialised at the call site. -- Viktor.

On Tue, Apr 06, 2021 at 09:51:30AM -0400, Viktor Dukhovni wrote:
Since newtypes are always strict in their argument, I don't think the BangPattern does what you'd like it to do, it just makes "main" strict in v. As defined with `NoMonomorphismRestriction` v is a polymorphic function, and I guess it is specialised at the call site.
The below variant makes the issue even more clear for me: {-# LANGUAGE BangPatterns #-} {-# LANGUAGE NoMonomorphismRestriction #-} import Control.Monad.ST import qualified Data.Vector.Storable as VS newtype SomeVector = SomeVector (VS.Vector Int) isSameVector :: SomeVector -> SomeVector -> Bool isSameVector (SomeVector !x) (SomeVector !y) = runST $ do mx@(VS.MVector !x'offset !x'fp) <- VS.unsafeThaw x my@(VS.MVector !y'offset !y'fp) <- VS.unsafeThaw y _ <- VS.unsafeFreeze mx _ <- VS.unsafeFreeze my return $ x'offset == y'offset && x'fp == y'fp -- makev :: VS.Vector Int makev = VS.fromList [0..1023] main :: IO () main = let v = makev in print $ v `seq` isSameVector (SomeVector v) (SomeVector v) With `NoMonomorphismRestriction` it fails to compile: /tmp/vec.hs:22:17: error: • Ambiguous type variable ‘a0’ arising from a use of ‘v’ prevents the constraint ‘(VS.Storable a0)’ from being solved. Probable fix: use a type annotation to specify what ‘a0’ should be. These potential instances exist: instance VS.Storable () -- Defined in ‘Foreign.Storable’ instance VS.Storable Bool -- Defined in ‘Foreign.Storable’ instance VS.Storable Char -- Defined in ‘Foreign.Storable’ ...plus four others ...plus 13 instances involving out-of-scope types (use -fprint-potential-instances to see them all) • In the first argument of ‘seq’, namely ‘v’ In the second argument of ‘($)’, namely ‘v `seq` isSameVector (SomeVector v) (SomeVector v)’ In the expression: print $ v `seq` isSameVector (SomeVector v) (SomeVector v) | 22 | in print $ v `seq` isSameVector (SomeVector v) (SomeVector v) | ^ With the default `MonomorphismRestriction`, it compiles and reports that the vectors are shared. -- Viktor.

Thanks very much for the diagnostic and explanation! I was wrong in assuming the `in isSameVector (SomeVector v) (SomeVector v)` part is enough to have type of v in `let !v = VS.fromList [3,2,5]` inferred as monomorphic, totally unaware about "NoMonomorphismRestriction" before, I've learned it today :D
On 2021-04-06, at 21:51, Viktor Dukhovni
wrote: On Tue, Apr 06, 2021 at 07:12:51PM +0800, YueCompl via ghc-devs wrote:
λ> import Control.Monad.ST λ> import qualified Data.Vector.Storable as VS λ> λ> :{ λ| λ| newtype SomeVector = SomeVector (VS.Vector Int) λ| λ| isSameVector :: SomeVector -> SomeVector -> Bool λ| isSameVector (SomeVector !x) (SomeVector !y) = runST $ do λ| mx@(VS.MVector !x'offset !x'fp) <- VS.unsafeThaw x λ| my@(VS.MVector !y'offset !y'fp) <- VS.unsafeThaw y λ| _ <- VS.unsafeFreeze mx λ| _ <- VS.unsafeFreeze my λ| return $ x'offset == y'offset && x'fp == y'fp λ| λ| :} λ> λ> let !v = VS.fromList [3,2,5] in isSameVector (SomeVector v) (SomeVector v) False λ> λ> let !v = SomeVector (VS.fromList [3,2,5]) in isSameVector v v True
In GHCi, but not in compiled programs, by default the `NoMonomorphismRestriction` extension is enabled. If I compile your code with that restriction, I can reproduce your results (the values are not shared).
If I either skip the extension, or add an explicit type annotation to for the vector, then the values are shared.
{-# LANGUAGE BangPatterns #-} {-# LANGUAGE NoMonomorphismRestriction #-} import Control.Monad.ST import qualified Data.Vector.Storable as VS
newtype SomeVector = SomeVector (VS.Vector Int)
isSameVector :: SomeVector -> SomeVector -> Bool isSameVector (SomeVector !x) (SomeVector !y) = runST $ do mx@(VS.MVector !x'offset !x'fp) <- VS.unsafeThaw x my@(VS.MVector !y'offset !y'fp) <- VS.unsafeThaw y _ <- VS.unsafeFreeze mx _ <- VS.unsafeFreeze my return $ x'offset == y'offset && x'fp == y'fp
main :: IO () main = let !v = VS.fromList [0..1023] -- :: VS.Vector Int in print $ isSameVector (SomeVector v) (SomeVector v)
Since newtypes are always strict in their argument, I don't think the BangPattern does what you'd like it to do, it just makes "main" strict in v. As defined with `NoMonomorphismRestriction` v is a polymorphic function, and I guess it is specialised at the call site.
-- Viktor. _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

On a second thought, maybe GHCi's silence is a bad thing here? Maybe it should complain loudly as GHC does? ```hs λ> :set -package vector package flags have changed, resetting and loading new packages... λ> λ> import Prelude λ> λ> import qualified Data.Vector.Storable as VS λ> λ> :{ λ| λ| newtype SomeVector = SomeVector (VS.Vector Int) λ| λ| isSameVector :: SomeVector -> SomeVector -> Bool λ| isSameVector (SomeVector x) (SomeVector y) = λ| x'offset == y'offset && x'fp == y'fp λ| where λ| (x'fp, x'offset, _x'len) = VS.unsafeToForeignPtr x λ| (y'fp, y'offset, _y'len) = VS.unsafeToForeignPtr y λ| λ| :} λ> λ> let (v :: VS.Vector Int) = VS.fromList [3,2,5] in isSameVector (SomeVector v) (SomeVector v) True λ> λ> λ> :set -XMonomorphismRestriction λ> λ> let v = VS.fromList [3,2,5] in isSameVector (SomeVector v) (SomeVector v) True λ> λ> :set -XNoMonomorphismRestriction λ> λ> let v = VS.fromList [3,2,5] in isSameVector (SomeVector v) (SomeVector v) False λ> ``` Further more, my intuition about GHC's type inference here is proved wrong by it, right. But I still think that with a single piece of `let-in` construct, types are better to be inferred as specific as possible, then the result would not be affected by some extension's semantics modification. Here v's type can obviously be inferred to `VS.Vector Int` according to its usage in the `SomeVector` data constructor, I wonder why GHC is not doing this?
On 2021-04-06, at 22:19, YueCompl via ghc-devs
wrote: Thanks very much for the diagnostic and explanation!
I was wrong in assuming the `in isSameVector (SomeVector v) (SomeVector v)` part is enough to have type of v in `let !v = VS.fromList [3,2,5]` inferred as monomorphic, totally unaware about "NoMonomorphismRestriction" before, I've learned it today :D
On 2021-04-06, at 21:51, Viktor Dukhovni
wrote: On Tue, Apr 06, 2021 at 07:12:51PM +0800, YueCompl via ghc-devs wrote:
λ> import Control.Monad.ST λ> import qualified Data.Vector.Storable as VS λ> λ> :{ λ| λ| newtype SomeVector = SomeVector (VS.Vector Int) λ| λ| isSameVector :: SomeVector -> SomeVector -> Bool λ| isSameVector (SomeVector !x) (SomeVector !y) = runST $ do λ| mx@(VS.MVector !x'offset !x'fp) <- VS.unsafeThaw x λ| my@(VS.MVector !y'offset !y'fp) <- VS.unsafeThaw y λ| _ <- VS.unsafeFreeze mx λ| _ <- VS.unsafeFreeze my λ| return $ x'offset == y'offset && x'fp == y'fp λ| λ| :} λ> λ> let !v = VS.fromList [3,2,5] in isSameVector (SomeVector v) (SomeVector v) False λ> λ> let !v = SomeVector (VS.fromList [3,2,5]) in isSameVector v v True
In GHCi, but not in compiled programs, by default the `NoMonomorphismRestriction` extension is enabled. If I compile your code with that restriction, I can reproduce your results (the values are not shared).
If I either skip the extension, or add an explicit type annotation to for the vector, then the values are shared.
{-# LANGUAGE BangPatterns #-} {-# LANGUAGE NoMonomorphismRestriction #-} import Control.Monad.ST import qualified Data.Vector.Storable as VS
newtype SomeVector = SomeVector (VS.Vector Int)
isSameVector :: SomeVector -> SomeVector -> Bool isSameVector (SomeVector !x) (SomeVector !y) = runST $ do mx@(VS.MVector !x'offset !x'fp) <- VS.unsafeThaw x my@(VS.MVector !y'offset !y'fp) <- VS.unsafeThaw y _ <- VS.unsafeFreeze mx _ <- VS.unsafeFreeze my return $ x'offset == y'offset && x'fp == y'fp
main :: IO () main = let !v = VS.fromList [0..1023] -- :: VS.Vector Int in print $ isSameVector (SomeVector v) (SomeVector v)
Since newtypes are always strict in their argument, I don't think the BangPattern does what you'd like it to do, it just makes "main" strict in v. As defined with `NoMonomorphismRestriction` v is a polymorphic function, and I guess it is specialised at the call site.
-- Viktor. _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
participants (3)
-
Henning Thielemann
-
Viktor Dukhovni
-
YueCompl