How to ensure optimization for large immutable vectors to be shared w.r.t. Referential Transparency

Hello Cafe and respected GHC Devs, I would like to ensure some immutable vectors (can be quite large) are always shared instead of copied, and I think that should be straight forward w.r.t. referential transparency we enjoy. In an attempt to determine whether two immutable vectors can be treated as the same one to enable specific optimizations for that case, I tried to use ST to determine their respective backing foreign ptrs for comparison. But appears it can be copied when wrapped in a newtype, I wonder why it is the case, and how to avoid the copy? Here's my minimum reproducible snippet: ```hs λ> :set -XBangPatterns λ> λ> :set -package vector package flags have changed, resetting and loading new packages... λ> λ> import Prelude λ> λ> import Control.Monad.ST λ> import qualified Data.Vector.Storable as VS λ> λ> :{ λ| λ| newtype SomeVector = SomeVector (VS.Vector Int) λ| λ| isSameVector :: SomeVector -> SomeVector -> Bool λ| isSameVector (SomeVector !x) (SomeVector !y) = runST $ do λ| mx@(VS.MVector !x'offset !x'fp) <- VS.unsafeThaw x λ| my@(VS.MVector !y'offset !y'fp) <- VS.unsafeThaw y λ| _ <- VS.unsafeFreeze mx λ| _ <- VS.unsafeFreeze my λ| return $ x'offset == y'offset && x'fp == y'fp λ| λ| :} λ> λ> let !v = VS.fromList [3,2,5] in isSameVector (SomeVector v) (SomeVector v) False λ> λ> let !v = SomeVector (VS.fromList [3,2,5]) in isSameVector v v True λ> ``` Thanks with best regards, Compl

On Tue, 6 Apr 2021, YueCompl via Haskell-Cafe wrote:
In an attempt to determine whether two immutable vectors can be treated as the same one to enable specific optimizations for that case, I tried to use ST to determine their respective backing foreign ptrs for comparison. But appears it can be copied when wrapped in a newtype, I wonder why it is the case, and how to avoid the copy?
You compare the ForeignPtrs of the mutable vectors. What about comparing the ForeignPtrs of the original immutable vectors?

Thanks, it's a good idea. Unfortunately the result remains, making me even more curious. ```hs λ> :set -XBangPatterns λ> λ> :set -package vector package flags have changed, resetting and loading new packages... λ> λ> import Prelude λ> λ> import Control.Monad.ST λ> import qualified Data.Vector.Storable as VS λ> λ> :{ λ| λ| newtype SomeVector = SomeVector (VS.Vector Int) λ| λ| isSameVector :: SomeVector -> SomeVector -> Bool λ| isSameVector (SomeVector !x) (SomeVector !y) = λ| x'offset == y'offset && x'fp == y'fp λ| where λ| (x'fp, x'offset, _x'len) = VS.unsafeToForeignPtr x λ| (y'fp, y'offset, _y'len) = VS.unsafeToForeignPtr y λ| :} λ> λ> let !v = VS.fromList [5..200000] in isSameVector (SomeVector v) (SomeVector v) False λ> λ> let !v = SomeVector (VS.fromList [3,2,5]) in isSameVector v v True λ> ```
On 2021-04-06, at 20:00, Henning Thielemann
wrote: On Tue, 6 Apr 2021, YueCompl via Haskell-Cafe wrote:
In an attempt to determine whether two immutable vectors can be treated as the same one to enable specific optimizations for that case, I tried to use ST to determine their respective backing foreign ptrs for comparison. But appears it can be copied when wrapped in a newtype, I wonder why it is the case, and how to avoid the copy?
You compare the ForeignPtrs of the mutable vectors. What about comparing the ForeignPtrs of the original immutable vectors?

On Tue, 6 Apr 2021, YueCompl wrote:
Thanks, it's a good idea. Unfortunately the result remains, making me even more curious. ```hs λ> :set -XBangPatterns λ> λ> :set -package vector package flags have changed, resetting and loading new packages... λ> λ> import Prelude λ> λ> import Control.Monad.ST λ> import qualified Data.Vector.Storable as VS λ> λ> :{ λ| λ| newtype SomeVector = SomeVector (VS.Vector Int) λ| λ| isSameVector :: SomeVector -> SomeVector -> Bool λ| isSameVector (SomeVector !x) (SomeVector !y) = λ| x'offset == y'offset && x'fp == y'fp λ| where λ| (x'fp, x'offset, _x'len) = VS.unsafeToForeignPtr x λ| (y'fp, y'offset, _y'len) = VS.unsafeToForeignPtr y λ| :} λ> λ> let !v = VS.fromList [5..200000] in isSameVector (SomeVector v) (SomeVector v) False
What happens for [3,2,5]?
λ> λ> let !v = SomeVector (VS.fromList [3,2,5]) in isSameVector v v True λ> ```

```hs λ> let !v = VS.fromList [3,2,5] in isSameVector (SomeVector v) (SomeVector v) False λ> λ> let !v = SomeVector (VS.fromList [3,2,5]) in isSameVector v v True λ> ```
On 2021-04-06, at 20:34, Henning Thielemann
wrote: On Tue, 6 Apr 2021, YueCompl wrote:
Thanks, it's a good idea. Unfortunately the result remains, making me even more curious. ```hs λ> :set -XBangPatterns λ> λ> :set -package vector package flags have changed, resetting and loading new packages... λ> λ> import Prelude λ> λ> import Control.Monad.ST λ> import qualified Data.Vector.Storable as VS λ> λ> :{ λ| λ| newtype SomeVector = SomeVector (VS.Vector Int) λ| λ| isSameVector :: SomeVector -> SomeVector -> Bool λ| isSameVector (SomeVector !x) (SomeVector !y) = λ| x'offset == y'offset && x'fp == y'fp λ| where λ| (x'fp, x'offset, _x'len) = VS.unsafeToForeignPtr x λ| (y'fp, y'offset, _y'len) = VS.unsafeToForeignPtr y λ| :} λ> λ> let !v = VS.fromList [5..200000] in isSameVector (SomeVector v) (SomeVector v) False
What happens for [3,2,5]?
λ> λ> let !v = SomeVector (VS.fromList [3,2,5]) in isSameVector v v True λ> ```

On Tue, 6 Apr 2021, YueCompl wrote:
```hsλ> let !v = VS.fromList [3,2,5] in isSameVector (SomeVector v) (SomeVector v) False λ> λ> let !v = SomeVector (VS.fromList [3,2,5]) in isSameVector v v True λ> ```
Then I have no idea. Maybe ghc-heap-view/ghc-vis can reveal the mystery. https://github.com/nomeata/haskell-bytes-bobkonf2021

Appears it'd work as expected when the immutable vector is originally created from foreign ptr, I think it'll work for my cases. (Though it's still strangely unexpected for ad hoc immutable vectors unshared when wrapped.) ```hs λ> :set -XBangPatterns λ> λ> :set -package vector package flags have changed, resetting and loading new packages... λ> λ> import Prelude λ> λ> import Foreign.Marshal.Alloc λ> import Foreign.Storable λ> import Foreign.ForeignPtr λ> λ> import qualified Data.Vector.Storable as VS λ> λ> :{ λ| λ| data SomeVector = SomeVector (VS.Vector Int) λ| λ| isSameVector :: Storable a => VS.Vector a -> VS.Vector a -> Bool λ| isSameVector !x !y = λ| x'offset == y'offset && x'fp == y'fp λ| where λ| (x'fp, x'offset, _x'len) = VS.unsafeToForeignPtr x λ| (y'fp, y'offset, _y'len) = VS.unsafeToForeignPtr y λ| λ| isSame :: SomeVector -> SomeVector -> Bool λ| isSame (SomeVector !x) (SomeVector !y) = isSameVector x y λ| λ| :} λ> λ> let !v = VS.fromList [3,2,5] in isSame (SomeVector v) (SomeVector v) False λ> λ> let !v = SomeVector (VS.fromList [3,2,5]) in isSame v v True λ> λ> λ> (fp :: ForeignPtr Int) <- mallocBytes 256 >>= newForeignPtr_ λ> let !v = VS.unsafeFromForeignPtr fp 0 32 λ| λ> isSame (SomeVector v) (SomeVector v) True λ> ```
On 2021-04-06, at 20:50, Henning Thielemann
wrote: On Tue, 6 Apr 2021, YueCompl wrote:
```hsλ> let !v = VS.fromList [3,2,5] in isSameVector (SomeVector v) (SomeVector v) False λ> λ> let !v = SomeVector (VS.fromList [3,2,5]) in isSameVector v v True λ> ```
Then I have no idea.
Maybe ghc-heap-view/ghc-vis can reveal the mystery.

On Tue, Apr 06, 2021 at 07:12:51PM +0800, YueCompl via ghc-devs wrote:
λ> import Control.Monad.ST λ> import qualified Data.Vector.Storable as VS λ> λ> :{ λ| λ| newtype SomeVector = SomeVector (VS.Vector Int) λ| λ| isSameVector :: SomeVector -> SomeVector -> Bool λ| isSameVector (SomeVector !x) (SomeVector !y) = runST $ do λ| mx@(VS.MVector !x'offset !x'fp) <- VS.unsafeThaw x λ| my@(VS.MVector !y'offset !y'fp) <- VS.unsafeThaw y λ| _ <- VS.unsafeFreeze mx λ| _ <- VS.unsafeFreeze my λ| return $ x'offset == y'offset && x'fp == y'fp λ| λ| :} λ> λ> let !v = VS.fromList [3,2,5] in isSameVector (SomeVector v) (SomeVector v) False λ> λ> let !v = SomeVector (VS.fromList [3,2,5]) in isSameVector v v True
In GHCi, but not in compiled programs, by default the `NoMonomorphismRestriction` extension is enabled. If I compile your code with that restriction, I can reproduce your results (the values are not shared). If I either skip the extension, or add an explicit type annotation to for the vector, then the values are shared. {-# LANGUAGE BangPatterns #-} {-# LANGUAGE NoMonomorphismRestriction #-} import Control.Monad.ST import qualified Data.Vector.Storable as VS newtype SomeVector = SomeVector (VS.Vector Int) isSameVector :: SomeVector -> SomeVector -> Bool isSameVector (SomeVector !x) (SomeVector !y) = runST $ do mx@(VS.MVector !x'offset !x'fp) <- VS.unsafeThaw x my@(VS.MVector !y'offset !y'fp) <- VS.unsafeThaw y _ <- VS.unsafeFreeze mx _ <- VS.unsafeFreeze my return $ x'offset == y'offset && x'fp == y'fp main :: IO () main = let !v = VS.fromList [0..1023] -- :: VS.Vector Int in print $ isSameVector (SomeVector v) (SomeVector v) Since newtypes are always strict in their argument, I don't think the BangPattern does what you'd like it to do, it just makes "main" strict in v. As defined with `NoMonomorphismRestriction` v is a polymorphic function, and I guess it is specialised at the call site. -- Viktor.

Thanks very much for the diagnostic and explanation! I was wrong in assuming the `in isSameVector (SomeVector v) (SomeVector v)` part is enough to have type of v in `let !v = VS.fromList [3,2,5]` inferred as monomorphic, totally unaware about "NoMonomorphismRestriction" before, I've learned it today :D
On 2021-04-06, at 21:51, Viktor Dukhovni
wrote: On Tue, Apr 06, 2021 at 07:12:51PM +0800, YueCompl via ghc-devs wrote:
λ> import Control.Monad.ST λ> import qualified Data.Vector.Storable as VS λ> λ> :{ λ| λ| newtype SomeVector = SomeVector (VS.Vector Int) λ| λ| isSameVector :: SomeVector -> SomeVector -> Bool λ| isSameVector (SomeVector !x) (SomeVector !y) = runST $ do λ| mx@(VS.MVector !x'offset !x'fp) <- VS.unsafeThaw x λ| my@(VS.MVector !y'offset !y'fp) <- VS.unsafeThaw y λ| _ <- VS.unsafeFreeze mx λ| _ <- VS.unsafeFreeze my λ| return $ x'offset == y'offset && x'fp == y'fp λ| λ| :} λ> λ> let !v = VS.fromList [3,2,5] in isSameVector (SomeVector v) (SomeVector v) False λ> λ> let !v = SomeVector (VS.fromList [3,2,5]) in isSameVector v v True
In GHCi, but not in compiled programs, by default the `NoMonomorphismRestriction` extension is enabled. If I compile your code with that restriction, I can reproduce your results (the values are not shared).
If I either skip the extension, or add an explicit type annotation to for the vector, then the values are shared.
{-# LANGUAGE BangPatterns #-} {-# LANGUAGE NoMonomorphismRestriction #-} import Control.Monad.ST import qualified Data.Vector.Storable as VS
newtype SomeVector = SomeVector (VS.Vector Int)
isSameVector :: SomeVector -> SomeVector -> Bool isSameVector (SomeVector !x) (SomeVector !y) = runST $ do mx@(VS.MVector !x'offset !x'fp) <- VS.unsafeThaw x my@(VS.MVector !y'offset !y'fp) <- VS.unsafeThaw y _ <- VS.unsafeFreeze mx _ <- VS.unsafeFreeze my return $ x'offset == y'offset && x'fp == y'fp
main :: IO () main = let !v = VS.fromList [0..1023] -- :: VS.Vector Int in print $ isSameVector (SomeVector v) (SomeVector v)
Since newtypes are always strict in their argument, I don't think the BangPattern does what you'd like it to do, it just makes "main" strict in v. As defined with `NoMonomorphismRestriction` v is a polymorphic function, and I guess it is specialised at the call site.
-- Viktor. _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

On a second thought, maybe GHCi's silence is a bad thing here? Maybe it should complain loudly as GHC does? ```hs λ> :set -package vector package flags have changed, resetting and loading new packages... λ> λ> import Prelude λ> λ> import qualified Data.Vector.Storable as VS λ> λ> :{ λ| λ| newtype SomeVector = SomeVector (VS.Vector Int) λ| λ| isSameVector :: SomeVector -> SomeVector -> Bool λ| isSameVector (SomeVector x) (SomeVector y) = λ| x'offset == y'offset && x'fp == y'fp λ| where λ| (x'fp, x'offset, _x'len) = VS.unsafeToForeignPtr x λ| (y'fp, y'offset, _y'len) = VS.unsafeToForeignPtr y λ| λ| :} λ> λ> let (v :: VS.Vector Int) = VS.fromList [3,2,5] in isSameVector (SomeVector v) (SomeVector v) True λ> λ> λ> :set -XMonomorphismRestriction λ> λ> let v = VS.fromList [3,2,5] in isSameVector (SomeVector v) (SomeVector v) True λ> λ> :set -XNoMonomorphismRestriction λ> λ> let v = VS.fromList [3,2,5] in isSameVector (SomeVector v) (SomeVector v) False λ> ``` Further more, my intuition about GHC's type inference here is proved wrong by it, right. But I still think that with a single piece of `let-in` construct, types are better to be inferred as specific as possible, then the result would not be affected by some extension's semantics modification. Here v's type can obviously be inferred to `VS.Vector Int` according to its usage in the `SomeVector` data constructor, I wonder why GHC is not doing this?
On 2021-04-06, at 22:19, YueCompl via ghc-devs
wrote: Thanks very much for the diagnostic and explanation!
I was wrong in assuming the `in isSameVector (SomeVector v) (SomeVector v)` part is enough to have type of v in `let !v = VS.fromList [3,2,5]` inferred as monomorphic, totally unaware about "NoMonomorphismRestriction" before, I've learned it today :D
On 2021-04-06, at 21:51, Viktor Dukhovni
wrote: On Tue, Apr 06, 2021 at 07:12:51PM +0800, YueCompl via ghc-devs wrote:
λ> import Control.Monad.ST λ> import qualified Data.Vector.Storable as VS λ> λ> :{ λ| λ| newtype SomeVector = SomeVector (VS.Vector Int) λ| λ| isSameVector :: SomeVector -> SomeVector -> Bool λ| isSameVector (SomeVector !x) (SomeVector !y) = runST $ do λ| mx@(VS.MVector !x'offset !x'fp) <- VS.unsafeThaw x λ| my@(VS.MVector !y'offset !y'fp) <- VS.unsafeThaw y λ| _ <- VS.unsafeFreeze mx λ| _ <- VS.unsafeFreeze my λ| return $ x'offset == y'offset && x'fp == y'fp λ| λ| :} λ> λ> let !v = VS.fromList [3,2,5] in isSameVector (SomeVector v) (SomeVector v) False λ> λ> let !v = SomeVector (VS.fromList [3,2,5]) in isSameVector v v True
In GHCi, but not in compiled programs, by default the `NoMonomorphismRestriction` extension is enabled. If I compile your code with that restriction, I can reproduce your results (the values are not shared).
If I either skip the extension, or add an explicit type annotation to for the vector, then the values are shared.
{-# LANGUAGE BangPatterns #-} {-# LANGUAGE NoMonomorphismRestriction #-} import Control.Monad.ST import qualified Data.Vector.Storable as VS
newtype SomeVector = SomeVector (VS.Vector Int)
isSameVector :: SomeVector -> SomeVector -> Bool isSameVector (SomeVector !x) (SomeVector !y) = runST $ do mx@(VS.MVector !x'offset !x'fp) <- VS.unsafeThaw x my@(VS.MVector !y'offset !y'fp) <- VS.unsafeThaw y _ <- VS.unsafeFreeze mx _ <- VS.unsafeFreeze my return $ x'offset == y'offset && x'fp == y'fp
main :: IO () main = let !v = VS.fromList [0..1023] -- :: VS.Vector Int in print $ isSameVector (SomeVector v) (SomeVector v)
Since newtypes are always strict in their argument, I don't think the BangPattern does what you'd like it to do, it just makes "main" strict in v. As defined with `NoMonomorphismRestriction` v is a polymorphic function, and I guess it is specialised at the call site.
-- Viktor. _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

On Tue, Apr 06, 2021 at 10:58:20PM +0800, YueCompl via ghc-devs wrote:
On a second thought, maybe GHCi's silence is a bad thing here? Maybe it should complain loudly as GHC does?
No, GHCi is doing the expected thing. Because GHCi's REPL sees one line at a time, it is not generally possible for it to infer specific monomorphic types on the spot, so GHCi infers the polymorphic type. And as for the complaint, that was because I prepended: v `seq` ... in which the type of `v` is ambiguous when polymorphic. GHCi also complains if you try that.
λ> let v = VS.fromList [3,2,5] in isSameVector (SomeVector v) (SomeVector v)
One thing I'm not sure about, that perhaps someone else can shed light on, is whether with optimisation one might expect the two (SomeVector v) values to be subject to CSE, given that they both invoke `v` at the same type. Is there a non-default optimisation flag that makes CSE more aggressive that would make that happen? -- Viktor.

On Tue, Apr 06, 2021 at 11:10:51AM -0400, Viktor Dukhovni wrote:
λ> let v = VS.fromList [3,2,5] in isSameVector (SomeVector v) (SomeVector v)
One thing I'm not sure about, that perhaps someone else can shed light on, is whether with optimisation one might expect the two (SomeVector v) values to be subject to CSE, given that they both invoke `v` at the same type. Is there a non-default optimisation flag that makes CSE more aggressive that would make that happen?
On a hunch I tried suppressing the inlining of the definition of `v`, and CSE then kicked in... {-# LANGUAGE BangPatterns #-} {-# LANGUAGE NoMonomorphismRestriction #-} import Control.Monad.ST import Data.Coerce import qualified Data.Vector.Storable as VS newtype SomeVector = SomeVector (VS.Vector Int) isSameVector :: SomeVector -> SomeVector -> Bool isSameVector !(SomeVector x) !(SomeVector y) = runST $ do mx@(VS.MVector !x'offset !x'fp) <- VS.unsafeThaw x my@(VS.MVector !y'offset !y'fp) <- VS.unsafeThaw y _ <- VS.unsafeFreeze mx _ <- VS.unsafeFreeze my return $ x'offset == y'offset && x'fp == y'fp makev = VS.fromList [0..1023] {-# NOINLINE makev #-} main :: IO () main = let !v = makev in print $ isSameVector (SomeVector v) (SomeVector v) So it appears that inlining of `v` into (SomeVector v) is the proximate barrier to identifying the two (SomeVector v) terms. Is this expected? -- Viktor.
participants (3)
-
Henning Thielemann
-
Viktor Dukhovni
-
YueCompl