Nevermind, my last message would only ever allow me to do self inner joins.

On Thu, Jan 12, 2017 at 11:55 PM, Cody Goodman <codygman.consulting@gmail.com> wrote:
I figured out a way to kind of sideskirt the problem where I define my main join as a helper and then dispatch on length in the calling function. GHC generated this type for it:


innerJoin
  :: ((as ++ bs) ~ (bs ++ as), PrimMonad m, Ord k,
      RecVec (bs ++ as)) =>
     Producer (Rec Identity bs) m ()
     -> Getting k (Rec Identity bs) k
     -> Producer (Rec Identity as) m ()
     -> Getting k (Rec Identity as) k
     -> m (FrameRec (bs ++ as))


What exactly does (as ++ bs) ~ (bs ++ as) mean?


On Thu, Jan 12, 2017 at 11:48 PM, Cody Goodman <codygman.consulting@gmail.com> wrote:
I know my question doesn't quite make sense and you can't have "multiple rigid type variables" but I couldn't think of a better way to characterize it or explain it. So I'll also try to give more detail below along with the code example and error at the end.

I have an inner join function using vinyl which does cool stuff like giving a type error if you try to join on a column that doesn't exist. I previously had an implementation that blindly built an index based on the left argument.

I then tried to dispatch based on the lengths of the left and right producer arguments. The problem is my return type is:

Rec f (as ++ bs)

That means I have two different producers, one of as and one of bs. I then do a case statement on the lengths and I want to store the producer to use for a given piece in that variable. The problem is ghc "pins" the type variable as either (Rec f as) or (Rec f bs) when I need to make that choice at runtime.


My code and error is both at github here: https://github.com/codygman/frames-diff/blob/inner-join-fix/Frames/Diff.hs#L113

and below:

-- | Performs an inner join and keeps any duplicate column
-- Recommend keeping columns in producers disjoint because accessing
-- anything but the leftmost duplicate column could prove difficult.
-- see: https://github.com/VinylRecords/Vinyl/issues/55#issuecomment-269891633
innerJoin :: (MonadIO m, Ord k) =>
Producer (Rec f leftRows) IO () -- leftProducer
-> Getting k (Rec f leftRows) k -- leftProducer lens
-> Producer (Rec f rightRows) IO () -- rightProducer
-> Getting k (Rec f rightRows) k -- rightProducer lens
-> m (P.Proxy P.X () () (Rec f (leftRows ++ rightRows)) IO ())
innerJoin leftProducer leftLens rightProducer rightLens = do
leftProducerLen <- P.liftIO $ P.length leftProducer
rightProducerLen <- P.liftIO $ P.length rightProducer
let curProducer = case rightProducerLen < leftProducerLen of
True -> rightProducer
-- False -> leftProducer
let curKeymapProducer = case rightProducerLen < leftProducerLen of
True -> leftProducer
-- False -> rightProducer
let curLensLookup = case rightProducerLen < leftProducerLen of
True -> rightLens
-- False -> leftLens
let curLensInsert = case rightProducerLen < leftProducerLen of
True -> leftLens
-- False -> rightLens
let appender km row = case rightProducerLen < leftProducerLen of
True -> case M.lookup (view curLensLookup row) km of
Just otherRow -> pure $ rappend otherRow row
Nothing -> P.mzero
-- False -> case M.lookup (view curLensLookup row) km of
-- Just otherRow -> pure $ rappend row otherRow
-- Nothing -> P.mzero
keyMap <- P.liftIO $ P.fold (\m r -> M.insert (view curLensInsert r) r m) M.empty id curKeymapProducer
pure $ curProducer >-> P.mapM (\r -> appender keyMap r)
-- error if I uncomment my false cases (specifically the False case for curProducer)
-- [5 of 5] Compiling Frames.Diff ( Frames/Diff.hs, interpreted )
-- Frames/Diff.hs:125:32: error:
-- • Couldn't match type ‘leftRows’ with ‘rightRows’
-- ‘leftRows’ is a rigid type variable bound by
-- the type signature for:
-- innerJoin :: forall (m :: * -> *) k (f :: *
-- -> *) (leftRows :: [*]) (rightRows :: [*]).
-- (MonadIO m, Ord k) =>
-- Producer (Rec f leftRows) IO ()
-- -> Getting k (Rec f leftRows) k
-- -> Producer (Rec f rightRows) IO ()
-- -> Getting k (Rec f rightRows) k
-- -> m (P.Proxy P.X () () (Rec f (leftRows ++ rightRows)) IO ())
-- at Frames/Diff.hs:113:14
-- ‘rightRows’ is a rigid type variable bound by
-- the type signature for:
-- innerJoin :: forall (m :: * -> *) k (f :: *
-- -> *) (leftRows :: [*]) (rightRows :: [*]).
-- (MonadIO m, Ord k) =>
-- Producer (Rec f leftRows) IO ()
-- -> Getting k (Rec f leftRows) k
-- -> Producer (Rec f rightRows) IO ()
-- -> Getting k (Rec f rightRows) k
-- -> m (P.Proxy P.X () () (Rec f (leftRows ++ rightRows)) IO ())
-- at Frames/Diff.hs:113:14
-- Expected type: Producer (Rec f rightRows) IO ()
-- Actual type: Producer (Rec f leftRows) IO ()
-- • In the expression: leftProducer
-- In a case alternative: False -> leftProducer
-- In the expression:
-- case rightProducerLen < leftProducerLen of {
-- True -> rightProducer
-- False -> leftProducer }
-- • Relevant bindings include
-- curProducer :: Producer (Rec f rightRows) IO ()
-- (bound at Frames/Diff.hs:123:7)
-- rightLens :: Getting k (Rec f rightRows) k
-- (bound at Frames/Diff.hs:119:47)
-- rightProducer :: Producer (Rec f rightRows) IO ()
-- (bound at Frames/Diff.hs:119:33)
-- leftLens :: Getting k (Rec f leftRows) k
-- (bound at Frames/Diff.hs:119:24)
-- leftProducer :: Producer (Rec f leftRows) IO ()
-- (bound at Frames/Diff.hs:119:11)
-- innerJoin :: Producer (Rec f leftRows) IO ()
-- -> Getting k (Rec f leftRows) k
-- -> Producer (Rec f rightRows) IO ()
-- -> Getting k (Rec f rightRows) k
-- -> m (P.Proxy P.X () () (Rec f (leftRows ++ rightRows)) IO ())
-- (bound at Frames/Diff.hs:119:1)