
Hi all, First, a little context. I'm currently working on [1] a little OpenGL project - a Quake 3 map viewer in Haskell. For the most part, things are working out great, but the way I construct the scene is costing me a lot of alloction, and I wonder if I can do better. The OpenGL system is essentially a giant state machine. Whenever you issue a draw call ("draw these triangles"), OpenGL will do different things depending on what the state is. For example, if you have a texture bound, then the triangle will be textured, and if you have blending enabled, then the triangle might be blended into whatever was drawn previously. While this can be tricky to manage, Haskell does a good job of giving us tools to model state. The real spanner is that state transitions are not free. In the above example, the state transitions associated with binding a texture or enabling blending both cost time - and this time is variable depending on the state change. In this case, setting the currently bound texture is a very expensive operation, whereas setting the blending mode is less costly. For this reason, it's common in graphics programming to want to sort draw calls to minimise state changes. For example, we might draw all triangles with texture 1, then all triangles with texture 2 - rather than interleaving those draw calls. To model this in Haskell, I'm using a tree-like structure to group and express each state change [2]. To give you an example of what this looks like, I have: class RenderNode a m where draw :: a -> m () instance RenderNode (IO ()) IO where draw = id newtype BindTexture a = BindTexture (MonoidalIntMap a) [3] deriving (Functor, Monoid) instance (MonadIO m, RenderNode a m) => RenderNode (BindTexture a) m where draw (BindTexture t) = IM.foldrWithKey (\texture0 m next -> do glActiveTexture GL_TEXTURE0 glBindTexture GL_TEXTURE_2D (fromIntegral texture0) draw m next) (return ()) (getMonoidalIntMap t) newtype BlendMode a = BlendMode (MonoidalMap (GLuint, GLuint) a) [3] deriving (Functor, Monoid) blendMode :: (GLuint, GLuint) -> a -> BlendMode a blendMode srcDst = BlendMode . MonoidalMap . Map.singleton srcDst instance (MonadIO m, RenderNode a m) => RenderNode (BlendMode a) m where draw (BlendMode m) = Map.foldrWithKey (\(srcBlend, destBlend) child next -> do glBlendFunc srcBlend destBlend draw child next) (return ()) (getMonoidalMap m) I can then stack these nodes together to form a "render graph": newtype RenderGraph = RenderGraph (BindTexture (BlendMode (IO ()))) deriving (Monoid) The IO action at the leaf corresponds to a draw call to draw some triangles. Now that I've explained the basics of my rendering plan, I just need to tell you a little bit about Quake 3 maps. A Quake 3 map is essentially a big soup of "faces", where each face is a collection of triangles (that can be drawn in one call) and a set of associated state. Furthermore, each face is assigned a "cluster" (an Int), and we have a function that given a cluster gives us back a list of other clusters that we can "see" - this is visibility determination, or occlusion culling. I am hence modelling the map as an IntMap RenderGraph - each cluster maps to a RenderGraph of how to draw all the faces in that cluster, organised to minimise draw calls. Then, given some sort of visibility check, when I render I filter this IntMap to only the clusters that can be seen, and combine all the render graphs: IntMap.foldlWithKey' (\scene clusterNumber clusterFaces -> scene <> clusterFaces) mempty clusters In reality, this results in combining a few hundred-thousad very deeply nested structures, and naturally that's resulting in a fair bit of allocation pressure. I need to do this almost every frame, because as the camera moves, we might be able to see different clusters, so I don't think there's really much that I can cache. My actual final RenderGraph is Sort |> Cull |> MultiplePasses |> BindTexture |> SetUniform Bool |> SetDynamicUniform (M33 Float) |> AlphaFunc |> BlendMode |> DepthFunc where (|>) is Functor composition, and each of those functors is either a Map or IntMap. It seems to deteriorate to pairwise `mappend`, as you can see from the definition of mappend for MonoidalMap or MonoidalIntMap in [3]. Can anyone think of any obvious ways to rewrite this to have less allocation pressure? I'm of course open to other ways to batch operations, but there is something I find quite beautiful about shoving everything into Maps and mappending them. In the C world, the approach is usually to combine all the states into a single bitmask, and then sort that. That same approach might be possible here, but I'd like to keep the pick-and-choose compositional nature that I've got above. Thanks for reading this far, I hope my post has made sense. If you need clarification on anything, the source is below, and don't hesitate to ask me any questions! -- ocharles [1]: http://github.com/ocharles/hs-quake-3 [2]: https://github.com/ocharles/hs-quake-3/blob/master/RenderGraph.hs [3]: newtype MonoidalMap k a = MonoidalMap { getMonoidalMap :: Map k a } instance Functor (MonoidalMap k) where fmap f (MonoidalMap a) = MonoidalMap (fmap f a) instance (Monoid a, Ord k) => Monoid (MonoidalMap k a) where mempty = MonoidalMap Map.empty mappend (MonoidalMap a) (MonoidalMap b) = MonoidalMap (Map.unionWith mappend a b) mconcat = coerce . Map.unionsWith mappend . (coerce :: [MonoidalMap k a] -> [Map k a]) newtype MonoidalIntMap a = MonoidalIntMap { getMonoidalIntMap :: IntMap a } deriving (Functor) instance Monoid a => Monoid (MonoidalIntMap a) where mempty = MonoidalIntMap IM.empty mappend (MonoidalIntMap a) (MonoidalIntMap b) = MonoidalIntMap (IM.unionWith mappend a b) mconcat = coerce . IM.unionsWith mappend . (coerce :: [MonoidalIntMap a] -> [IntMap a])