log time instead of linear for case matching

We have a couple use cases in Yesod that can potentially match many different patterns. Routing connects the url of an http request to a Haskell function. The current code just uses a pattern match, which scales linearly. So if a site has a hundred different routes (urls), it could take 100 comparisons to finally match the url to a function. Michael Snoyman is writing some code to make this issue obsolete. One of the things it does is use a Map lookup instead of a case match. More worrying is our system for internationalization of a website. A user is supposed to make a sum type with every custom message as a constructor in that sum type. data Msg = Model | Year | Please -- Rendering function for English. renderEnglish Model = "Model" renderEnglish Year = "Year" renderEnglish Please = "Please fill in your car's details" -- Rendering function for Swedish. renderSwedish Model = "Modell" renderSwedish Year = "Årgång" renderSwedish Please = "Vänligen fyll i uppgifterna för din bil" So if there are 100 custom messages on a site, that will setup a case match with potentially 100 comparisons. Note that we are using this technique for type safety- switching to a map here would be difficult. We want to know at compile time that a translation exists for every message. So first of all I am wondering if a sum type comparison does in fact scale linearly or if there are optimizations in place to make the lookup constant or logarithmic. Second, I as wondering (for the routing case) if Haskell can make a string case comparison logarithmic so that users can use case comparisons instead of maps for string collections that are known at compile time and won't change.

Excerpts from Greg Weber's message of Sun Oct 09 12:39:03 -0400 2011:
So first of all I am wondering if a sum type comparison does in fact scale linearly or if there are optimizations in place to make the lookup constant or logarithmic. Second, I as wondering (for the routing case) if Haskell can make a string case comparison logarithmic so that users can use case comparisons instead of maps for string collections that are known at compile time and won't change.
GHC will compile case-matches over large sum types into jump tables, so the lookup becomes constant. I don't think we have any cleverness for strings. Edward

Greg In GHC, big cases are done as tables (if dense) or trees (if sparse). If you have some examples where things go bad, do submit a bug report. For big dispatches on strings, I'm pretty sure we do something linear, top to bottom. I'd be strongly inclined to use a proper Map structure if you want good performance on that. Is there some reason you don't want to? Simon From: glasgow-haskell-users-bounces@haskell.org [mailto:glasgow-haskell-users-bounces@haskell.org] On Behalf Of Greg Weber Sent: 09 October 2011 17:39 To: GHC users Subject: log time instead of linear for case matching We have a couple use cases in Yesod that can potentially match many different patterns. Routing connects the url of an http request to a Haskell function. The current code just uses a pattern match, which scales linearly. So if a site has a hundred different routes (urls), it could take 100 comparisons to finally match the url to a function. Michael Snoyman is writing some code to make this issue obsolete. One of the things it does is use a Map lookup instead of a case match. More worrying is our system for internationalization of a website. A user is supposed to make a sum type with every custom message as a constructor in that sum type. data Msg = Model | Year | Please -- Rendering function for English. renderEnglish Model = "Model" renderEnglish Year = "Year" renderEnglish Please = "Please fill in your car's details" -- Rendering function for Swedish. renderSwedish Model = "Modell" renderSwedish Year = "Årgång" renderSwedish Please = "Vänligen fyll i uppgifterna för din bil" So if there are 100 custom messages on a site, that will setup a case match with potentially 100 comparisons. Note that we are using this technique for type safety- switching to a map here would be difficult. We want to know at compile time that a translation exists for every message. So first of all I am wondering if a sum type comparison does in fact scale linearly or if there are optimizations in place to make the lookup constant or logarithmic. Second, I as wondering (for the routing case) if Haskell can make a string case comparison logarithmic so that users can use case comparisons instead of maps for string collections that are known at compile time and won't change.

I thought it could be a nice GHC feature to optimize string lookup, and that
using case arrows could be a nice syntax for creating maps.
We will be fine using a Map. Making sure that sum types are optimized was
the important thing, thanks!
On Mon, Oct 10, 2011 at 1:14 AM, Simon Peyton-Jones
Greg****
** **
In GHC, big cases are done as tables (if dense) or trees (if sparse). If you have some examples where things go bad, do submit a bug report.****
** **
For big dispatches on strings, I’m pretty sure we do something linear, top to bottom. I’d be strongly inclined to use a proper Map structure if you want good performance on that. Is there some reason you don’t want to?
Simon****
** **
*From:* glasgow-haskell-users-bounces@haskell.org [mailto: glasgow-haskell-users-bounces@haskell.org] *On Behalf Of *Greg Weber *Sent:* 09 October 2011 17:39 *To:* GHC users *Subject:* log time instead of linear for case matching****
** **
We have a couple use cases in Yesod that can potentially match many different patterns. Routing connects the url of an http request to a Haskell function. The current code just uses a pattern match, which scales linearly. So if a site has a hundred different routes (urls), it could take 100 comparisons to finally match the url to a function. Michael Snoyman is writing some code to make this issue obsolete. One of the things it does is use a Map lookup instead of a case match.****
** **
More worrying is our system for internationalization of a website. A user is supposed to make a sum type with every custom message as a constructor in that sum type.****
** **
data Msg = Model****
| Year****
| Please****
** **
-- Rendering function for English.****
renderEnglish Model = "Model"****
renderEnglish Year = "Year"****
renderEnglish Please = "Please fill in your car's details"****
** **
-- Rendering function for Swedish.****
renderSwedish Model = "Modell"****
renderSwedish Year = "Årgång"****
renderSwedish Please = "Vänligen fyll i uppgifterna för din bil"****
** **
So if there are 100 custom messages on a site, that will setup a case match with potentially 100 comparisons.****
** **
Note that we are using this technique for type safety- switching to a map here would be difficult. We want to know at compile time that a translation exists for every message.****
** **
So first of all I am wondering if a sum type comparison does in fact scale linearly or if there are optimizations in place to make the lookup constant or logarithmic. Second, I as wondering (for the routing case) if Haskell can make a string case comparison logarithmic so that users can use case comparisons instead of maps for string collections that are known at compile time and won't change.****
participants (3)
-
Edward Z. Yang
-
Greg Weber
-
Simon Peyton-Jones