
Hi everybody, I'm working on a module that encodes "static" facts about "the real world". For now, I'm working on an ISO 3166 compliant list of countries, country names, and country codes. I've run into a bit of an optimization issue. There is a static bijective correspondence between countries and their codes. In order to keep one just one "large" data structure representation as Haskell code, I encoded this bijection using a list. I'm looking to write queries against this list, but it is rather tedious. I figured I could make some Data.Maps to handle it for me. -- Country and ISOCountryCodes derive (Data, Eq, Ord, Show, Typeable) countries_and_iso_country_codes :: [ (Country, ISOCountryCode) ] countries_and_iso_country_codes = [ ( Afghanistan , ISOCountryCode AF AFG (isoNumericCode 004) ) , ( AlandIslands , ISOCountryCode AX ALA (isoNumericCode 248) ) , ( Albania , ISOCountryCode AL ALB (isoNumericCode 008) ) ... , ( Zimbabwe , ISOCountryCode ZW ZWE (isoNumericCode 716) ) ] map_country_to_country_code :: Map Country ISOCountryCode map_country_to_country_code = fromList countries_and_iso_country_codes map_country_code_to_country :: Map ISOCountryCode Country map_country_code_to_country = fromList . fmap (\(a,b) -> (b, a)) $ countries_and_iso_country_codes Is there anyway to instruct GHC (and maybe other compilers) to compute these maps statically? Are GHC and the other compilers smart enough to do it automatically? Although the list isn't huge, I would still rather get rid of the O(2*n) operation of turning it into maps at run- time. (Especially since some later list encodings like these might be enormous) What should I be looking into? Thanks

I would not worry about doing that at runtime.
The only reliable way to make sure it happens at compile time that I
can think of would be some Template Haskell.
(Or some really deep magic with dictionaries.)
-- Lennart
On Mon, Oct 11, 2010 at 3:51 AM, Alexander Solla
Hi everybody, I'm working on a module that encodes "static" facts about "the real world". For now, I'm working on an ISO 3166 compliant list of countries, country names, and country codes. I've run into a bit of an optimization issue. There is a static bijective correspondence between countries and their codes. In order to keep one just one "large" data structure representation as Haskell code, I encoded this bijection using a list. I'm looking to write queries against this list, but it is rather tedious. I figured I could make some Data.Maps to handle it for me.
-- Country and ISOCountryCodes derive (Data, Eq, Ord, Show, Typeable) countries_and_iso_country_codes :: [ (Country, ISOCountryCode) ] countries_and_iso_country_codes =
[ ( Afghanistan , ISOCountryCode AF AFG (isoNumericCode 004) ) , ( AlandIslands , ISOCountryCode AX ALA (isoNumericCode 248) ) , ( Albania , ISOCountryCode AL ALB (isoNumericCode 008) ) ... , ( Zimbabwe , ISOCountryCode ZW ZWE (isoNumericCode 716) ) ] map_country_to_country_code :: Map Country ISOCountryCode map_country_to_country_code = fromList countries_and_iso_country_codes map_country_code_to_country :: Map ISOCountryCode Country map_country_code_to_country = fromList . fmap (\(a,b) -> (b, a)) $ countries_and_iso_country_codes Is there anyway to instruct GHC (and maybe other compilers) to compute these maps statically? Are GHC and the other compilers smart enough to do it automatically? Although the list isn't huge, I would still rather get rid of the O(2*n) operation of turning it into maps at run-time. (Especially since some later list encodings like these might be enormous) What should I be looking into? Thanks _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Sun, Oct 10, 2010 at 10:51 PM, Alexander Solla
Is there anyway to instruct GHC (and maybe other compilers) to compute these maps statically? Are GHC and the other compilers smart enough to do it automatically? Although the list isn't huge, I would still rather get rid of the O(2*n) operation of turning it into maps at run-time. (Especially since some later list encodings like these might be enormous) What should I be looking into?
You may use Template Haskell and forget about the 'Data.Map's entirely =). I've attached a proof-of-concept code that turns list :: [(a,b)] list = [(x1,y1), (x2, y2), ..., (xn, yn)] into two functions (you can choose the names) fromAtoB :: a -> b fromAtoB x1 = y1 fromAtoB x2 = y2 ... fromAtoB xn = yn fromBtoA :: b -> a fromBtoA y1 = x1 ... fromBtoA yn = xn but with the arguments tranformed to patterns. Compiling a test module: $ ghc -ddump-splices -c ModuleWithFunctions.hs Loading package ghc-prim ... linking ... done. Loading package integer-gmp ... linking ... done. Loading package base ... linking ... done. Loading package ffi-1.0 ... linking ... done. Loading package array-0.3.0.1 ... linking ... done. Loading package containers-0.3.0.0 ... linking ... done. Loading package pretty-1.0.1.1 ... linking ... done. Loading package template-haskell ... linking ... done. ModuleWithFunctions.hs:1:0: ModuleWithFunctions.hs:1:0: Splicing declarations bimap "map_country_to_country_code" "map_country_code_to_country" countries_and_iso_country_codes ======> ModuleWithFunctions.hs:8:2-98 map_country_to_country_code Afghanistan = ISOCountryCode AF AFG (NC 4) map_country_to_country_code AlandIslands = ISOCountryCode AX ALA (NC 248) map_country_to_country_code Albania = ISOCountryCode AL ALB (NC 8) map_country_to_country_code Zimbabwe = ISOCountryCode ZW ZWE (NC 716) map_country_code_to_country ISOCountryCode AF AFG NC 4 = Afghanistan map_country_code_to_country ISOCountryCode AX ALA NC 248 = AlandIslands map_country_code_to_country ISOCountryCode AL ALB NC 8 = Albania map_country_code_to_country ISOCountryCode ZW ZWE NC 716 = Zimbabwe So two functions were spliced into your code with explicit pattern matches. I don't know how efficient is the code that GHC generates for functions with many patterns. Now, testing them: $ ghci GHCi, version 6.12.3: http://www.haskell.org/ghc/ :? for help Loading package ghc-prim ... linking ... done. Loading package integer-gmp ... linking ... done. Loading package base ... linking ... done. Loading package ffi-1.0 ... linking ... done. :Prelude> :l Module [4 of 4] Compiling Module ( Module.hs, interpreted ) Ok, modules loaded: Module, ModuleWithData, ModuleWithFunctions, Template. *Module> :bro Module data Country = Afghanistan | AlandIslands | Albania | Zimbabwe data ISOCountryCode = ISOCountryCode TwoCode ThreeCode NumCode data NumCode = NC Int data ThreeCode = AFG | ALA | ALB | ZWE data TwoCode = AF | AX | AL | ZW countries_and_iso_country_codes :: [(Country, ISOCountryCode)] isoNumericCode :: Int -> NumCode map_country_code_to_country :: ISOCountryCode -> Country map_country_to_country_code :: Country -> ISOCountryCode *Module> map_country_to_country_code Albania Loading package array-0.3.0.1 ... linking ... done. Loading package containers-0.3.0.0 ... linking ... done. Loading package pretty-1.0.1.1 ... linking ... done. Loading package template-haskell ... linking ... done. ISOCountryCode AL ALB (NC 8) *Module> map_country_code_to_country $ ISOCountryCode AL ALB (NC 8) Albania Cheers! =) -- Felipe.

On Sun, 10 Oct 2010 18:51:59 -0700, Alexander Solla
Although the list isn't huge, I would still rather get rid of the O(2*n) operation of turning it into maps at run-time.
I usually handle this as follows: 1) I create my data file in some human-friendly format (such as your list of tuples), so that I can easily edit it later, as required. 2) I write a program, a sort of preprocessor, that (a) loads the data from the human-friendly format into a processing-friendly structure, and then (b) serializes that structure into a file that's efficient to load at run time. (So, for example, lookups from the name of a country to its ISO codes can be handled via a Patricia structure; a Patricia structure is tedious to construct, but once constructed, is easy to serialize and de-serialize.) 3) The file containing the serialized structure is then linked into the rest of the "real" program as a source unit, where the serialized structure is represented as a constant (usually a string, but sometimes an array of numbers, etc.). If your build environment has a reasonable make-like tool, then the whole process is pretty automatic; you just modify the human-friendly file as often as you want, and the preprocessor is invoked automatically, as needed. -Steve Schafer
participants (4)
-
Alexander Solla
-
Felipe Lessa
-
Lennart Augustsson
-
Steve Schafer