
Hello list.

I've been trying to figure out a nice method to provide localisation. An application is deployed using a conventional installer. The end-user is not required to have the Haskell runtimes, compiler or platform. The application should bundle ready-to-use translation data. What I am after is simple: an intuitive way that an interested translator, with little knowledge of Haskell, can look at and create valid translation data.

This is what I've been looking at lately. The first thing I noticed was the GNU gettext implementation for Haskell. The wiki page [1] has a nice explanation by Aufheben. The hgettext package is found here [2].

I don't know if this is a bad habit, but I had already separated the dialogue text in the code with variables holding the respective strings. At this time, I thought there could be some other way than gettext. Then I figured out how to import localisation data, which the program loads, from external files. The data type is basically a tuple with variable names associated with strings. This is a bit like the file-embed package [3].

Still uncomfortable with i18n, I learned about the article "I18N in Haskell" in the Yesod blog [4]. I'd like to hear more about it.

What is considered the best practice for localisation?

--
[1] http://www.haskell.org/haskellwiki/Internationalization_of_Haskell_programs
[2] http://hackage.haskell.org/packages/archive/hgettext/
[3] http://hackage.haskell.org/package/file-embed
[4] http://www.yesodweb.com/blog/2011/01/i18n-in-haskell

Paulo Pocinho wrote:
> I don't know if this is a bad habit, but I had already separated the dialogue text in the code with variables holding the respective strings. At this time, I thought there could be some other way than gettext. Then I figured how to import localisation data, that the program loads, from external files. The data type is basically a tuple with variable-names associated with strings. This is bit like the file-embed package [3].
> Still uncomfortable with i18n, I learned about the article "I18N in Haskell" in yesod blog [4]. I'd like to hear more about it.
> What is considered the best practice for localisation?
I can't help you with best practice for Haskell, and I don't think there is any. Gettext is probably the easiest approach, because it integrates nicely with the rest of the environment. It automatically uses the usual LANG and LC_* variables, which are used in Unix-like systems.

An even simpler (but not necessarily easier) approach is to hard-code the languages in a Map and just look up the string you need. In this case you have to code the integration yourself. It somewhat sounds like you are targeting the Windows platform anyway. Personally I'd likely prefer Gettext for its integration and all the existing translation tools.

In either case, the best practice is not to work with variables, but with a default language. You write your text strings in your default language (usually English), but wrap them in a certain function call. The function will try to look up a translated message for the current language, falling back to the default-language string. This makes both programming and translating easier.

This is how I imagine it works (or should work):

    main :: IO ()
    main = do
        tr <- getTranslator
        putStrLn (tr "This is a test.")

The 'tr' function is called just '_' in other languages, but you can't use a bare underscore as a function name in Haskell.

A translator (person) would use a program to search your entire source code for those translatable strings. Then they would use a translation program, which shows an English string and asks them to enter the translated string, over and over, until all strings are translated.

Greets,
Ertugrul

--
http://ertes.de/
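A minimal sketch of the hard-coded Map approach with a default language, as described above. Everything here is an assumption made for illustration: the catalogue contents, the names `catalogs`, `translateFor` and `langCode`, and the choice to read only the LANG variable. It is not how hgettext works; it only shows the "look up, else fall back to the source string" idea behind `tr`.

```haskell
import qualified Data.Map.Strict as Map
import Data.Maybe (fromMaybe)
import System.Environment (lookupEnv)

type Lang = String
type Catalog = Map.Map String String

-- Hypothetical hard-coded catalogues, keyed by language code.
-- Each catalogue maps the default-language (English) string to its translation.
catalogs :: Map.Map Lang Catalog
catalogs = Map.fromList
  [ ("de", Map.fromList [("This is a test.", "Dies ist ein Test.")])
  , ("pt", Map.fromList [("This is a test.", "Isto e um teste.")])
  ]

-- Pure core: translate a message for a language, falling back to the
-- untranslated default-language string when nothing is found.
translateFor :: Lang -> String -> String
translateFor lang msg =
  fromMaybe msg (Map.lookup lang catalogs >>= Map.lookup msg)

-- Reduce a locale name such as "de_DE.UTF-8" to its language code "de".
langCode :: String -> Lang
langCode = takeWhile (`notElem` "_.@")

-- Build the translator once, from the LANG environment variable
-- (assumed unset means English).
getTranslator :: IO (String -> String)
getTranslator = do
  env <- lookupEnv "LANG"
  pure (translateFor (maybe "en" langCode env))
```

With this in place, the `main` shown above would print the German text when LANG is `de_DE.UTF-8` and the English source string for any language without a catalogue entry.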

On Thu, Sep 29, 2011 at 7:54 PM, Paulo Pocinho wrote:
> Still uncomfortable with i18n, I learned about the article "I18N in Haskell" in yesod blog [4]. I'd like to hear more about it.
Yesod's approach is pretty nice [1]. The idea is to have a data type with all your messages, like

    data Message = Hello
                 | WhatsYourName
                 | MyNameIs String
                 | Ihave_apples Int
                 | GoodBye

For each of your supported languages, you provide a rendering function (they may be in separate source files)

    render_en_US :: Message -> String
    render_en_US Hello = "Hello!"
    render_en_US WhatsYourName = "What's your name?"
    render_en_US (MyNameIs name) = "My name is " ++ name ++ "."
    render_en_US (Ihave_apples 0) = "I don't have any apples."
    render_en_US (Ihave_apples 1) = "I have one apple."
    render_en_US (Ihave_apples n) = "I have " ++ show n ++ " apples."
    render_en_US GoodBye = "Good bye!"

    render_pt_BR :: Message -> String
    render_pt_BR Hello = "Olá!"
    render_pt_BR WhatsYourName = "Como você se chama?"
    render_pt_BR (MyNameIs name) = "Eu me chamo " ++ name ++ "."
    render_pt_BR (Ihave_apples 0) = "Não tenho nenhuma maçã."
    render_pt_BR (Ihave_apples 1) = "Tenho uma maçã."
    render_pt_BR (Ihave_apples 2) = "Tenho um par de maçãs."
    render_pt_BR (Ihave_apples n) = "Tenho " ++ show n ++ " maçãs."
    render_pt_BR GoodBye = "Tchau!"

Given those functions, you can construct something like

    type Lang = String

    render :: [Lang] -> Message -> String
    render ("pt"   :_) = render_pt_BR
    render ("pt_BR":_) = render_pt_BR
    render ("en"   :_) = render_en_US
    render ("en_US":_) = render_en_US
    render (_:xs)      = render xs
    render _           = render_en_US

So 'r = render ["fr", "pt"]' will do the right thing. You just need to pass this 'r' around in your code. Using it is easy and clear:

    putStrLn $ r Hello
    putStrLn $ r WhatsYourName
    name <- getLine
    putStrLn $ r (MyNameIs "Alice")
    putStrLn $ r (Ihave_apples $ length name `mod` 4)
    putStrLn $ r GoodBye

This approach is nice for several reasons:

- Builtin support for complicated messages. Making something like Ihave_apples in gettext would be hard. Each language has its own rules, and you need to encode all of them in your code. In this example, my render_pt_BR recognizes the 2 apples case and treats it differently. If you didn't think about it when you wrote your code (using gettext), you'd need to change your code for pt_BR.

- Fast processing. "render" as I've coded above looks at the language list just once. After that, it's just GHC's pattern matching.

- Fast startup. No need to look for strings on the hard drive.

- Flexible. You may try several extensions, depending on your needs:
  (a) Using a type class (like Yesod) if you don't want one big data type.
  (b) Using Text instead of String. Or even Builder.

The biggest drawback is lack of tool support and lack of "translators' expertise". gettext has a lot of inertia and is used everywhere on a FLOSS system. But as Ertugrul Soeylemez said, if you're targeting Windows, _not_ using gettext should be an advantage (less pain while creating installers).

HTH,

[1] http://hackage.haskell.org/packages/archive/yesod-core/0.9.2/doc/html/Yesod-...

--
Felipe.
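Extension (a), using a type class instead of one big data type, could look roughly like the sketch below. This is not Yesod's actual API: the class `Renderable`, the helper `pickLang`, and the two small message types are hypothetical names chosen for illustration, and only "pt" and "en" are assumed to be supported.

```haskell
type Lang = String

-- Hypothetical class: each message type knows how to render itself
-- for a list of preferred languages.
class Renderable msg where
  renderMsg :: [Lang] -> msg -> String

-- Pick the first supported language from the preference list,
-- comparing only the two-letter prefix ("pt_BR" counts as "pt").
-- Defaults to English when nothing matches.
pickLang :: [Lang] -> Lang
pickLang langs =
  case filter (`elem` ["pt", "en"]) (map (take 2) langs) of
    (l:_) -> l
    []    -> "en"

-- Two independent message types, e.g. owned by different modules.
data GreetingMsg = Hello | GoodBye
newtype AppleMsg = IHaveApples Int

instance Renderable GreetingMsg where
  renderMsg langs msg = case (pickLang langs, msg) of
    ("pt", Hello)   -> "Olá!"
    ("pt", GoodBye) -> "Tchau!"
    (_,    Hello)   -> "Hello!"
    (_,    GoodBye) -> "Good bye!"

instance Renderable AppleMsg where
  renderMsg langs (IHaveApples n) = case (pickLang langs, n) of
    ("pt", 0) -> "Não tenho nenhuma maçã."
    ("pt", 1) -> "Tenho uma maçã."
    ("pt", _) -> "Tenho " ++ show n ++ " maçãs."
    (_,    0) -> "I don't have any apples."
    (_,    1) -> "I have one apple."
    (_,    _) -> "I have " ++ show n ++ " apples."
```

The trade-off in this sketch is that the language is resolved on every call rather than once, in exchange for letting each module define its own message type without touching a central `Message` declaration.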

On Thu, Sep 29, 2011 at 3:54 PM, Paulo Pocinho wrote:
> Hello list.
> I've been trying to figure a nice method to provide localisation.

The Grammatical Framework (GF) excels at translation and localization -- it probably has the steepest learning curve of the options, but it will generate the best / most accurate text depending on the target language:

* http://www.grammaticalframework.org

At first brush, it may seem like extreme overkill, but it is able to handle many, many infuriating corner cases (e.g. properly forming discontinuous constituents, updating case / tense and number to agree with potentially variable quantities and genders, addressing the absence of "yes" and "no" in some languages, etc.)

The language processing bits are expressed in a PMCFG grammar, which uses a syntax similar to Haskell. The PMCFG compiles to a PGF file that can be loaded and used by a Haskell module that implements the runtime, so it doesn't change your run-time requirements if you already rely on Haskell (there are also runtime implementations in JavaScript, Java, C and Python).

--Rogan

On Fri, Sep 30, 2011 at 5:44 PM, Rogan Creswick wrote:
> The grammatical framework excels at translation and localization -- it probably has the highest learning curve of the options; but it will generate the best / most accurate text depending on the target language:
> * http://www.grammaticalframework.org
> At first brush, it may seem like extreme overkill; but it is able to handle many, many infuriating corner cases (eg: properly forming discontinuous constituents, updating case / tense and number to agree with potentially variable quantities and genders, addressing the absence of "yes" and "no" in some languages, etc...)
> The language processing bits are expressed in a PMCFG grammar, which uses a syntax similar to haskell. The PMCFG compiles to a PGF file that can be loaded and used by a haskell module that implements the runtime, so it doesn't change your run-time requirements (if you already rely on haskell, there are also runtime implementations in javascript, java, c and python).
I've seen GF before, but I can't actually see how one would use it for localization. Are there any simple examples?

Cheers, =)

--
Felipe.

On Fri, Sep 30, 2011 at 2:09 PM, Felipe Almeida Lessa wrote:
> I've seen GF before, but I can't actually see how one would use it for localization. Are there any simple examples?
Here's a *very* simple example I just threw together, based on the Foods grammar (so it's quite contrived), but hopefully it's sufficient for the moment:

https://github.com/creswick/gfI8N

Updating it to use the Phrasebook example would make it much more interesting... I think there are numbers in there, and iirc, it uses the actual resource grammars, which is what you really want for a real system.

Usage details are in the README.md, and I've commented the important function in the Haskell source. The rest of the magic is in the (also ugly) Setup.hs. You will also need to manually install gf, I believe, even if you use cabal-dev, due to some annoyingly complex (but solvable) build-order and PATH complications.

--Rogan

Thanks for all the great information provided in this thread.

The wiki page that Paulo originally linked had Vasyl's fantastic documentation for using his hgettext package, but it did not mention any of the other methods we discussed. I moved the gettext documentation to its own linked page and tried to collect together the general information from this thread.

Please take a moment and look it over. Correct any mistakes I made.

http://haskell.org/haskellwiki/Internationalization_of_Haskell_programs

Rogan, especially, please look it over. I really had to read between the lines to come up with a clear and concise description of GF and what it does, so I may have gotten it wrong.

Felipe, I put your wonderful example on its own linked page.

Thanks,
Yitz

On Thu, Sep 29, 2011 at 6:54 PM, Paulo Pocinho wrote:
> Hello list.
> I've been trying to figure a nice method to provide localisation. An application is deployed using a conventional installer. The end-user is not required to have the Haskell runtimes, compiler or platform. The application should bundle ready to use translation data. What I am after is simple; an intuitive way that an interested translator, with little knowledge of Haskell, can look at and create valid translation data.
I've been meaning to bundle up some i18n/l10n code that I have lying around from previous compiler projects.

What I was using was a gettext/printf Template Haskell function that can be hunted for with xgettext. It expands to code that reads translated .po files for the current module at two different times: once at compile time, to check that any printf-style format strings are compatible across each translation, and again later at runtime, to allow for additional translations to be added.

The biggest headache I have is that doing all this requires a pretty hairy .cabal file, and I haven't yet figured out how to package that up nicely for use in libraries.

I'll admit I have only ever really tested this with a joke en@lolcat translation, which I auto-translate with perl, though I admit if I could find a nice perl module for generating zalgo-style text, en@zalgo would be pretty neat to auto-generate as well.

I'm not sure it's considered "best practice", since I haven't bundled it up for third party use yet, but it's *my* practice. ;)

-Edward Kmett
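To illustrate what "compatible printf-style format strings" might mean, here is a rough sketch of such a check as a plain function. It is not Edward's code and does not use Template Haskell or read .po files; the names `formatSpec` and `compatible` are hypothetical, and the rule applied (same conversion characters in the same order, ignoring flags, width and precision) is a simplifying assumption that does not handle positional arguments such as `%1$s`.

```haskell
-- Extract the conversion characters of a printf-style format string,
-- in order. For example, "%d apples for %s" yields "ds".
-- A literal "%%" is skipped; flags, width and precision are ignored.
formatSpec :: String -> String
formatSpec ('%':'%':rest) = formatSpec rest
formatSpec ('%':rest) =
  case dropWhile (`elem` "-+ #0123456789.") rest of
    (c:rest') -> c : formatSpec rest'
    []        -> []
formatSpec (_:rest) = formatSpec rest
formatSpec [] = []

-- A translation is considered compatible with the original when both
-- expect the same argument types in the same order.
compatible :: String -> String -> Bool
compatible original translated =
  formatSpec original == formatSpec translated
```

A build step could run `compatible` over every (msgid, msgstr) pair and fail when a translation swaps or drops an argument, which is the kind of mistake that otherwise only shows up as a runtime printf error.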

On Tue, Oct 11, 2011 at 5:03 PM, Edward Kmett wrote:
> I'll admit I have only ever really tested this with a joke en@lolcat translation, which I auto-translate with perl, though I admit if I could find a nice perl module for generating zalgo-style text, en@zalgo would be pretty neat to auto-generate as well.
Using Yesod's approach and assuming

    lolspeak :: String -> String

you could have

    render_en_lolcat = lolspeak . render_en_US

Pretty neat! ;-D

Cheers,

--
Felipe.
participants (6)
- Edward Kmett
- Ertugrul Soeylemez
- Felipe Almeida Lessa
- Paulo Pocinho
- Rogan Creswick
- Yitzchak Gale