Why do spawn (and safeSpawn, etc.) use encodeString?

Hi!

I noticed that spawn mangles my Unicode characters: instead of the character I passed, the called program received garbage. Looking deeper, I found that spawn pre-processes the string with the `encodeString` function. `encodeString` first converts [Char] into [Word8] (the UTF-8 bytes), and then converts each individual Word8 back into a Char. Since a non-ASCII Char is converted into multiple Word8s, the resulting string is quite different from the original.

Example:

    Prelude> import Codec.Binary.UTF8.String
    Prelude Codec.Binary.UTF8.String> encodeString "Ø"
    "\195\152"
    Prelude Codec.Binary.UTF8.String> putStrLn "Ø"
    Ø
    Prelude Codec.Binary.UTF8.String> putStrLn $ encodeString "Ø"
    Ã

Is there a reason why xmonad uses `encodeString` here? I implemented a copy of the function that doesn't use `encodeString`, and it seems to work okay:

    safeSpawnUnicode :: MonadIO m => FilePath -> [String] -> m ()
    safeSpawnUnicode prog args = io $ void $ forkProcess $ do
        uninstallSignalHandlers
        _ <- createSession
        executeFile prog True args Nothing

Best regards,
Platon Pronko
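The round trip described above can be reproduced with nothing but `base`. The following is a minimal sketch, not the utf8-string implementation: `utf8Bytes`, `encodeStringSketch` and `argvBytesSketch` are hypothetical names introduced here for illustration. The second encoding step rests on an assumption, namely that the process-spawning call itself encodes its String arguments as UTF-8 (as it would under a UTF-8 locale); if that holds, a string that was already passed through `encodeString` ends up encoded twice.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- | Hand-rolled UTF-8 encoding of a single code point (illustration only;
-- surrogate code points are not treated specially).
utf8Bytes :: Char -> [Word8]
utf8Bytes c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    -- Leading-byte payload: the top bits of the code point.
    hi s   = fromIntegral (n `shiftR` s)
    -- Continuation byte: 10xxxxxx carrying six bits of the code point.
    cont s = 0x80 .|. (fromIntegral (n `shiftR` s) .&. 0x3F)

-- | Sketch of the behaviour described in the post: every UTF-8 byte
-- becomes a Char of its own.
encodeStringSketch :: String -> String
encodeStringSketch = map (chr . fromIntegral) . concatMap utf8Bytes

-- | ASSUMPTION: the spawning call encodes its arguments as UTF-8 again.
-- Under that assumption these are the bytes the child process would see.
argvBytesSketch :: String -> [Word8]
argvBytesSketch = concatMap utf8Bytes . encodeStringSketch
```

With this sketch, `encodeStringSketch "Ø"` evaluates to `"\195\152"`, matching the GHCi session above, while `argvBytesSketch "Ø"` yields four bytes rather than the two that make up a UTF-8 "Ø".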

As one of the people responsible for that, I can give the backstory: at the time, very long ago (2008?), it wasn't clear how to handle Unicode text in a cross-distro, bug-free way while passing it through a Haskell library like XMonad from a Prompt into X11, the shell, or applications. No one had the appetite to make an in-depth study of the various systems to figure out exactly what had to be done to handle ASCII and Unicode in a way that would be safe everywhere, and `encodeString` seemed to sort of work in most cases and be better than what came before.

Since the Haskell and other ecosystems have gradually continued evolving (one hopes), it's possible that many Unicode-related issues have since quietly vanished, and XMonad could do something simpler and more correct than it does now. But one would need to investigate thoroughly on a couple of systems before one could be sure it was safe to update `spawn` and all downstream users of `encodeString` etc., and no one has been willing to do so to the extent needed to make a change in the (generally very stable) HEAD.

--
gwern
https://www.gwern.net

Unfortunately I do not have access to many different systems, only the one I use now, Arch Linux, so I won't be able to test it thoroughly. But as a data point: on my machine Prompt returns UTF-8, and that UTF-8 can be safely passed into the executeFile call without encodeString.

Best regards,
Platon Pronko

On 2020-04-17 22:37, Gwern Branwen wrote:
> As one of the people responsible for that, the backstory is that at the time very long ago (2008?), it wasn't clear how to handle Unicode text in a cross-distro bugfree way while passing through a Haskell library like XMonad from a Prompt into X11 or the shell or applications, no one had the appetite to make an in-depth study of the various systems to figure out what exactly had to be done to handle ASCII & Unicode in a way that would be safe everywhere, and `encodeString` seemed to sorta work in most cases and be better than what came before.
>
> Since the Haskell and other ecosystems have gradually continued evolving (one hopes), it's possible that many Unicode-related issues have since quietly vanished, and XMonad could do something simpler and more correct than it does now; but one would need to investigate thoroughly on a couple systems before one could be sure it was safe to update `spawn` and all downstream users of `encodeString` etc, and no one has been willing to do so to the extent to make a change in the (generally very stable) HEAD.

In 2020 my inclination is to encode the string if it contains code points > 255 and
leave it to the user otherwise; it's impossible to guess the right action.
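The heuristic suggested in this reply can be sketched as follows. This is only an illustration of the idea, not code from xmonad: `encodeIfWide` is a hypothetical name, and the encoder is taken as a parameter (in practice it might be `encodeString` from utf8-string) so that the sketch depends on nothing outside `base`.

```haskell
-- | Apply the given encoder only when the string contains at least one
-- character above code point 255; otherwise pass it through untouched
-- and leave any encoding decision to the caller.
encodeIfWide :: (String -> String) -> String -> String
encodeIfWide encode s
  | any (> '\255') s = encode s
  | otherwise        = s
```

Note that "Ø" from the original report is U+00D8, which is below 256, so under this rule it would be passed through unencoded.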
On Fri, Apr 17, 2020, 16:05 Platon Pronko wrote:
> Unfortunately I do not have access to many different systems - only the one I use now, Arch Linux. So I won't be able to test it thoroughly. But as a data point - on my machine Prompt returns UTF8 and that UTF8 can be safely passed into the executeFile call, without encodeString.
>
> Best regards, Platon Pronko
Participants (3):
- Brandon Allbery
- Gwern Branwen
- Platon Pronko