Folded long string literals and CPP?

It looks like there's some sort of syntax conflict between CPP and long string literals folded across multiple lines. Is there a way to have both CPP and folded long string literals? $ cat /tmp/foo.hs {-# LANGUAGE CPP #-} module Main(main) where hello :: String hello = "Hello\ \ World!" main :: IO () main = print hello ---- $ ghci /tmp/foo.hs GHCi, version 9.8.1: https://www.haskell.org/ghc/ :? for help [1 of 2] Compiling Main ( /tmp/foo.hs, interpreted ) /tmp/foo.hs:5:25: error: [GHC-21231] lexical error in string/character literal at character 'W' | 5 | hello = "Hello\ | ^ Failed, no modules loaded. λ> Leaving GHCi. When I run the input through "cpp -E" I get: {-# LANGUAGE CPP #-} module Main(main) where hello :: String hello = "Hello \ World!" main :: IO () main = print hello Clearly not what I want, so the subsequent lexical error from GHC is not surprising, but is there a workaround that allows folding long strings across lines and retaining the layout. Given that the CPP lexer also recognises quoted strings, it looks like a difficult to reconcile mismatch. There would need be some other sort of joiner understood by GHC, where each string fragment is fully enclosed in quotes. Would the below be acceptable? {-# LANGUAGE CPP #-} module Main(main) where hello :: String hello = ##"Hello"\ ##"World!" main :: IO () main = print hello This is turned by CPP into either (version-dependent): {-# LANGUAGE CPP #-} module Main(main) where hello :: String hello = ##"Hello" ##"World!" main :: IO () main = print hello or: {-# LANGUAGE CPP #-} module Main(main) where hello :: String hello = ##"Hello" ##"World!" main :: IO () main = print hello which (or some suitable variant?) GHC could then also recognise as multiple fragments of the same single string literal? -- Viktor.

On Mon, Nov 20, 2023 at 02:47:39PM -0500, Viktor Dukhovni wrote:
It looks like there's some sort of syntax conflict between CPP and long string literals folded across multiple lines. Is there a way to have both CPP and folded long string literals?
I may have found an acceptable work-around: {-# LANGUAGE CPP #-} module Main(main) where hello :: String hello = "Hello\ \ \ World!" main :: IO () main = print hello Is this trick "well known"? CPP turns the above into: {-# LANGUAGE CPP #-} module Main(main) where hello :: String hello = "Hello\ \ World!" main :: IO () main = print hello which is then valid Haskell syntax and yields the right value. -- Viktor.

On Mon, Nov 20, 2023 at 03:07:42PM -0500, Viktor Dukhovni wrote:
I may have found an acceptable work-around:
{-# LANGUAGE CPP #-} module Main(main) where
hello :: String hello = "Hello\ \ \ World!"
main :: IO () main = print hello
Is this trick "well known"?
I have now (should have earlier) checked the user guide and found: https://downloads.haskell.org/ghc/latest/docs/users_guide/phases.html?highli... and see that the it suggests adding trailing spaces. If that once worked, it no longer seems to. However adding a space and a trailing "\" does work. Perhaps the documentation can be updated? I do not know how portable my "\ \" sufix is across various systems. It works on a Fedora 36 system (Gnu toolchain), and with FreeBSD with both Clang and Gnu CPP. -- Viktor.

On Mon, Nov 20, 2023 at 03:57:40PM -0500, Viktor Dukhovni wrote:
On Mon, Nov 20, 2023 at 03:07:42PM -0500, Viktor Dukhovni wrote:
I may have found an acceptable work-around:
{-# LANGUAGE CPP #-} module Main(main) where
hello :: String hello = "Hello\ \ \ World!"
main :: IO () main = print hello
Is this trick "well known"?
I have now (should have earlier) checked the user guide and found:
https://downloads.haskell.org/ghc/latest/docs/users_guide/phases.html?highli...
and see that the it suggests adding trailing spaces. If that once worked, it no longer seems to. However adding a space and a trailing "\" does work. Perhaps the documentation can be updated?
I do not know how portable my "\ \" sufix is across various systems. It works on a Fedora 36 system (Gnu toolchain), and with FreeBSD with both Clang and Gnu CPP.
FWIW, GitHub CI shows that ending lines with "\ \" works on Ubuntu, MacOS and Windows. So it looks reasonably portable. The main downside is that the source code has to be changed when adding or removing "CPP", that would not have been the case with just "\ ", but that sadly does not work in practice. :-( -- Viktor.

That does sound like a bug. but it doesn't seem patchable using {-# LANGUAGE CPP #-} because it *is* a bug in -XCPP... :p Have you reported it anywhere? Cheers

On Tue, Nov 21, 2023 at 09:27:12AM +0000, Dan Dart wrote:
That does sound like a bug. but it doesn't seem patchable using {-# LANGUAGE CPP #-} because it *is* a bug in -XCPP... :p
Have you reported it anywhere?
Not yet reported outside this thread. Sure, morally speaking, this is a bug in "-XCPP", but in practice "-XCPP" unavoidably uses the actual C preprocessor, and is subject to its syntax idiosyncrasies. There's not a lot GHC can do once CPP joins the split lines and drops the trailing "\", and the documented trailing whitespace work-around no longer (if it once did) helps. So the only options appear to be to live with editing the code to to use a trailing "\ \" instead of just "\" when enablng -XCPP (and reverting when dropping "-XCPP", or else introduce a new syntax for multi-part string literals. Unless someone can suggest a better idea... -- Viktor.

A tiny change to Haskell, borrowed from Erlang, would solve the problem,
I think, with an increase in readability.
In Erlang, "c1...cn" ++ Pattern is a pattern, equivalent (in Haskell terms) to
('c1' :: 'c2' :: ... 'cn' :: Pattern). This means that a long string
can be written
as ("..." ++ "..." ++ ... ++ "..."), spread across as many lines as you need,
with each substring being complete on a line, in any context you want.
In an expression context, you can already do this, and that is easier to
read than strings broken across multiple lines. It makes the treatment of
leading white space completely obvious (if it's inside string quotes, it's
part of the text, if not, it's not). The only problem is that you cannot use
that construction where a pattern is needed (although if you are trying to
use a long string as a pattern I have to wonder why). So the tiny
extension is to allow an explicit concatenation of string literals as a string
literal in a pattern.
Right now,
hello :: String
hello = "Hello\ \
\ World!"
can and should be written
hello :: String
hello = "Hello " ++
"World!"
On Tue, 21 Nov 2023 at 23:53, Viktor Dukhovni
On Tue, Nov 21, 2023 at 09:27:12AM +0000, Dan Dart wrote:
That does sound like a bug. but it doesn't seem patchable using {-# LANGUAGE CPP #-} because it *is* a bug in -XCPP... :p
Have you reported it anywhere?
Not yet reported outside this thread. Sure, morally speaking, this is a bug in "-XCPP", but in practice "-XCPP" unavoidably uses the actual C preprocessor, and is subject to its syntax idiosyncrasies.
There's not a lot GHC can do once CPP joins the split lines and drops the trailing "\", and the documented trailing whitespace work-around no longer (if it once did) helps.
So the only options appear to be to live with editing the code to to use a trailing "\ \" instead of just "\" when enablng -XCPP (and reverting when dropping "-XCPP", or else introduce a new syntax for multi-part string literals.
Unless someone can suggest a better idea...
-- Viktor. _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

I like it. I would be interesting (though perhaps overkill) to have a special operator which is like ++ but which can only be applied to list literals (not only strings), and which is guaranteed to be syntactic sugar for the concatenated list. One step fancier would be to allow it not just for literals but for any known-at-compile-time values. (So it’s syntactically an operator but in actuality more of a compiler directive, to evaluate the concatenation at compile time, or fail if that’s not possible.) Just thinking out loud. Jeff
On Nov 21, 2023, at 5:14 PM, Richard O'Keefe
wrote: A tiny change to Haskell, borrowed from Erlang, would solve the problem, I think, with an increase in readability.
In Erlang, "c1...cn" ++ Pattern is a pattern, equivalent (in Haskell terms) to ('c1' :: 'c2' :: ... 'cn' :: Pattern). This means that a long string can be written as ("..." ++ "..." ++ ... ++ "..."), spread across as many lines as you need, with each substring being complete on a line, in any context you want. In an expression context, you can already do this, and that is easier to read than strings broken across multiple lines. It makes the treatment of leading white space completely obvious (if it's inside string quotes, it's part of the text, if not, it's not). The only problem is that you cannot use that construction where a pattern is needed (although if you are trying to use a long string as a pattern I have to wonder why). So the tiny extension is to allow an explicit concatenation of string literals as a string literal in a pattern.
Right now,
hello :: String hello = "Hello\ \ \ World!"
can and should be written
hello :: String hello = "Hello " ++ "World!"
On Tue, 21 Nov 2023 at 23:53, Viktor Dukhovni
wrote: On Tue, Nov 21, 2023 at 09:27:12AM +0000, Dan Dart wrote:
That does sound like a bug. but it doesn't seem patchable using {-# LANGUAGE CPP #-} because it *is* a bug in -XCPP... :p
Have you reported it anywhere?
Not yet reported outside this thread. Sure, morally speaking, this is a bug in "-XCPP", but in practice "-XCPP" unavoidably uses the actual C preprocessor, and is subject to its syntax idiosyncrasies.
There's not a lot GHC can do once CPP joins the split lines and drops the trailing "\", and the documented trailing whitespace work-around no longer (if it once did) helps.
So the only options appear to be to live with editing the code to to use a trailing "\ \" instead of just "\" when enablng -XCPP (and reverting when dropping "-XCPP", or else introduce a new syntax for multi-part string literals.
Unless someone can suggest a better idea...
-- Viktor. _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.
Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

On Tue, Nov 21, 2023 at 05:36:13PM -0800, Jeff Clites via Haskell-Cafe wrote:
I like it. I would be interesting (though perhaps overkill) to have a special operator which is like ++ but which can only be applied to list literals (not only strings), and which is guaranteed to be syntactic sugar for the concatenated list. One step fancier would be to allow it not just for literals but for any known-at-compile-time values. (So it’s syntactically an operator but in actuality more of a compiler directive, to evaluate the concatenation at compile time, or fail if that’s not possible.)
Well, I guess we already have that in the form of Template Haskell splices, but that's rather a heavy hammer to swat this particular fly... {-# LANGUAGE TemplateHaskell #-} module Data.CompileTime(compileTimeString) where import Language.Haskell.TH.Syntax as TH compileTimeString :: TH.Quote m => String -> TH.Code m String compileTimeString str = let !lit = str in [|| lit ||] Which when imported into: {-# LANGUAGE CPP, TemplateHaskell #-} module Main(main) where import Data.CompileTime hello :: String hello = $$( compileTimeString $ "Hello" ++ " World!" ) main :: IO () main = print hello Produces the "Core" below: ... main :: IO () main = print ($fShowList $fShowChar) (unpackCString# "Hello World!"#) -- Viktor.

I switched to cpphs which doesn't have this problem.
On Wed, Nov 22, 2023, 10:42 PM Viktor Dukhovni
On Tue, Nov 21, 2023 at 05:36:13PM -0800, Jeff Clites via Haskell-Cafe wrote:
I like it. I would be interesting (though perhaps overkill) to have a special operator which is like ++ but which can only be applied to list literals (not only strings), and which is guaranteed to be syntactic sugar for the concatenated list. One step fancier would be to allow it not just for literals but for any known-at-compile-time values. (So it’s syntactically an operator but in actuality more of a compiler directive, to evaluate the concatenation at compile time, or fail if that’s not possible.)
Well, I guess we already have that in the form of Template Haskell splices, but that's rather a heavy hammer to swat this particular fly...
{-# LANGUAGE TemplateHaskell #-} module Data.CompileTime(compileTimeString) where import Language.Haskell.TH.Syntax as TH
compileTimeString :: TH.Quote m => String -> TH.Code m String compileTimeString str = let !lit = str in [|| lit ||]
Which when imported into:
{-# LANGUAGE CPP, TemplateHaskell #-} module Main(main) where import Data.CompileTime
hello :: String hello = $$( compileTimeString $ "Hello" ++ " World!" )
main :: IO () main = print hello
Produces the "Core" below:
... main :: IO () main = print ($fShowList $fShowChar) (unpackCString# "Hello World!"#)
-- Viktor. _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.
participants (5)
-
Dan Dart
-
Evan Laforge
-
Jeff Clites
-
Richard O'Keefe
-
Viktor Dukhovni