A 3 line program --> Reid, Don, Daniel

Thank you for your help. I am sure that you are right as to the cause of the problem. However, I do not know what I should do to solve it. The Haskell program is generated by LaTeX macros. It is useless to import System.IO.UTF8. I did it. The system keeps producing the same error. What I need is to configure GHC to accept input from any editor or text-generating tool (like TeX). I suppose it is not very difficult to do, since all other languages have it. I mean, Haskell is the only language where I cannot type a common word like "façade" without running into trouble. I will send you a slightly more complete program, so you can analyse the problem. There are two TeX macro files and an example.
The first file is haskell.gen, and it defines the macro \haskell{}:
% Store in file haskell.gen
% Last change 2007-05-23
\newcount\evalFilecount
\evalFilecount=0
{\gdef\haskellSetup{%
\csname newwrite\endcsname\progPort
\immediate\openout\progPort H\jobname.hs %
\immediate\write\progPort{module H\jobname}%
\immediate\write\progPort{where}%
\newlinechar=`\^^M%
\newlinechar=`\^^M%
}%
}
\newcommand{\haskell}{\begingroup
\ifx\progPort\UNDEFINED \haskellSetup\fi%
\catcode`\^^M=12
\haskellGen
}
\newcommand{\haskellGen}[1]{%
\immediate\write\progPort{#1}%
\endgroup
}
%end of file haskell.gen
The second file is hask.ell, and it defines \eval{}. It is slightly more complex than the previous one. Still, it is very simple.
% Store in file hask.ell
% Last change 2007-05-23
\newcount\evalFilecount
\evalFilecount=0
{\gdef\evalSetup{%
\csname newwrite\endcsname\outPort % Port name
\immediate\openout\outPort \jobname.hs %
\newlinechar=`\^^M%
\immediate\write\outPort{import H\jobname}%
\newlinechar=`\^^M%
\immediate\write\outPort{import IO}%
\newlinechar=`\^^M%
\immediate\write\outPort{main= do }%
\newlinechar=`\^^M%
}%
}
\newcommand{\eval}{\begingroup
\ifx\outPort\UNDEFINED \evalSetup\fi
\global\advance\evalFilecount by 1
\edef\evalAuxFile{\jobname-G-H-C\the\evalFilecount}%
\immediate\write\outPort
{ outh <- openFile "\evalAuxFile.ghc" { WriteMode;} }%
{\immediate\openin0=\evalAuxFile.ghc
\ifeof0 \immediate\closein0
\else \input \evalAuxFile.ghc \fi}%
\catcode`\^^M=12
\evalBody
}
\newcommand{\evalBody}[1]{%
\immediate\write\outPort{#1}%
\immediate\write\outPort{ hClose outh;}%
\endgroup
}
%end of file hask.ell
Below is an example of how to use the above macros. Store it in file tudin.tex.
\documentclass[12pt]{article}
\usepackage[latin1]{inputenc}
\usepackage[brazil]{babel}
\usepackage{makeidx}
\usepackage{wrapfig}
\usepackage{pictexwd}
\title{Calling Haskell from \LaTeX}
\author{Philippos Apolinarius}
\date{}
\begin{document}
\maketitle
\input haskell.gen
\input hask.ell % load \eval
\eval{ hPutStrLn outh "Hello, facade!"
}
\eval{ hPutStrLn outh (show (fat 6))
}
\haskell{
fat n | n<1= 1
fat n= n*fat(n-1)
}
\end{document}
In order to use these examples, do the following:
1 --- Apply latex to tudin.tex:
C:\hastex> latex tudin.tex
2 --- You have generated the files tudin.hs and Htudin.hs. Execute tudin.hs:
C:\hastex> runghc tudin.hs
This action will generate two files: tudin-G-H-C1.ghc and tudin-G-H-C2.ghc. There is one file for each instance of \eval{}.
3 --- If you run latex tudin.tex again, it will load these two files and insert the results of the Haskell programs into the resulting pdf or dvi files:
C:\hastex> latex tudin.tex
My friend has a lot of complex Clean libraries that she uses to generate postscript images of her designs. The trouble is that she needs to compile Clean source code generated by the two LaTeX macros. Since Clean needs two files per module (implementation module and definition module), the two LaTeX macros are quite complex. Clean looks a lot like Haskell. Therefore, I believe that she will not have any difficulty in translating the code to Haskell. It is true that input/output is different in Clean and Haskell. However, input/output can be centered in the main function. Unhappily she needs to write all those French, Swiss, Italian and Spanish names, like façade, Antoní Gaudi, Xenákes, etc.
BTW, if you discover how to use strings in Haskell, one could write Haskell-café, instead of Haskell-cafe.

Philippos,
Doesn't the line below mean that everything to be interpreted will be considered as latin1?
\usepackage[latin1]{inputenc}
Unicode as UTF-8 won't fit here. I would suggest transferring the problem to LaTeX. Why not print 'fa\c cade' instead of 'façade'? It's 7 bits and will never fail.
Best,
Maurício
Thank you for your help. I am sure that you are right as to the cause of the problem. However, I do not know what I should do to solve it. The Haskell program is generated by LaTeX macros. (...)

Hi, Mauricio.
Since LaTeX reserves the backslash for internal use, I cannot tamper with it. The idea is to transfer the problem to Haskell (or Clean, or Scheme), not to LaTeX. After all, it is easier to program in Haskell, Clean or Scheme than in LaTeX. Anyway, here is what happens if I follow your suggestion:
-- File tudin.hs
import Htudin
import IO
import That
main= do
outh <- openFile "tudin-G-H-C1.ghc" WriteMode;
hPutStrLn outh (acc "Hello, fa\unhbox \voidb@x \setbox \z@ \hbox {c}{\lineskiplimit -\maxdimen \unhbox \voidb@x \vtop {\baselineskip \z@skip \lineskip .25ex\everycr {}\tabskip \z@skip \halign {##\crcr \unhbox \z@ \crcr \hskip \hideskip \char 24\hskip \hideskip \crcr }}}ade!")
hClose outh;
outh <- openFile "tudin-G-H-C2.ghc" WriteMode;
hPutStrLn outh (show (fat 6))
hClose outh;
As you can see, LaTeX writes its macro expansion of \c into the Haskell file. Another possibility is to change the category code of \ (the backslash). Here is how to do it:
\catcode`\\=11
Now the backslash is a normal character. Of course, I need to change the classification of every accent, tilde, cedilla, umlaut, etc. I cannot change the grave, because I need it to make the change itself. Users don't like this kind of solution. In any case, here are the programs with the characters modified (I will post a more complete library on my site; then I will add a few graphical functions so people can understand the problem). Compilation is as before:
1 --- C:\hastex> latex tudin.tex
2 --- C:\hastex> runghc tudin.hs
3 --- C:\hastex> latex tudin.tex
It is possible to use pdflatex instead of latex. By the way, Haskell cannot print your name, since it has an acute i :-) It would be great if people could solve this problem.
% Store in file hask.ell
% Last change 2007-05-23
\newcount\evalFilecount
\evalFilecount=0
{\gdef\evalSetup{%
\csname newwrite\endcsname\outPort % Port name
\immediate\openout\outPort \jobname.hs %
\newlinechar=`\^^M%
\immediate\write\outPort{import H\jobname}%
\newlinechar=`\^^M%
\immediate\write\outPort{import IO}%
\newlinechar=`\^^M%
\immediate\write\outPort{main= do }%
\newlinechar=`\^^M%
}%
}
\newcommand{\eval}{\begingroup
\ifx\outPort\UNDEFINED \evalSetup\fi
\global\advance\evalFilecount by 1
\edef\evalAuxFile{\jobname-G-H-C\the\evalFilecount}%
\immediate\write\outPort
{ outh <- openFile "\evalAuxFile.ghc" { WriteMode;} }%
{\immediate\openin0=\evalAuxFile.ghc
\ifeof0 \immediate\closein0
\else \input \evalAuxFile.ghc \fi}%
\catcode`\^^M=12
\catcode`\\=11
\catcode`\~=11
\catcode`\'=11
\catcode`\`=11
\catcode`\^=11
\evalBody
}
\newcommand{\evalBody}[1]{%
\immediate\write\outPort{#1}%
\immediate\write\outPort{ hClose outh;}%
\endgroup
}
%end of file hask.ell
% Store in file haskell.gen
% Last change 2007-05-23
\newcount\evalFilecount
\evalFilecount=0
{\gdef\haskellSetup{%
\csname newwrite\endcsname\progPort
\immediate\openout\progPort H\jobname.hs %
\immediate\write\progPort{module H\jobname}%
\immediate\write\progPort{where}%
\newlinechar=`\^^M%
\newlinechar=`\^^M%
}%
}
\newcommand{\haskell}{\begingroup
\ifx\progPort\UNDEFINED \haskellSetup\fi%
\catcode`\^^M=12
\haskellGen
}
\newcommand{\haskellGen}[1]{%
\immediate\write\progPort{#1}%
\endgroup
}
%end of file haskell.gen
% File: tudin.tex
\documentclass[12pt]{article}
\usepackage[latin1]{inputenc}
\usepackage[brazil]{babel}
\usepackage{makeidx}
\usepackage{wrapfig}
\usepackage{pictexwd}
\newcommand{\cc}{\c{c}}
\title{Calling Haskell from \LaTeX}
\author{Philippos Apolinarius}
\date{}
\begin{document}
\maketitle
\input haskell.gen
\input hask.ell % load \eval
\eval{ hPutStrLn outh "Hello, fa\\c cade!"
}
\eval{ hPutStrLn outh "Il est l\\`a bas!"
}
\eval{ hPutStrLn outh "Raison d' \\^etre!"
}
\eval{ hPutStrLn outh "Votre toast, je peux vous le rendre, Se\\~nor!"
}
\eval{ hPutStrLn outh (show (fat 6))
}
\haskell{
fat n | n<1= 1
fat n= n*fat(n-1)
}
\end{document}
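With the catcode changes above, the strings reach Haskell with the TeX accent escapes still in them (for example fa\c cade), so something on the Haskell side has to decode them. What follows is only a minimal sketch of such a decoder. The name acc is borrowed from the generated tudin.hs shown earlier, but its real definition does not appear in this thread, so this implementation is an assumption, and it covers only the four escapes used in tudin.tex.

```haskell
module Main where

-- Hypothetical helper (not the author's code): replace a few TeX accent
-- escapes with the corresponding Unicode characters. Only the escapes
-- that occur in tudin.tex are handled; everything else is copied as-is.
acc :: String -> String
acc ('\\':'c':' ':'c':rest) = '\231' : acc rest  -- \c c -> c with cedilla (U+00E7)
acc ('\\':'`':'a':rest)     = '\224' : acc rest  -- \`a  -> a with grave   (U+00E0)
acc ('\\':'^':'e':rest)     = '\234' : acc rest  -- \^e  -> e with circumflex (U+00EA)
acc ('\\':'~':'n':rest)     = '\241' : acc rest  -- \~n  -> n with tilde   (U+00F1)
acc (c:rest)                = c : acc rest
acc []                      = []

main :: IO ()
main = do
  -- The literals below are what TeX writes into tudin.hs: a doubled
  -- backslash in a Haskell string literal is a single backslash.
  putStrLn (acc "Hello, fa\\c cade!")
  putStrLn (acc "Votre toast, je peux vous le rendre, Se\\~nor!")
```

Whether the decoded characters then print correctly still depends on the encoding of the output handle, which is the subject of the rest of this thread.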

On Sat, Oct 24, 2009 at 09:17:26PM -0700, Philippos Apolinarius wrote:
Hi, Mauricio. ... It is possible to use pdflatex instead of latex. By the way, Haskell cannot print your name, since it has an acute i :-) It would be great if people could solve this problem.
This problem is solved! Especially in the upcoming GHC 6.12.1! Just use UTF-8 everywhere! Regards, Reid Barton
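As a hedged illustration of what this can look like on the output side: from GHC 6.12 on, System.IO lets a program choose the encoding of each handle with hSetEncoding. The sketch below writes the same string once as UTF-8 and once as Latin-1. The helper name writeLineWith and the file names are made up for this example, and the string uses a numeric escape so the source file itself stays pure ASCII.

```haskell
module Main where

import System.IO

-- Hypothetical helper: write one line to a file using an explicit
-- text encoding instead of the locale default.
writeLineWith :: TextEncoding -> FilePath -> String -> IO ()
writeLineWith enc path s = do
  h <- openFile path WriteMode
  hSetEncoding h enc
  hPutStrLn h s
  hClose h

main :: IO ()
main = do
  -- '\231' is U+00E7, the c with cedilla in "facade".
  writeLineWith utf8   "facade-utf8.txt"   "fa\231ade"
  writeLineWith latin1 "facade-latin1.txt" "fa\231ade"
```

On a Unix-like system the first file should come out one byte longer than the second, because UTF-8 needs two bytes for the cedilla where Latin-1 needs one.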

On 2009-10-24 19:03 -0700 (Sat), Philippos Apolinarius wrote:
However, I do not know what I should do to solve it.
I am not clear on exactly what your requirements are as far as character encodings. But you need to understand character encodings if you're going to be using non-ASCII ones. One simple solution, if you have an ISO-8859-1 ("latin1") file and you need a UTF-8 file, is:
iconv -f ISO-8859-1 -t UTF-8 input.hs > output.hs
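The conversion itself is mechanical, which a short Haskell sketch can show. Every ISO-8859-1 byte is also the Unicode code point of its character, so bytes below 0x80 pass through unchanged and the rest become a two-byte UTF-8 sequence. The function names here are invented for the example, and this is an illustration of the mapping, not a replacement for iconv.

```haskell
module Main where

import Data.Bits (shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- Re-encode a single ISO-8859-1 byte as UTF-8.
-- Code points 0x80..0xFF need two bytes: 110xxxxx 10xxxxxx.
latin1ByteToUtf8 :: Word8 -> [Word8]
latin1ByteToUtf8 b
  | b < 0x80  = [b]
  | otherwise = [0xC0 .|. (b `shiftR` 6), 0x80 .|. (b .&. 0x3F)]

-- Re-encode a whole Latin-1 byte sequence.
latin1ToUtf8 :: [Word8] -> [Word8]
latin1ToUtf8 = concatMap latin1ByteToUtf8

main :: IO ()
main =
  -- "facade" with a cedilla, as Latin-1 bytes: 0xE7 is the cedilla.
  print (latin1ToUtf8 [0x66, 0x61, 0xE7, 0x61, 0x64, 0x65])
  -- prints [102,97,195,167,97,100,101]; 195 167 is c3 a7 in hex
```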
I mean, Haskell is the only language where I cannot type a common word like "façade" without running into trouble.
Actually, you would be having the exact same issues with Java; in UTF-8
mode it would also choke on Latin-1. I suspect that with your particular
implementation of Clean, you just happen to be generating the character
encoding that it uses as the default for input. Blaming Haskell for this
"problem" is quite unfair.
(If all of this UTF-8 stuff seems annoying to you, consider that in
ISO-8859-1 it's not possible to express the simplest Japanese word.
So moving from ISO-8859-1 to UTF-8 is done in the same spirit that
we long ago started using ISO-8859-1 instead of ASCII, so that you
could type "façade" instead of "facade.")
cjs
--
Curt Sampson

On Oct 25, 2009, at 5:01 PM, Curt Sampson wrote:
Actually, you would be having the exact same issues with Java; in UTF-8 mode it would also choke on Latin-1.
Yes, but from the 'javac' man-page:
-encoding encoding
Sets the source file encoding name, such as EUCJIS/SJIS/ISO8859-1/UTF8. If -encoding is not specified, the platform default converter is used.
The corresponding part of the GHC documentation says:
GHC assumes that source files are ASCII or UTF-8 only, other encodings are not recognised. However, invalid UTF-8 sequences will be ignored in comments, so it is possible to use other encodings such as Latin-1, as long as the non-comment source code is ASCII only.
There's no obvious reason why GHC couldn't support any source encoding that the host's iconv() supports.
Blaming Haskell for this "problem" is quite unfair.
It is perfectly fair. The problem is not that the original user isn't telling GHC what the encoding is, but that GHC cannot be told. A javac-like -encoding switch on the command line would meet the original need.
(If all of this UTF-8 stuff seems annoying to you, consider that in ISO-8859-1 it's not possible to express the simplest Japanese word.
And why, exactly, should someone who has no Japanese words to express even care? You have explained why UTF-8 is a good *default*; that does not make choosing it as the *only* option a good idea.

On Oct 26, 2009, at 20:12 , Richard O'Keefe wrote:
On Oct 25, 2009, at 5:01 PM, Curt Sampson wrote: The corresponding part of the GHC documentation says
GHC assumes that source files are ASCII or UTF-8 only, other encodings are not recognised. However, invalid UTF-8 sequences will be ignored in comments, so it is possible to use other encodings such as Latin-1, as long as the non-comment source code is ASCII only.
There's no obvious reason why GHC couldn't support any source encoding that the host's iconv() supports.
That would be the Haskell98 Report:
Haskell uses the Unicode [11] character set. However, source programs are currently biased toward the ASCII character set used in earlier versions of Haskell. This syntax depends on properties of the Unicode characters as defined by the Unicode consortium. Haskell compilers are expected to make use of new versions of Unicode as they are made available.
So yes, it's reasonable to "blame" the language (spec).
--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH

"Brandon" == Brandon S Allbery KF8NH
writes:
Brandon> That would be the Haskell98 Report: Haskell uses the
Brandon> Unicode [11] character set. However, source programs are
Brandon> currently biased toward the ASCII character set used in
Brandon> earlier versions of Haskell.
Brandon> So yes, it's reasonable to "blame" the language (spec).
Note also that it mentions the Unicode character set, not a particular Unicode encoding scheme. To me that implies that an implementation must support all 7 encoding schemes, not just UTF-8. At which point you probably want to make use of iconv, so you might as well support all iconv-supported encodings.
--
Colin Adams
Preston Lancashire

Colin Paul Adams
Brandon> So yes, it's reasonable to "blame" the language (spec).
On the other hand, the sooner users can get moving to utf-8, the sooner we can eliminate these kinds of problems.
Note also that it mentions the Unicode character set, not a particular Unicode encoding scheme.
To me that implies that an implementation must support all 7 encoding schemes, not just UTF-8.
...but not latin1, which appeared to be the problem here.
At which point you probably want to make use of iconv, so you might as well support all iconv-supported encodings.
Interestingly, Wikipedia [0] says that "Unicode-aware programs are required to display, print and manipulate [UTF-32 and -16]", although no source is provided for this requirement. But until somebody actually has Haskell sources in a Unicode encoding different from utf-8, I'd much prefer developers to spend their time on something more useful. (And the workaround is just one line in a Makefile, isn't it?)
-k
[0] http://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings
--
If I haven't seen further, it is by standing in the footprints of giants

Hello Ketil! Tue, Oct 27, 2009 at 09:06:50AM +0100 you wrote:
At which point you probably want to make use of iconv, so you might as well support all iconv-supported encodings.
Interestingly, Wikipedia [0] says that "Unicode-aware programs are required to display, print and manipulate [UTF-32 and -16]", although no source is provided for this requirement.
I believe that `required' there means that you have to use a Unicode-aware program (and can't use a legacy program even for plain English text, in contrast to UTF-8), not that every Unicode-aware program has to actually handle the encodings.
Best regards,
--
DoubleF
No virus detected in this message. Ehrm, wait a minute...
/kernel: pid 56921 (antivirus), uid 32000: exited on signal 9
Oh yes, no virus:)

In a private email to Ketil Malde, I said that Ocaml programmers use the preprocessor to solve the problem of character encoding:
ocamlopt -pp myfilter.exe myprogram.ml -o myoutput.exe
I wonder whether a similar solution could be used with Haskell. I am new to Haskell, but I suppose that Haskell may accept something like
ghc -pgmF myfilter.exe myprogram.hs --make
If the answer is yes, what should I substitute for myfilter.exe?

On Tue, Oct 27, 2009 at 3:57 PM, Philippos Apolinarius
In a private email to Ketil Malde, I said that Ocaml programmers use the preprocessor to solve the problem of character encoding:
ocamlopt -pp myfilter.exe myprogram.ml -o myoutput.exe
I wonder whether a similar solution could be used with Haskell. I am new to Haskell, but I suppose that Haskell may accept something like
ghc -pgmF myfilter.exe myprogram.hs --make
If the answer is yes, what should I substitute for myfilter.exe?
Converting a non-utf-8 input file to utf-8? There's no obvious reason it shouldn't work, and you could use recode for the filter. However, wouldn't it be much better to just generate utf-8 in the first place? I find it hard to believe that it's really as hard as all that. -- Svein Ove Aas
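For anyone who would rather write the filter in Haskell than call recode, here is a minimal sketch. It relies on two things: GHC only runs a custom preprocessor when -F is given along with -pgmF, and it calls the preprocessor with at least three arguments (the original source file name, the input file to read, and the output file to write). The program name latin1filter and the function name recodeFile are made up for this example, hSetEncoding needs GHC 6.12 or later, and I have not tested this against a generated Latin-1 source.

```haskell
module Main where

import System.Environment (getArgs)
import System.Exit (exitFailure)
import System.IO

-- Copy a file, decoding it as ISO-8859-1 and re-encoding it as UTF-8.
recodeFile :: FilePath -> FilePath -> IO ()
recodeFile input output = do
  hIn <- openFile input ReadMode
  hSetEncoding hIn latin1
  hOut <- openFile output WriteMode
  hSetEncoding hOut utf8
  hGetContents hIn >>= hPutStr hOut  -- hPutStr forces the whole input
  hClose hOut
  hClose hIn

main :: IO ()
main = do
  args <- getArgs
  case args of
    -- GHC passes: original file name, input file, output file, then
    -- any extra options given with -optF.
    (_original : input : output : _) -> recodeFile input output
    _ -> do
      hPutStrLn stderr "usage: latin1filter ORIGINAL INPUT OUTPUT"
      exitFailure
```

After compiling it with something like ghc --make latin1filter.hs, the invocation would be along the lines of ghc -F -pgmF ./latin1filter --make myprogram.hs.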

Philippos Apolinarius
Thank you for your help. I am sure that you are right as to the cause of the problem. However, I do not know what I should do to solve it. The Haskell program is generated by LaTeX macros. It is useless to import System.IO.UTF8. I did it.
Are you sure? It works for me (GHC 6.10 as shipped with Ubuntu 9.10).
module Main where
main = putStrLn "façade"
This will, with source in UTF8, output 0xe7 for the c-cedilla, which I believe is ISO-8859-1. Changing the source to
module Main where
import System.IO.UTF8 as U
main = U.putStrLn "façade"
gives the following expected, and utf-8-correct, output:
% ./utf | od -t a -t x1
0000000   f   a   C   '   a   d   e  nl
         66  61  c3  a7  61  64  65  0a
0000010
-k
--
If I haven't seen further, it is by standing in the footprints of giants
participants (10)
- Brandon S. Allbery KF8NH
- Colin Paul Adams
- Curt Sampson
- Ketil Malde
- Maurício CA
- Philippos Apolinarius
- Reid Barton
- Richard O'Keefe
- Sergey Zaharchenko
- Svein Ove Aas