[Haskell-begin] Morphing Endo (ICFP Contest 2007)

Did any of you tried to do a Haskell implementation of the ICFPC 2007 problem? I was thinking of using it as a learning exercise, but I am afraid of the stack. My approach is: 1) use Data.ByteString.Lazy.Char8 to read the contents of the DNA file 2) create a recursive function process::ByteString -> a that will call itself. I have a few problems: a) the DNA is 8MB long. How can I ensure the stack will hold a recursive call? b) there is an "abnormal ending" function called finish that is called anywhere in the code. Is it a good approach to return Empty to end processing? c) should I go "monadic", keeping the dna on a state monad? Thanks -- Rafael Gustavo da Cunha Pereira Pinto Electronic Engineer, MSc.

For everybody's reference, here is summary information of the exercise
(each quoted paragraph starts with a double-quote, and the entire
section ends with a single double-quote):
The 10th ICFP Programming Contest: July 20 - 23, 2007, organized by
Utrecht
http://save-endo.cs.uu.nl/
Problem ("Task") Description:
http://save-endo.cs.uu.nl/Endo.pdf
Background of Problem (from "[Chapter] 1 Background" of the
above-mentioned "Problem ('Task') Description":
"Department of Information and Computing Sciences
Utrecht University
"Tenth Interstellar Contest on Fuun Programming
"Morph Endo!
"Task Description
"Endo is an alien life form, belonging to the species of the Fuun.
Endo needs your help! Earth’s environmental conditions can be harsh
for a life form not properly adapted. Endo had the bad luck of being
dropped on Earth by an Interstellar Garbage Collector. Both the life
form and its faithful space ship Arrow were severely hurt in the
crash, and even worse, after leaving the damaged space craft Endo was
hit by a cargo container that was also dropped by the Garbage
Collector.
"Endo is now in serious trouble. It cannot survive on planet Earth in
its present form, and Arrow is running low on power. According to
Arrow, who was the one to have contacted us, the only hope for Endo is
to change its DNA and thereby adapt it to the conditions of our
planet. To this end, Arrow has been able to come up with a form in
which Endo will survive. Unfortunately, given its current condition,
Arrow lacks the resources for coming up with proper DNA modifications
itself.
"Your task is therefore to help us find such a DNA sequence within 72
hours. Shortly thereafter, Arrow’s power will run out for good, and
since only Arrow can perform the genetic modification on Endo, this
would mean Endo’s definite end.
"Admittedly, we are partially responsible for the current state of
urgency. It took us a long time before realizing the nature of
Arrow’s emergency message and decoding the information that was
provided to us. We are sorry but nevertheless hope that you, the ICFP
programming contest audience, have the proper expertise to solve this
problem within the harsh time constraints.
"Fortunately, there’s also good news. When we weren’t yet aware of
the complete story of Endo and its struggle for life, we have been
working on decoding the specification of Fuun DNA. We now know that
Fuun DNA works significantly different from human DNA, and that the
process of DNA resequencing can be properly described as an algorithm.
The main purpose of this document is to describe this algorithm."
To solve the above task, you are given the following tools:
* Endo's DNA string (MD5 hash c496125ef1d22a61cb86aeb1a02c4092)
http://save-endo.cs.uu.nl/endo.zip
* the source picture
http://save-endo.cs.uu.nl/source.png
* the target picture
http://save-endo.cs.uu.nl/target.png
The actual description of the problem is 20 pages long, which is too
long to quote here.
Correct me if I'm wrong, but because the prefix should not be too
long, and chemical process of modifying Endo with the prefix should
not consume too much energy, this appears to be an example of
constraint programming. Since constraint programming can often be
carried out by constraint logic programming, I might suggest a logic
programming approach for this problem.
According to "Applications and libraries/Compilers and interpreters -
HaskellWiki"
(http://haskell.org/haskellwiki/Applications_and_libraries/Compilers_and_inte...),
CHR (Constraint Handling Rules) is "A concurrent committed-choice
constraint logic programming language, implemented using GHC's
software transactional memory." The URL listed for the associated PS
file is http://www.comp.nus.edu.sg/~sulzmann/manuscript/chr-stm.ps,
but when I click on the link, I get a "403 Forbidden" error.
However, I was able to find the following associated Web site with
Google:
Constraint Handling Rules (CHR)
http://www.cs.kuleuven.ac.be/~dtai/projects/CHR/
-- Benjamin L. Russell
On Wed, 23 Jul 2008 22:11:10 -0300, "Rafael Gustavo da Cunha Pereira
Pinto"
Did any of you tried to do a Haskell implementation of the ICFPC 2007 problem?
I was thinking of using it as a learning exercise, but I am afraid of the stack. My approach is:
1) use Data.ByteString.Lazy.Char8 to read the contents of the DNA file 2) create a recursive function process::ByteString -> a that will call itself.
I have a few problems:
a) the DNA is 8MB long. How can I ensure the stack will hold a recursive call? b) there is an "abnormal ending" function called finish that is called anywhere in the code. Is it a good approach to return Empty to end processing? c) should I go "monadic", keeping the dna on a state monad?
Thanks

On Thu, 24 Jul 2008 11:46:34 +0900, Benjamin L.Russell
[...]
However, I was able to find the following associated Web site with Google:
Constraint Handling Rules (CHR) http://www.cs.kuleuven.ac.be/~dtai/projects/CHR/
The above-mentioned previous reference was just for the top page of CHR. A more refined search revealed a page with Haskell-specific library downloads for CHR, as follows: DOWNLOAD IMPLEMENTATIONS - Constraint Handling Rules (CHR) http://www.cs.kuleuven.be/~dtai/projects/CHR/chr-impl.html There, the following libraries for Haskell are listed: * HaskellCHR http://www.cs.mu.oz.au/~gjd/haskellchr/ * STM-based CHR implementation, by Michael Stahl http://www.cs.kuleuven.be/~dtai/projects/CHR/systems/stmchr-0.1.tar.gz * CCHR: STM-based CHR implementation by Lam and Sulzmann http://taichi.ddns.comp.nus.edu.sg/taichiwiki/CCHR I have replaced the previously dead link for CHR at "Applications and libraries/Compilers and interpreters - HaskellWiki" (http://haskell.org/haskellwiki/Applications_and_libraries/Compilers_and_inte...) with the above-mentioned live one, and added links/descriptions for the above-mentioned STM-based CHR implementation and for CCHR. If I come up with any more specific information or ideas that seem relevant for solving the problem, I'll post it in this thread. -- Benjamin L. Russell

Benjamin L.Russell wrote:
Correct me if I'm wrong, but because the prefix should not be too long, and chemical process of modifying Endo with the prefix should not consume too much energy, this appears to be an example of constraint programming. Since constraint programming can often be carried out by constraint logic programming, I might suggest a logic programming approach for this problem.
Note that you have to execute the modified Endo to find out how many points a prefix will be awarded, which takes several seconds to several hours depending on your implementation. Since DNA is Turing complete, you cannot hope to evaluate a modified Endo without executing it. I therefore suggest to implement a DNA interpreter to produce RNA from DNA, and a RNA interpreter which produces pictures from RNA, and then to look at these pictures for clues how to proceed. Tillmann

Still on the ICFPC 2007 topic I am curious about one thing. If I read the file applying hGetContents to a ByteString (Lazy or Strict) or a String, it seems to read much faster than the version where I construct a sequence. main = do (arg1:args)<-getArgs hIn <- openFile arg1 ReadMode c <-BL.hGetContents hIn --Really Fast let dna = c r<-return (process dna) print (show (r)) main = do (arg1:args)<-getArgs hIn <- openFile arg1 ReadMode c <-hGetContents hIn let dna = fromList c --Kind of slow r<-return (process dna) print (show (r)) I think the "fromList" overhead will be compensated by the O(log(n)) functions on a Seq, against the O(n) counterparts on the strings. What are your considerations about using Data.Sequence?

On Wed, 23 Jul 2008 22:11:10 -0300, "Rafael Gustavo da Cunha Pereira
Pinto"
[...]
I have a few problems:
a) the DNA is 8MB long. How can I ensure the stack will hold a recursive call?
Forgive me if I'm wrong, but on a hunch, but this sounds like an issue that could possibly be solved using tail recursion. For example, according to "Recursion in a monad - HaskellWiki" (http://haskell.org/haskellwiki/Recursion_in_a_monad), here is a monadic do-block reads 'n' lines from stdin, using a linear recursive process: main = f 3 f 0 = return [] f n = do v <- getLine vs <- f (n-1) return $! v : vs This runs as follows: $ runhaskell A.hs 1 2 3 ["1","2","3"] Rewriting the code to make it use a linear iterative process (using tail recursion) results in the following code: f 0 acc = return (reverse acc) f n acc = do v <- getLine f (n-1) (v : acc) Here, "acc" stands for an accumulator, which is a register (here, a variable simulating a register) to store temporary values. This rewrite enables the process to use constant stack space. Using this idea, it may be possible to rewrite your recursive function process::ByteString -> a to use constant stack space. -- Benjamin L. Russell
participants (3)
-
Benjamin L.Russell
-
Rafael Gustavo da Cunha Pereira Pinto
-
Tillmann Rendel