Distributing Haskell on a cluster

Hi all,

I have posted the following question on Stack Overflow, but so far I have not received an answer.
http://stackoverflow.com/questions/29039815/distributing-haskell-on-a-cluste...

I have a piece of code that processes files:

processFiles :: [FilePath] -> (FilePath -> IO ()) -> IO ()

This function spawns an async process that executes an IO action. This IO action must be submitted to a cluster through a job scheduling system (e.g. Slurm).

Because I must use the job scheduling system, it's not possible to use Cloud Haskell to distribute the closure. Instead, the program writes a new *Main.hs* containing the desired computations, which is copied to the cluster node together with all the modules that Main depends on, and then executed remotely with "runhaskell Main.hs [opts]". The async process should then periodically ask the job scheduling system (using *threadDelay*) whether the job is done.

Is there a way to avoid creating a new Main? Can I serialize the IO action and execute it somehow on the node?

Best,
Felipe
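For concreteness, here is a minimal sketch of the submit-and-poll step described above, assuming Slurm's sbatch/squeue command-line tools are available; the wrapper script name run-main.sh, the job-id parsing and the 30-second polling interval are illustrative assumptions, not part of the original program.

import Control.Concurrent (threadDelay)
import System.Process     (readProcess, readProcessWithExitCode)

-- Submit the generated Main.hs through a hypothetical wrapper script that
-- runs "runhaskell Main.hs [opts]" on the node; sbatch prints
-- "Submitted batch job <id>", so the last word of its output is the job id.
submitMain :: FilePath -> IO String
submitMain mainHs = do
  out <- readProcess "sbatch" ["run-main.sh", mainHs] ""
  pure (last (words out))

-- Periodically ask the scheduler whether the job is still queued or running;
-- once squeue no longer lists it, assume it has finished.
waitForJob :: String -> IO ()
waitForJob jobId = do
  (_code, out, _err) <- readProcessWithExitCode "squeue" ["--noheader", "--job", jobId] ""
  if null (words out)
    then pure ()
    else threadDelay (30 * 1000000) >> waitForJob jobId   -- retry in 30 s

main :: IO ()
main = submitMain "Main.hs" >>= waitForJob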

Bit of a curveball from left field, but rather than deploying a Main script
and then using runhaskell, have you considered compiling the program and shipping
that?
Before you veto the idea out of hand, statically compiled binaries are almost
self-contained and (depending on what you changed) they rsync well. And if that
doesn't appeal, then consider instead building the Haskell program dynamically;
a dynamically linked Hello World is only a couple of kB, and a serious program
only a hundred or so.
Anyway, I know you're just looking to send a code-fragment closure, but if
you're dealing with the input and output of the program through a stable
interface, then the program is the closure.
Just a thought.
AfC
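To make the "program is the closure" point concrete, here is a minimal sketch of a compiled worker with a stable command-line interface; the module name and the per-file computation are made up for illustration.

module Main (main) where

import System.Environment (getArgs)

-- The actual computation is fixed at compile time; only the inputs vary.
processFile :: FilePath -> IO ()
processFile path = do
  contents <- readFile path
  putStrLn (path ++ ": " ++ show (length contents) ++ " characters")

main :: IO ()
main = getArgs >>= mapM_ processFile

Compiled once (e.g. ghc -O2 --make Worker.hs), the binary can be rsynced to the node and named directly in the batch script; the only thing that changes per job is its argument list.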

Anecdotal support for this idea: this is exactly how we distribute hadron[1]-based Hadoop MapReduce programs to cluster nodes at work. The compiled executable essentially ships itself to the nodes and recognizes the different environment when executed in that context.

[1] hadron is a Haskell Hadoop streaming framework that came out of our work. It's on GitHub and close to being released on Hackage once the current dev branch is finalized/merged. In case it's helpful: https://github.com/soostone/hadron

Oz
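As a rough sketch of that pattern (not hadron's actual API): the same compiled binary can act as coordinator on the submit host and as worker on the node, dispatching on its own arguments. The scp/sbatch invocations, host name and remote path below are placeholders.

module Main (main) where

import System.Environment (getArgs, getExecutablePath)
import System.Process     (callProcess)

main :: IO ()
main = do
  args <- getArgs
  case args of
    ("--worker" : files) -> mapM_ workOn files   -- running on the cluster node
    files                -> coordinate files     -- running on the submit host

-- Placeholder per-file work.
workOn :: FilePath -> IO ()
workOn f = putStrLn ("processing " ++ f)

-- Copy this very executable to the node and resubmit it in worker mode.
coordinate :: [FilePath] -> IO ()
coordinate files = do
  self <- getExecutablePath
  callProcess "scp" [self, "node01:/tmp/worker"]
  callProcess "sbatch" ["--wrap", unwords ("/tmp/worker" : "--worker" : files)]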

I hadn't considered that idea, but it seems like the natural solution.
Many thanks

As usual, I could suggest a really crazy alternative: it could be possible
to design your code as an EDSL that can emit its own source code, in the
same way that web formlets emit an HTML rendering.
In this case, instead of HTML, the rendering would be the source code of the
closure that you want to execute remotely. You could then compile it at the
emitting node or at the receiver.
The advantage is that you may remotely execute any routine coded using the
EDSL, hiding the mechanism behind a few primitives.
The disadvantage is that you cannot use IO routines with liftIO; you need
a special lifting mechanism that also produces the source code of the IO
routine.
I'm doing some research on this mechanism with the Transient monad.
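A minimal sketch of the idea, independent of the Transient library's actual API: each primitive carries both the IO action to run and the source text that would reproduce it on the remote side. The names and the hand-written source strings are purely illustrative.

-- A value that can both run locally and render itself as source code.
data Remote a = Remote { runRemote :: IO a, render :: String }

-- The "special lift": the caller supplies the source text by hand,
-- since an arbitrary IO action cannot render itself.
liftSrc :: IO a -> String -> Remote a
liftSrc = Remote

-- Sequencing composes both the actions and their renderings.
andThen :: Remote a -> Remote b -> Remote b
andThen (Remote a sa) (Remote b sb) = Remote (a >> b) (sa ++ " >> " ++ sb)

example :: Remote ()
example =
  liftSrc (putStrLn "step 1") "putStrLn \"step 1\""
    `andThen`
  liftSrc (putStrLn "step 2") "putStrLn \"step 2\""

-- `render example` could be spliced into a generated Main.hs on the remote
-- node, while `runRemote example` executes the same computation locally.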
-- Alberto.
participants (4)
- Alberto G. Corona
- Andrew Cowie
- felipe zapata
- Ozgun Ataman