Hello,
 
I would like to announce the release of Karps, an experimental Haskell frontend to Spark DataFrames and Datasets. Apache Spark [1] is a popular framework for distributed programming that comes with several APIs. The excellent Sparkle project [2] from Tweag integrates well with Spark's low-level ("RDD") API, while Karps focuses only on the more recent DataFrame and Dataset API. In that sense, the two projects are complementary in their goals and scope.
 
What can you do with it? So far, it supports simple operations such as manipulating numbers, importing lists of data, and reading JSON files. To facilitate debugging, it integrates with Google's TensorBoard [3] to provide rich visualizations of the dataflow. In addition, thanks to Haskell, it includes a full-program analyzer and optimizer that can automate common tasks such as cache management and query optimization. Some IHaskell notebooks give a flavor of what is possible; see the link on the GitHub page:
 
https://github.com/krapsh/kraps-haskell
 
https://hackage.haskell.org/package/karps-0.2.0.0
 
My main motivation, as a Spark developer, is that writing Spark frontends for new programming languages is very hard. Karps explores a language-agnostic API that is simple enough to make building frontends easy (for JavaScript or Julia, for example), yet still allows Spark to perform rich optimizations under the hood. If you want to know more, I will give a talk on this topic at the San Francisco Spark Users meetup.
 
Since this is my first Haskell project (I wrote my first line of Haskell nine months ago), I would appreciate any feedback on both form and substance. For example, some questions still puzzle me:
- How can I integrate a style checker? (I use Atom with ghc-mod.)
- What are the best practices for integration testing?
- Can I have tests that depend on internal modules, yet hide these internal modules from the Haddock documentation?
 
Thank you for your feedback.
 
[1] http://spark.apache.org/
[2] https://github.com/tweag/sparkle
[3] https://www.tensorflow.org/get_started/summaries_and_tensorboard