
Hi,

I am writing a Web application using HAppS. As with all HAppS apps, it represents its internal state as a Haskell term (HAppS automagically provides persistence and transactions).

It is a neat and efficient solution: you can write your data model entirely in Haskell and, at least for read-only transactions (queries), it will operate as fast as possible, as all data is in memory (if your transactions modify the application state, the transactions have to be recorded on disk to make the state persistent, but this is pretty fast too).

One major component, however, seems to be missing: if we are effectively using Haskell as an in-memory database, where is the "SQL for Haskell", a generic query language for Haskell terms?

There are three basic functions that every web app has to provide, and all of them could be provided by a generic "Haskell SQL":

-- query the application state
-- transform (possibly monadically) the application state: the result of the query is the new state
-- access control: what a user can see is what is returned by an internal access control query

The availability of such a language would be a major boost for Haskell-based web applications, as every application could be accessed via the same API, the only difference being the underlying application-specific data model.

So my question is: what ready-made solutions are there in this space, if any? And if there are none, how would you proceed to design/implement such a language?
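The third function (access control as an internal query) can be sketched as plain function composition: an access-control query restricts the state to what a user may see, and the user's own query only ever receives that restricted view. This is a minimal sketch under assumed names; `User`, `Doc`, `AppState`, `visibleTo` and `runAs` are hypothetical and not part of HAppS.

```haskell
-- Hypothetical application state: documents owned by users.
data User = User { userName :: String, isAdmin :: Bool }

data Doc = Doc { owner :: String, title :: String, public :: Bool }
  deriving (Eq, Show)

newtype AppState = AppState { docs :: [Doc] }

-- | The internal access-control query: the part of the state a user may see.
visibleTo :: User -> AppState -> AppState
visibleTo u st
  | isAdmin u = st
  | otherwise = AppState [d | d <- docs st, public d || owner d == userName u]

-- | Run a user's query against the access-controlled view only,
-- so the query cannot observe documents it is not allowed to see.
runAs :: User -> (AppState -> r) -> AppState -> r
runAs u query = query . visibleTo u
```

This covers reads only; transforms would need an analogous check on the state they produce.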
The basic requirements, in decreasing order of importance, are:

-- Safe: it must be possible to guarantee that a query:
--- cannot cause a system crash
--- completes within a fixed amount of time
--- uses a 'reasonable' amount of space
--- cannot perform any unsafe operation (IO, or any disallowed read/write of the application state)
-- Expressive (simple queries should be simple, complex queries should be possible)
-- Simple to implement
-- Efficient:
--- repeated queries should be executed efficiently time-wise (it is acceptable for queries to be executed inefficiently the first time), and all queries should be space-efficient, so no unnecessary copying
-- User friendly:
--- simple to use for non-Haskellers
--- short queries

Ah, I almost forgot, it should also be able to make a good espresso.

The problem can be broken into two parts:

1) How to implement generic queries on nested terms in Haskell?
2) How to map the queries, written as strings, to the internal Haskell query?

Regarding the first point, I am aware of the following options:
- SYB (Data.Generics..)
- Oleg's Zipper
- (Nested) list comprehensions (which are being extended with SQL-like order by and group by operators)

Being rather new to Haskell, I find all these options rather unfamiliar, so I would appreciate any advice on which should be preferred and why.

Regarding the second point:

The simplest solution would be to avoid the problem entirely by using Haskell directly as the query language. This is the LambdaBot way: queries are Haskell expressions, compiled in a limited environment (a module with a fixed set of imports, no TH). Lambdabot avoids problems by executing the expression in a separate process, inside an OS-enforced sandbox that can be as restrictive as required (especially using something like SELinux). However, to get the query to execute efficiently it would probably have to be executed in a GHC thread, and I am not sure how safe that would be.
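For the time requirement, base's `System.Timeout.timeout` can bound how long a query runs inside a GHC thread. A minimal sketch, assuming queries are pure values whose result can be forced; `runBounded` is a hypothetical helper name, and the caveats in its comment mean it only partly answers the safety question (space in particular stays unbounded).

```haskell
import Control.Exception (evaluate)
import System.Timeout (timeout)

-- | Evaluate a pure query result in the current GHC thread, giving up
-- (returning Nothing) after the given number of microseconds.
--
-- Caveats: 'evaluate' only forces the result to weak head normal form,
-- so a lazily produced list could do most of its work after this returns
-- (forcing it fully needs something like deepseq); GHC can only interrupt
-- a thread at allocation points, so a tight non-allocating loop may not be
-- cut off; and nothing here limits the space the query uses.
runBounded :: Int -> a -> IO (Maybe a)
runBounded micros result = timeout micros (evaluate result)
```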
Looking at the discussion at http://haskell.org/haskellwiki/Safely_running_untrusted_Haskell_code, it seems clear that there are many open issues. For example, how would I enforce limits on the space used by the query?

So it would probably be better to define a separate query language that is less expressive but more controllable than full Haskell, but what form should that take?

Any suggestion/tip/reference is very welcome,

titto
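One possible shape for such a restricted language, sketched under the assumption that queries only need to filter and limit a collection: a small first-order syntax tree with no recursion, lambdas or IO, interpreted by structural recursion, so a query over finite data always terminates. Deriving `Read` gives a crude string syntax for free, which touches point 2. `Pred`, `Query`, `runQuery`, `parseQuery` and the simplified `Item` are hypothetical.

```haskell
import Text.Read (readMaybe)

type Cents = Int -- hypothetical stand-in for the application's Cents type

data Item = Item { description :: String, price :: Cents }
  deriving Show

-- | Predicates: first-order, no variables, no recursion, no IO.
data Pred
  = PriceBelow Cents
  | DescriptionIs String
  | And Pred Pred
  | Not Pred
  deriving (Show, Read)

-- | A query selects matching items and caps the size of the answer.
data Query = Query Pred Int
  deriving (Show, Read)

-- | Structural recursion on the predicate: evaluation always terminates.
holds :: Pred -> Item -> Bool
holds (PriceBelow c)    i = price i < c
holds (DescriptionIs d) i = description i == d
holds (And p q)         i = holds p i && holds q i
holds (Not p)           i = not (holds p i)

runQuery :: Query -> [Item] -> [Item]
runQuery (Query p n) = take n . filter (holds p)

-- | Parse query text with the derived Read instance;
-- malformed input gives Nothing instead of an exception.
parseQuery :: String -> Maybe Query
parseQuery = readMaybe
```

The derived syntax is Haskell constructor notation; a friendlier surface syntax for non-Haskellers would need a real parser (for example with Parsec) producing the same tree.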

Hi
Regarding the first point, I am aware of the following options: - SYB (Data.Generics..)
You may also want to take a look at Uniplate: http://www-users.cs.york.ac.uk/~ndm/uniplate/

That (or SYB) should take care of your query/transform issues, and the ACL stuff can be layered on top of that. I have no idea how you'd manage the space/time requirements, though.

Thanks

Neil
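To give a feel for SYB-style generic queries and transforms, here is a minimal sketch written directly against `Data.Data` from base, the class that SYB's `Data.Generics` builds on, so no extra package is needed. The names `collect` and `everywhereOn` are hypothetical; their library counterparts are roughly SYB's `listify` and `everywhere (mkT f)`, or Uniplate's `universeBi` and `transformBi`. The `Sale`/`Item` types are simplified assumptions (no `CalendarTime`, `Cents` as `Int`).

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data (Data, gmapQ, gmapT)
import Data.Typeable (Typeable, cast)

type Cents = Int -- simplified stand-in for the HAppS example's Cents

data Item = Item { stock :: Int, description :: String, price :: Cents }
  deriving (Eq, Show, Data)

data Sale = Sale { soldItem :: Item, qty :: Int, salePrice :: Cents }
  deriving (Eq, Show, Data)

-- | Generic query: every subterm of type b anywhere inside x, in pre-order.
collect :: (Data a, Typeable b) => a -> [b]
collect x = maybe [] (: []) (cast x) ++ concat (gmapQ collect x)

-- | Generic transform: apply f bottom-up to every subterm of type b.
everywhereOn :: (Data a, Typeable b) => (b -> b) -> a -> a
everywhereOn f x =
  let x' = gmapT (everywhereOn f) x     -- transform the children first
  in maybe x' id (cast x' >>= cast . f) -- then this node, if it has type b
```

On the copying question: `collect` allocates only its result list, whereas `everywhereOn` reconstructs every constructor node it visits (including the cells of each `String`), so a naive generic transform does copy the traversed structure.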

On Saturday 23 June 2007 13:52:27 Neil Mitchell wrote:
Hi
Regarding the first point, I am aware of the following options: - SYB (Data.Generics..)
You may also want to take a look at Uniplate: http://www-users.cs.york.ac.uk/~ndm/uniplate/
Many thanks Neil.
That (or SYB) should take care of your query/transform issues, and the ACL stuff can be layered on top of that. I have no idea how you'd manage the space/time requirements though.
Speaking of space requirements, does Uniplate (or SYB, for that matter) always perform a full copy of the traversed structure?

Best,
titto

Titto,

Have you looked at HAppS.DBMS.IxSet? Right now it provides a generic way to query indexed sets. If you want to take a shot at making the queries serializable, I don't think it would be that difficult (but I have not tried so YMMV).

-Alex-
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
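The idea behind an indexed set can be sketched with `Data.Map` from the containers package: keep each item once per index, keyed by the indexed field, so point lookups and range queries avoid a full scan. This is a hypothetical structure for illustration, not the API of HAppS.DBMS.IxSet; `ItemSet`, `insertItem`, `lookupId` and `cheaperThan` are made-up names.

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

type Cents = Int -- hypothetical stand-in

data Item = Item { itemId :: Int, description :: String, price :: Cents }
  deriving (Eq, Show)

-- | Items indexed twice: by primary key and by price.
data ItemSet = ItemSet
  { byId    :: Map Int Item
  , byPrice :: Map Cents [Item]
  }

emptySet :: ItemSet
emptySet = ItemSet Map.empty Map.empty

-- | Add an item to both indexes. Assumes each itemId is inserted only once;
-- replacing an existing item would also require removing it from byPrice.
insertItem :: Item -> ItemSet -> ItemSet
insertItem i s = ItemSet
  { byId    = Map.insert (itemId i) i (byId s)
  , byPrice = Map.insertWith (++) (price i) [i] (byPrice s)
  }

-- | Point query on the primary index: O(log n).
lookupId :: Int -> ItemSet -> Maybe Item
lookupId k = Map.lookup k . byId

-- | Range query on the price index: items strictly cheaper than c,
-- in ascending price order, without visiting the dearer ones.
cheaperThan :: Cents -> ItemSet -> [Item]
cheaperThan c = concat . Map.elems . fst . Map.split c . byPrice
```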

On Wednesday 27 June 2007 09:32:16 Alex Jacobson wrote:
Titto,
Have you looked at HAppS.DBMS.IxSet? Right now it provides a generic way to query indexed sets.
If you want to take a shot at making the queries serializable, I don't think it would be that difficult (but I have not tried so YMMV).
Hi Alex,

thanks for reminding me about that. It is a very nice back-end and, as you say, it should not be too hard to design a SQL-like query language on top of it.

I am still wondering, however, which meta-model is most appropriate to represent the information held in a Web app. Unfortunately there seem to be at least three different ones (without considering mixed approaches like F-Logic):

1) Graph

This is really the native Haskell way of representing information: the model is defined using classes or data types, and types are connected by direct uni-directional links. So, for example, the Sale/Item model (from the HAppS DBMS examples) might be written like:

    data Item = Item {stock::Int, description::String, price::Cents}
                deriving (Ord,Eq,Read,Show)

    data Sale = Sale {date::CalendarTime
                     ,soldItem::Item  -- NOTE: uni-directional link to Item
                     ,qty::Int
                     ,salePrice::Cents}
                deriving (Ord,Eq,Read,Show)

or in more abstract form using classes:

    class Item i where
      description :: i -> String
      price :: i -> Cents

    class Sale s where
      soldItem :: Item i => s -> i

This is also very much the Web way: information is a graph of resources linked via uni-directional links. Information is queried by path traversal (REST-style). Assuming that the root "/" represents the collection of all sales, then:

    HTTP GET /elemAt[2345]/soldItem/description.json

might return the JSON representation of the description of the item sold as part of sale 2345.

2) Relational

Information is represented as tables that can be joined via keys, as implemented in HAppS DBMS or in any relational database. The model becomes:

    data Item = Item {itemId::Id  -- NOTE: primary key
                     ,stock::Int, description::String, price::Cents}
                deriving (Ord,Eq,Read,Show)

    data Sale = Sale {date::CalendarTime
                     ,soldItemId::Id  -- NOTE: foreign key
                     ,qty::Int, salePrice::Cents}
                deriving (Ord,Eq,Read,Show)

plus the appropriate index definitions. Information can be queried via a SQL-like language.
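To make the contrast between the first two meta-models concrete, here is a sketch over simplified versions of the two Sale/Item models (no `date`, `Cents` and `Id` as `Int`, and prefixed field names so both models fit in one module; all of these are assumptions). In the graph model the path `/elemAt[2345]/soldItem/description` is just composition along the links; in the relational model the same question needs a join on the foreign key, here through a `Data.Map` index on `itemId`. `elemAt`, `graphQuery` and `relQuery` are hypothetical names.

```haskell
import qualified Data.Map as Map

type Cents = Int -- hypothetical stand-ins
type Id = Int

-- Graph model: a sale links directly to its item.
data GItem = GItem { gDescription :: String, gPrice :: Cents }
data GSale = GSale { gSoldItem :: GItem, gQty :: Int }

-- Relational model: a sale refers to its item by key.
data RItem = RItem { rItemId :: Id, rDescription :: String, rPrice :: Cents }
data RSale = RSale { rSoldItemId :: Id, rQty :: Int }

-- | Safe positional access, the elemAt[n] step of the path.
elemAt :: Int -> [a] -> Maybe a
elemAt n xs = case drop n xs of
  (x : _) | n >= 0 -> Just x
  _                -> Nothing

-- | Graph: /elemAt[n]/soldItem/description as plain path traversal.
graphQuery :: Int -> [GSale] -> Maybe String
graphQuery n = fmap (gDescription . gSoldItem) . elemAt n

-- | Relational: the same question needs a join on soldItemId = itemId.
relQuery :: Int -> [RSale] -> [RItem] -> Maybe String
relQuery n sales items = do
  s <- elemAt n sales
  item <- Map.lookup (rSoldItemId s) itemIndex
  pure (rDescription item)
  where
    -- primary-key index; a real store would maintain it, not rebuild it per query
    itemIndex = Map.fromList [(rItemId i, i) | i <- items]
```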
3) Logic

This is the "Semantic Web" way: information is broken down into assertions that, in their simplest form, are just triples (subject predicate object). The model then becomes something like:

    Item hasDescription String
    Item hasPrice Cents
    Sale hasItem Item

It can be populated piecemeal with assertions like:

    item0 hasDescription "indesit cooker 34BA"
    item0 hasPrice 3.5
    sale0 hasItem item0

It can be queried using a logic-oriented query language (e.g. SPARQL):

    sale2345 hasItem ?item
    ?item hasDescription ?description

Moving from Graph to Relational to Logic, the meta-model becomes simpler and more flexible. The flip side is that the model (and the queries) become more verbose. It is not clear where the sweet spot is.

What do people think?

Best,
titto
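The logic-style query can be sketched as conjunctive triple-pattern matching: each pattern is matched against the store under the variable bindings produced so far, which is roughly how a SPARQL basic graph pattern is evaluated. A minimal sketch; `Term`, `Pattern`, `matchTerm`, `matchTriple` and `solve` are hypothetical names, and every value is a plain string (so the price 3.5 is stored as "3.5").

```haskell
import Data.Map (Map)
import qualified Data.Map as Map

type Triple = (String, String, String) -- subject, predicate, object

-- | A pattern position is either a constant or a ?variable.
data Term = Const String | Var String

type Pattern = (Term, Term, Term)
type Bindings = Map String String

-- | Match one term against one value, extending the bindings if consistent.
matchTerm :: Term -> String -> Bindings -> Maybe Bindings
matchTerm (Const c) v b = if c == v then Just b else Nothing
matchTerm (Var x) v b = case Map.lookup x b of
  Nothing -> Just (Map.insert x v b)
  Just v' -> if v' == v then Just b else Nothing

matchTriple :: Pattern -> Triple -> Bindings -> Maybe Bindings
matchTriple (ps, pp, po) (s, p, o) b =
  matchTerm ps s b >>= matchTerm pp p >>= matchTerm po o

-- | All binding sets that satisfy every pattern (a conjunctive query).
solve :: [Triple] -> [Pattern] -> [Bindings]
solve store = foldl step [Map.empty]
  where
    step bs pat = [b' | b <- bs, t <- store, Just b' <- [matchTriple pat t b]]
```

Each pattern here rescans the whole store; practical triple stores index triples by subject, predicate and object to avoid that.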

Titto,

The usual tradeoff is between efficiency and queryability. It is really easy to optimize graph traversal. It is really hard to get performance out of the Logic model. The traditional sweet spot has been the relational model, but it breaks down at very large scale. Many very large scale web sites implement some form of relational database sharding, which basically means partitioning the database: doing a bit of graph traversal to decide on the database, then relational queries within that database, and then merging the results.

-Alex-
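The sharding scheme described above (route to a partition, query within it, merge the results) can be sketched as follows, assuming modulo partitioning on an integer key and shards held as in-memory `Data.Map`s. `Shard`, `shardFor`, `buildShards`, `lookupKey` and `queryAll` are hypothetical names; real deployments differ in many details.

```haskell
import Data.Map (Map)
import qualified Data.Map as Map

type Key = Int
type Shard v = Map Key v -- one partition of the data

-- | Route a key to a shard index (modulo partitioning; nShards must be > 0).
shardFor :: Int -> Key -> Int
shardFor nShards k = k `mod` nShards

-- | Build n shards by routing every (key, value) pair to its partition.
buildShards :: Int -> [(Key, v)] -> [Shard v]
buildShards n kvs =
  [Map.fromList [(k, v) | (k, v) <- kvs, shardFor n k == i] | i <- [0 .. n - 1]]

-- | Point query: route to exactly one shard, then look up locally.
-- Assumes a non-empty list of shards.
lookupKey :: [Shard v] -> Key -> Maybe v
lookupKey shards k = Map.lookup k (shards !! shardFor (length shards) k)

-- | Scatter/gather query: run the same local query on every shard and
-- concatenate the partial results.
queryAll :: (Shard v -> [r]) -> [Shard v] -> [r]
queryAll local shards = concatMap local shards
```

Here merging is plain concatenation; ordered or aggregated results would need a real merge step across shards.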
participants (3)
- Alex Jacobson
- Neil Mitchell
- Pasqualino 'Titto' Assini