Join support in Persistent

Hey all,

After a long discussion with Aur Saraf, I think we came up with a good approach to join support in Persistent. Let's review the goals:

* Allow for non-relational backends, such as Redis (simple key-value stores).
* Allow SQL backends to take advantage of the database's JOIN abilities.
* Not force SQL backends to use JOIN if they'd rather avoid it.
* Keep a simple, straightforward, type-safe API, like we have everywhere else in Persistent.
* Cover the most common (say, 95%) of use cases out of the box.

So our idea (well, if you don't like it, don't blame Aur...) is to provide a separate module (Database.Persist.Join) which provides special functions for the most common join operations. To start with, I want to handle a two-table one-to-many relationship. For demonstration purposes, let's consider a blog entry application, with entities Author and Entry. Each Entry has precisely one Author, and each Author can have many Entries. In Persistent, it looks like:

    Author
        name String Asc
        isPublic Bool Eq
    Entry
        author AuthorId Eq
        title String
        published UTCTime Desc
        isPublic Bool Eq

In order to get a list of all entries along with their authors, you can use the newly added[1] selectOneMany function:

    selectOneMany [AuthorIsPublicEq True] [AuthorNameAsc] [EntryIsPublicEq True] [EntryPublishedDesc] EntryAuthorEq

This will return a value of type:

    type AuthorPair = (AuthorId, Author)
    type EntryPair = (EntryId, Entry)
    [(AuthorPair, [EntryPair])]

In addition to Database.Persist.Join, there is also a parallel module named Database.Persist.Join.Sql, which has an alternative version of selectOneMany that is powered by a SQL JOIN. It has almost identical semantics: the only catch comes in when you don't fully specify ordering. But then again, if you don't specify ordering in the first place the order of the results is undefined, so it really *is* identical semantics, just slightly different behavior.

Anyway, it's almost 1 in the morning, so I hope I haven't rambled too much. The basic idea is this: Persistent 0.5 will provide a nice, high-level approach to relations. I'll be adding more functions to these modules as necessary, and I'd appreciate input on what people would like to see there.

Michael

[1] https://github.com/snoyberg/persistent/commit/d2b52a6a7b7a6af6234315492f24f8...
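To make the result shape concrete, here is a minimal in-memory sketch of the one-to-many grouping that selectOneMany returns. The types are simplified stand-ins (plain Ints and Strings, not Persistent's real Key/entity types):

```haskell
-- Simplified stand-ins for the (key, value) pairs Persistent returns
-- (hypothetical types; the real function works over PersistEntity values).
type AuthorPair = (Int, String)          -- (author id, author name)
type EntryPair  = (Int, (Int, String))   -- (entry id, (author id, title))

-- Pair each "one" with all of its "many" rows, matching the
-- [(AuthorPair, [EntryPair])] result shape described above.
groupOneMany :: [AuthorPair] -> [EntryPair] -> [(AuthorPair, [EntryPair])]
groupOneMany ones manys =
    [ (one, [ e | e@(_, (aid, _)) <- manys, aid == fst one ]) | one <- ones ]

main :: IO ()
main = print $ groupOneMany
    [(1, "alice"), (2, "bob")]
    [(10, (1, "post A")), (11, (2, "post B")), (12, (1, "post C"))]
```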

That is wonderful: application joins or database joins. Let's compare to Rails:
    selectOneMany [AuthorIsPublicEq True] [AuthorNameAsc] [EntryIsPublicEq True] [EntryPublishedDesc] EntryAuthorEq

    Author.where(:isPublic => true).order("name").includes(:entries) &
      Entry.where(:isPublic => true).order("published DESC")
Note that a Rails query is lazy and the & combines the two queries. However, when there are no filtering criteria on the association, Rails prefers to perform two queries: one to retrieve the authors, then one to retrieve the entries based on the author ids:
    SELECT "entries".* FROM "entries" WHERE ("entries".author_id IN (51,1,78,56,64,84,63,60))
Originally, Rails always did a SQL-level join, but it switched to preferring app-level joins, largely because that reduced the number of Ruby objects that needed to be allocated, resulting in much better performance for large data sets [1].
It appears that Database.Persist.Join is instead performing n + 1 queries: one query per author. We should avoid these kinds of queries, and once we do, there will not be much point to an outer join in the db.
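The arithmetic behind the n + 1 concern can be spelled out as a toy sketch (no real database involved; the counts are purely illustrative):

```haskell
-- For m authors, the n+1 pattern issues one query for the authors plus one
-- per author for its entries, while an application-level join issues exactly
-- two: the authors query, then a single "WHERE author_id IN (...)" query.
nPlusOneQueries :: Int -> Int
nPlusOneQueries numAuthors = 1 + numAuthors

appJoinQueries :: Int -> Int
appJoinQueries _ = 2

main :: IO ()
main = do
    putStrLn ("n+1 for 8 authors: " ++ show (nPlusOneQueries 8) ++ " queries")
    putStrLn ("app join for 8 authors: " ++ show (appJoinQueries 8) ++ " queries")
```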
Looking at the behavior of Rails for joins, I don't like how it decides between types of joins. The SQL produced by Rails is not in fact identical to the Persistent one: it will do an outer join with the entry filtering in a WHERE clause, not as part of the JOIN conditions. If we are to support joins, it needs to be very apparent which type of join is performed.
selectOneMany doesn't have an offset and limit. If we added them, we'd end up with queries like this:

    selectOneMany [] [] [] [] EntryAuthorEq 0 0
This function, with 5+ required arguments, is somewhat awkward to use and to read. Rails is composable in a readable way because it copied HaskellDB. I would like to get away from the empty optional arguments.
I am all for adding these changes for now; I just hope we can move to a more composable API in the future.
I thought the API that Aur came up with was a better effort in that
direction, although there are definitely practical issues with it.
[1]
http://akitaonrails.com/2008/05/25/rolling-with-rails-2-1-the-first-full-tut...
Greg Weber
On Sat, Apr 2, 2011 at 2:50 PM, Michael Snoyman wrote:
_______________________________________________
web-devel mailing list
web-devel@haskell.org
http://www.haskell.org/mailman/listinfo/web-devel

On Sun, Apr 3, 2011 at 6:40 AM, Greg Weber wrote:
That is wonderful: application joins or database joins. Let's compare to Rails:

    selectOneMany [AuthorIsPublicEq True] [AuthorNameAsc] [EntryIsPublicEq True] [EntryPublishedDesc] EntryAuthorEq
Author.where(:isPublic => true).order("name").includes(:entries) & Entry.where(:isPublic => true).order("published DESC")
How does Rails handle it when there are two join keys? For example:

    Person
        name String
    Entry
        author PersonId
        editor PersonId
        title String
Note that a Rails query is lazy and the & combines the two queries. However, when there are no filtering criteria on the association, Rails prefers to perform two queries: one to retrieve the authors, then one to retrieve the entries based on the author ids:
    SELECT "entries".* FROM "entries" WHERE ("entries".author_id IN (51,1,78,56,64,84,63,60))
Originally, Rails always did a SQL-level join, but it switched to preferring app-level joins, largely because that reduced the number of Ruby objects that needed to be allocated, resulting in much better performance for large data sets [1].
Regarding lazy: we could definitely go the route of allowing laziness via unsafeInterleaveIO, but it's very contrary to the normal workings of Yesod/Persistent. If we want constant space, I think we should use enumerators.

As for the performance concern: do you think we need to be worried about the large number of allocated objects? My guess is that Haskell won't have the same concerns here, but I could be wrong.
It appears that Database.Persist.Join is instead performing n + 1 queries: one query per author. We should avoid these kinds of queries, and once we do, there will not be much point to an outer join in the db.
Good point, we can easily fix that. However, can you clarify what you mean by outer join here? Why would there be a need for an outer join?

* Correction: it's not so simple to add, at least not efficiently. We would need a function like this:

    selectOneMany :: (PersistEntity one, PersistEntity many, PersistBackend m, Eq (Key one))
                  => [Filter one] -> [Order one]
                  -> [Filter many] -> [Order many]
                  -> ([Key one] -> Filter many)
                  -> (many -> Key one)
                  -> m [((Key one, one), [(Key many, many)])]
    selectOneMany oneF oneO manyF manyO inFilt' getKey = do
        x <- selectList oneF oneO 0 0
        let inFilt = inFilt' $ map fst x
        y <- selectList (inFilt : manyF) manyO 0 0
        return $ map (go y) x
      where
        go manys one@(key, _) =
            (one, filter (\x -> getKey (snd x) == key) manys)

I'm not convinced that there will be any performance enhancement here. Most likely, when dealing with small datasets, this will pay off due to the savings in the number of bytes exchanged with the server. But for large datasets, the O(m * n) complexity (number of ones times number of manys) will hurt us. I'd prefer to optimize for larger datasets. Plus, I think the current API is much nicer.

If you can think of a better approach than this, let me know. But remember that there's no way to know the sort order of the keys of the one table.
Looking at the behavior of Rails for joins, I don't like how it decides between types of joins. The SQL produced by Rails is not in fact identical to the Persistent one: it will do an outer join with the entry filtering in a WHERE clause, not as part of the JOIN conditions. If we are to support joins, it needs to be very apparent which type of join is performed.
As far as I know, every database on the planet these days treats these two as identical for performance reasons:

    SELECT * FROM a, b WHERE a.foo = b.bar
    SELECT * FROM a INNER JOIN b ON a.foo = b.bar

I went the INNER JOIN route because I've always had a preference for it. But an *outer* join would be a very different beast. If you look in the runtests.hs file, it specifically relies on the fact that we're doing an inner join. An outer join would mean that *all* authors appear in the output set, while an inner join will only include authors with entries. I agree that we should make this clear in the docs.
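The inner-vs-outer semantic difference can be sketched with plain in-memory lists (hypothetical simplified types, not the Persistent API): authors as (id, name), entries as (author id, title).

```haskell
outerJoin, innerJoin :: [(Int, String)] -> [(Int, String)] -> [(String, [String])]

-- Outer join: every author appears, even with an empty entry list.
outerJoin authors entries =
    [ (name, [ t | (aid, t) <- entries, aid == k ]) | (k, name) <- authors ]

-- Inner join: authors without entries are dropped from the result.
innerJoin authors entries =
    [ p | p@(_, es) <- outerJoin authors entries, not (null es) ]

main :: IO ()
main = do
    let authors = [(1, "alice"), (2, "bob")]
        entries = [(1, "post A")]
    print (outerJoin authors entries)  -- bob appears with no entries
    print (innerJoin authors entries)  -- bob is dropped
```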
selectOneMany doesn't have an offset and limit. If we added them, we'd end up with queries like this:

    selectOneMany [] [] [] [] EntryAuthorEq 0 0
I purposely avoided offset and limit for now, since I'm not exactly certain how they should be applied. Should they offset/limit the number of ones? The number of manys? The total number of manys across all ones, or the number of manys per one?
This function, with 5+ required arguments, is somewhat awkward to use and to read. Rails is composable in a readable way because it copied HaskellDB. I would like to get away from the empty optional arguments.
OK, how about this:

    data SelectOneManyArgs one many = SelectOneManyArgs
        { oneFilter :: [Filter one]
        }
    ...

    defaultOneManyArgs :: (Key one -> Filter many) -> SelectOneManyArgs one many

But I'd like to have shorter names somehow.
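A rough sketch of how the record-of-defaults call style could look in practice. All names are hypothetical, and the fields are collapsed to plain strings purely for illustration; the real fields would be filter and order lists:

```haskell
-- Hypothetical sketch of the record-with-defaults idea: callers build on a
-- default value and override only the fields they care about, instead of
-- passing a row of empty lists positionally.
data SelectOneManyArgs = SelectOneManyArgs
    { oneFilter  :: [String]
    , oneOrder   :: [String]
    , manyFilter :: [String]
    , manyOrder  :: [String]
    } deriving Show

defaultOneManyArgs :: SelectOneManyArgs
defaultOneManyArgs = SelectOneManyArgs [] [] [] []

main :: IO ()
main =
    -- Record-update syntax names only the overridden argument.
    print defaultOneManyArgs { oneFilter = ["AuthorIsPublicEq True"] }
```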
I am all for adding these changes for now; I just hope we can move to a more composable API in the future. I thought the API that Aur came up with was a better effort in that direction, although there are definitely practical issues with it.
I definitely think we should continue exploring the design space to see if we can come up with better solutions. But I have a sneaking suspicion that if we want fully customizable queries that allow arbitrary joining and selecting individual fields, we're going to end up with some kind of SQL syntax inside Template Haskell. I'm all for making something like that... but it's not Persistent.

I think Persistent's goal should *not* be to handle every possible query you can ever imagine. It should handle the common cases efficiently, with type safety and a simple API. It should also allow people to easily drop down to something more low-level, possibly sacrificing type safety, when the need arises. And over time, as we get more user feedback, we can push the boundary of what Persistent handles out of the box.

If we get to the point where 95% of the queries people perform can be handled with an out-of-the-box function, and for the other 5% people need to write some SQL (or MongoDB backend code, or Redis...), I think we'll have hit our target.

Michael
[1] http://akitaonrails.com/2008/05/25/rolling-with-rails-2-1-the-first-full-tut...
Greg Weber

On Sat, Apr 2, 2011 at 9:19 PM, Michael Snoyman wrote:
On Sun, Apr 3, 2011 at 6:40 AM, Greg Weber wrote:
That is wonderful: application joins or database joins. Let's compare to Rails:
    selectOneMany [AuthorIsPublicEq True] [AuthorNameAsc] [EntryIsPublicEq True] [EntryPublishedDesc] EntryAuthorEq
Author.where(:isPublic => true).order("name").includes(:entries) & Entry.where(:isPublic => true).order("published DESC")
How does Rails handle it when there are two join keys? For example:

    Person
        name String
    Entry
        author PersonId
        editor PersonId
        title String
Basically the same: you have to declare associations. Rails auto-detects the class and foreign key from the association name in the simple case. I left the code out before:

    class Entry < ActiveRecord::Base; end

    class Author < ActiveRecord::Base
      has_many :entries
    end
Note that a Rails query is lazy and the & combines the two queries. However, when there are no filtering criteria on the association, Rails prefers to perform two queries: one to retrieve the authors, then one to retrieve the entries based on the author ids:
    SELECT "entries".* FROM "entries" WHERE ("entries".author_id IN (51,1,78,56,64,84,63,60))
Originally, Rails always did a SQL-level join, but it switched to preferring app-level joins, largely because that reduced the number of Ruby objects that needed to be allocated, resulting in much better performance for large data sets [1].
Regarding lazy: we could definitely go the route of allowing laziness via unsafeInterleaveIO, but it's very contrary to the normal workings of Yesod/Persistent. If we want constant space, I think we should use enumerators.

As for the performance concern: do you think we need to be worried about the large number of allocated objects? My guess is that Haskell won't have the same concerns here, but I could be wrong.
I think these problems were specific to the way the results were being constructed in Rails. The real point is that Rails was able to default to smart application joins with no downside.
It appears that Database.Persist.Join is instead performing n + 1 queries: one query per author. We should avoid these kinds of queries, and once we do, there will not be much point to an outer join in the db.
Good point, we can easily fix that. However, can you clarify what you mean by outer join here? Why would there be a need for an outer join?

* Correction: it's not so simple to add, at least not efficiently. We would need a function like this:

    selectOneMany :: (PersistEntity one, PersistEntity many, PersistBackend m, Eq (Key one))
                  => [Filter one] -> [Order one]
                  -> [Filter many] -> [Order many]
                  -> ([Key one] -> Filter many)
                  -> (many -> Key one)
                  -> m [((Key one, one), [(Key many, many)])]
    selectOneMany oneF oneO manyF manyO inFilt' getKey = do
        x <- selectList oneF oneO 0 0
        let inFilt = inFilt' $ map fst x
        y <- selectList (inFilt : manyF) manyO 0 0
        return $ map (go y) x
      where
        go manys one@(key, _) =
            (one, filter (\x -> getKey (snd x) == key) manys)
I'm not convinced that there will be any performance enhancement here. Most likely, when dealing with small datasets, this will pay off due to the savings in the number of bytes exchanged with the server. But for large datasets, the O(m * n) complexity (number of ones times number of manys) will hurt us. I'd prefer to optimize for larger datasets. Plus, I think the current API is much nicer.

If you can think of a better approach than this, let me know. But remember that there's no way to know the sort order of the keys of the one table.
So this is an overly simple solution. The first selectList should probably be a function that returns just a list of ids, to reduce the 2m traversals from the map to m (unless that can be fused away). But more importantly, it should return a Set of keys, so that each of the n manys needs only one lookup, for a total of O(n) + O(m). The ideal here might be a set that has immediate access to the list of keys. An intriguing idea would be for selectList to return an ordered Map that can still be treated as a list.
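The keyed-lookup idea can be sketched with Data.Map (hypothetical simplified types, not Persistent's Key/entity values). Building the Map is one pass over the manys; each "one" then needs a single O(log n) lookup, instead of filtering the entire many-list per one:

```haskell
import qualified Data.Map.Strict as Map

-- Group the "many" rows by their foreign key, preserving their order.
groupByKey :: Ord k => (v -> k) -> [v] -> Map.Map k [v]
groupByKey getKey = Map.fromListWith (flip (++)) . map (\v -> (getKey v, [v]))

-- Pair each "one" with its manys via one Map lookup each, avoiding the
-- O(m * n) filter-per-one approach.
joinOneMany :: Ord k => [(k, a)] -> (b -> k) -> [b] -> [(a, [b])]
joinOneMany ones getKey manys =
    let byKey = groupByKey getKey manys
    in [ (one, Map.findWithDefault [] k byKey) | (k, one) <- ones ]

main :: IO ()
main = print $ joinOneMany
    [(1 :: Int, "alice"), (2, "bob")]
    fst
    [(1 :: Int, "post A"), (2, "post B"), (1, "post C")]
```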
Looking at the behavior of Rails for joins, I don't like how it decides between types of joins. The SQL produced by Rails is not in fact identical to the Persistent one: it will do an outer join with the entry filtering in a WHERE clause, not as part of the JOIN conditions. If we are to support joins, it needs to be very apparent which type of join is performed.
As far as I know, every database on the planet these days treats these two as identical for performance reasons:

    SELECT * FROM a, b WHERE a.foo = b.bar
    SELECT * FROM a INNER JOIN b ON a.foo = b.bar
I went the INNER JOIN route because I've always had a preference for it. But an *outer* join would be a very different beast. If you look in the runtests.hs file, it specifically relies on the fact that we're doing an inner join. An outer join would mean that *all* authors appear in the output set, while an inner join will only include authors with entries.
I agree that we should make this clear in the docs.
It would be best for it to also be clear from the function names or arguments.
selectOneMany doesn't have an offset and limit. If we added them, we'd end up with queries like this:

    selectOneMany [] [] [] [] EntryAuthorEq 0 0
I purposely avoided offset and limit for now, since I'm not exactly certain how they should be applied. Should they offset/limit the number of ones? The number of manys? The total number of manys across all ones, or the number of manys per one?
The number of ones. Breaking up the manys is difficult for both the framework and the user.
This function, with 5+ required arguments, is somewhat awkward to use and to read. Rails is composable in a readable way because it copied HaskellDB. I would like to get away from the empty optional arguments.
OK, how about this:
    data SelectOneManyArgs one many = SelectOneManyArgs
        { oneFilter :: [Filter one]
        }
    ...

    defaultOneManyArgs :: (Key one -> Filter many) -> SelectOneManyArgs one many
But I'd like to have shorter names somehow.
I am all for adding these changes for now; I just hope we can move to a more composable API in the future. I thought the API that Aur came up with was a better effort in that direction, although there are definitely practical issues with it.
I definitely think we should continue exploring the design space to see if we can come up with better solutions. But I have a sneaking suspicion that if we want fully customizable queries that allow arbitrary joining and selecting individual fields, we're going to end up with some kind of SQL syntax inside Template Haskell. I'm all for making something like that... but it's not Persistent.

I think Persistent's goal should *not* be to handle every possible query you can ever imagine. It should handle the common cases efficiently, with type safety and a simple API. It should also allow people to easily drop down to something more low-level, possibly sacrificing type safety, when the need arises. And over time, as we get more user feedback, we can push the boundary of what Persistent handles out of the box.
If we get to the point where 95% of queries people perform can be handled with an out-of-the-box function, and for the 5% people need to write some SQL (or MongoDB backend code, or Redis...), I think we'll have hit our target.
I agree; I am not trying to say that we need to elegantly handle every possible query. I am just pushing for the queries we currently handle to be elegant. Persistent integration with directly writing SQL should probably be a high priority.

Greg
[1] http://akitaonrails.com/2008/05/25/rolling-with-rails-2-1-the-first-full-tut...

On Sun, Apr 3, 2011 at 8:45 AM, Greg Weber wrote:
It appears that Database.Persist.Join is instead performing n + 1 queries: one query per author. We should avoid these kinds of queries, and once we do, there will not be much point to an outer join in the db.
Good point, we can easily fix that. However, can you clarify what you mean by outer join here? Why would there be a need for an outer join?

* Correction: it's not so simple to add, at least not efficiently. We would need a function like this:

    selectOneMany :: (PersistEntity one, PersistEntity many, PersistBackend m, Eq (Key one))
                  => [Filter one] -> [Order one]
                  -> [Filter many] -> [Order many]
                  -> ([Key one] -> Filter many)
                  -> (many -> Key one)
                  -> m [((Key one, one), [(Key many, many)])]
    selectOneMany oneF oneO manyF manyO inFilt' getKey = do
        x <- selectList oneF oneO 0 0
        let inFilt = inFilt' $ map fst x
        y <- selectList (inFilt : manyF) manyO 0 0
        return $ map (go y) x
      where
        go manys one@(key, _) =
            (one, filter (\x -> getKey (snd x) == key) manys)
I'm not convinced that there will be any performance enhancement here. Most likely, when dealing with small datasets, this will pay off due to the savings in the number of bytes exchanged with the server. But for large datasets, the O(m * n) complexity (number of ones times number of manys) will hurt us. I'd prefer to optimize for larger datasets. Plus, I think the current API is much nicer.

If you can think of a better approach than this, let me know. But remember that there's no way to know the sort order of the keys of the one table.
So this is an overly simple solution. The first selectList should probably be a function that returns just a list of ids, to reduce the 2m traversals from the map to m (unless that can be fused away). But more importantly, it should return a Set of keys, so that each of the n manys needs only one lookup, for a total of O(n) + O(m). The ideal here might be a set that has immediate access to the list of keys. An intriguing idea would be for selectList to return an ordered Map that can still be treated as a list.
I think you're solving a different problem. Are you talking about the fact that the EntryAuthorIn constructor takes a list instead of a Set? That's not where the slowdown comes from. Actually, for the current backends, a Set would needlessly slow things down, since the In constructor simply converts things to SQL and lets the database do the work. I'm not sure what you're suggesting here, to be honest; can you clarify?
Looking at the behavior of Rails for joins, I don't like how it decides between types of joins. The SQL produced by Rails is not in fact identical to the Persistent one: it will do an outer join with the entry filtering in a WHERE clause, not as part of the JOIN conditions. If we are to support joins, it needs to be very apparent which type of join is performed.
As far as I know, every database on the planet these days treats these two as identical for performance reasons:

    SELECT * FROM a, b WHERE a.foo = b.bar
    SELECT * FROM a INNER JOIN b ON a.foo = b.bar
I went the INNER JOIN route because I've always had a preference for it. But an *outer* join would be a very different beast. If you look in the runtests.hs file, it specifically relies on the fact that we're doing an inner join. An outer join would mean that *all* authors appear in the output set, while an inner join will only include authors with entries.
I agree that we should make this clear in the docs.
It would be best for it to also be clear from the function names or arguments.
I actually thought it *was* clear that it would be an inner join and not an outer join. But how would you change the names? I don't want to end up with selectJoiningOneToManyRelationshipUsingInnerJoin ;)
selectOneMany doesn't have an offset and limit. If we added them, we'd end up with queries like this:

    selectOneMany [] [] [] [] EntryAuthorEq 0 0
I purposely avoided offset and limit for now, since I'm not exactly certain how they should be applied. Should they offset/limit the number of ones? The number of manys? The total number of manys across all ones, or the number of manys per one?
The number of ones. Breaking up the manys is difficult for both the framework and the user.
Fair enough.
This function, with 5+ required arguments, is somewhat awkward to use and to read. Rails is composable in a readable way because it copied HaskellDB. I would like to get away from the empty optional arguments.
OK, how about this:
    data SelectOneManyArgs one many = SelectOneManyArgs
        { oneFilter :: [Filter one]
        }
    ...

    defaultOneManyArgs :: (Key one -> Filter many) -> SelectOneManyArgs one many
But I'd like to have shorter names somehow.
I am all for adding these changes for now; I just hope we can move to a more composable API in the future. I thought the API that Aur came up with was a better effort in that direction, although there are definitely practical issues with it.
I definitely think we should continue exploring the design space to see if we can come up with better solutions. But I have a sneaking suspicion that if we want fully customizable queries that allow arbitrary joining and selecting individual fields, we're going to end up with some kind of SQL syntax inside Template Haskell. I'm all for making something like that... but it's not Persistent.

I think Persistent's goal should *not* be to handle every possible query you can ever imagine. It should handle the common cases efficiently, with type safety and a simple API. It should also allow people to easily drop down to something more low-level, possibly sacrificing type safety, when the need arises. And over time, as we get more user feedback, we can push the boundary of what Persistent handles out of the box.
If we get to the point where 95% of queries people perform can be handled with an out-of-the-box function, and for the 5% people need to write some SQL (or MongoDB backend code, or Redis...), I think we'll have hit our target.
I agree; I am not trying to say that we need to elegantly handle every possible query. I am just pushing for the queries we currently handle to be elegant. Persistent integration with directly writing SQL should probably be a high priority.
Can I get some feedback on what's missing for this? In the Database.Persist.GenericSql.Raw module[1], there are two functions (withStmt and execute) that let you run any SQL command against the DB you want. I've used this myself when I needed to do something that Persistent didn't allow (a full-text search, in my case).

I know that the functions are neither pretty nor well documented, but what's missing that is preventing people from dropping down to SQL now? If it's just a documentation issue, I'll address it.

Michael

[1] http://hackage.haskell.org/packages/archive/persistent/0.4.2/doc/html/Databa...
On Sat, Apr 2, 2011 at 2:50 PM, Michael Snoyman wrote:

Hey all,
After a long discussion with Aur Saraf, I think we came up with a good approach to join support in Persistent. Let's review the goals:
* Allow for non-relational backends, such as Redis (simple key-value stores)
* Allow SQL backends to take advantage of the database's JOIN abilities.
* Not force SQL backends to use JOIN if they'd rather avoid it.
* Keep a simple, straight-forward, type-safe API like we have everywhere else in Persistent.
* Cover the most common (say, 95%) of use cases out-of-the-box.
So our idea (well, if you don't like it, don't blame Aur...) is to provide a separate module (Database.Persist.Join) which provides special functions for the most common join operations. To start with, I want to handle a two-table one-to-many relationship. For demonstration purposes, let's consider a blog entry application, with entities Author and Entry. Each Entry has precisely one Author, and each Author can have many entries. In Persistent, it looks like:
    Author
        name String Asc
        isPublic Bool Eq
    Entry
        author AuthorId Eq
        title String
        published UTCTime Desc
        isPublic Bool Eq
In order to get a list of all entries along with their authors, you can use the newly added[1] selectOneMany function:
    selectOneMany
        [AuthorIsPublicEq True] [AuthorNameAsc]
        [EntryIsPublicEq True] [EntryPublishedDesc]
        EntryAuthorEq
This will return a value of type:
    type AuthorPair = (AuthorId, Author)
    type EntryPair = (EntryId, Entry)

    [(AuthorPair, [EntryPair])]
In addition to Database.Persist.Join, there is also a parallel module named Database.Persist.Join.Sql, which has an alternative version of selectOneMany that is powered by a SQL JOIN. It has almost identical semantics: the only catch comes in when you don't fully specify ordering. But then again, if you don't specify ordering in the first place the order of the results is undefined, so it really *is* identical semantics, just slightly different behavior.
Anyway, it's almost 1 in the morning, so I hope I haven't rambled too much. The basic idea is this: Persistent 0.5 will provide a nice, high-level approach to relations. I'll be adding more functions to these modules as necessary, and I'd appreciate input on what people would like to see there.
Michael
[1] https://github.com/snoyberg/persistent/commit/d2b52a6a7b7a6af6234315492f24f8...
_______________________________________________
web-devel mailing list
web-devel@haskell.org
http://www.haskell.org/mailman/listinfo/web-devel

[snip]

I think you're solving a different problem. Are you talking about the fact that the EntryAuthorIn constructor takes a list instead of a Set? That's not where the slowdown comes from. Actually, for the current backends, a set would needlessly slow things down, since the In constructor simply converts things to SQL and lets the database do the work.
I'm not sure what you're suggesting here to be honest, can you clarify?
An O(m + n) implementation instead of O(m * n), by using constant-time lookups instead of repeatedly searching through a list.

[snip]
It would be best for this to also be clear from the function names or arguments.
I actually thought it *was* clear that it would be an inner join and not an outer join. But how would you change the names? I don't want to end up with selectJoiningOneToManyRelationshipUsingInnerJoin ;)
How do I know it is an inner join? If there is just one function, I would assume it is an outer join, since I can get an inner join by filtering the results, whereas there is no way to recover an outer join from an inner one. If there is a second function called *Outer, then it might be clear that the other is an inner join. The other approach is, instead of EntryAuthorEq, to have something like InnerJoin (if possible) or otherwise EntryAuthorInner.

[snip]
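To make Greg's asymmetry concrete: with the result type under discussion, the inner result is just a filter away from the outer one, but not vice versa. A tiny illustration (names invented):

```haskell
-- An "outer"-style result keeps every parent, even those with no children;
-- the "inner"-style result drops parents with an empty child list.
toInner :: [(a, [b])] -> [(a, [b])]
toInner = filter (not . null . snd)

main :: IO ()
main = print (toInner [("alice", ["post1"]), ("bob", [])])
-- prints [("alice",["post1"])]
```

Going the other direction is impossible: once "bob" is gone, no amount of list processing brings him back, which is why the outer variant is the more primitive one.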
Yes, we need docs. Is it possible after a raw query to build the data structures back up, i.e. achieve the same return result as selectList? The other aspect of integration would be the ability to mix SQL fragments with a normal Persistent query. For example, for the SQL join function under discussion, the user could supply the raw join SQL to make it an outer join.

I think by exposing (and documenting) some of the functions that Persistent uses internally, we can get pretty far here. For example, for marshaling, the most useful function is likely fromPersistValues[1], which anyone can use right now. However, the user needs to be careful to pass in the right set of fields.

Michael

[1] http://hackage.haskell.org/packages/archive/persistent/0.4.2/doc/html/Databa...

But where will you use the Set? The slowdown is because we end up with two lists: [(Key one, one)] and [(Key many, many)]. For each (Key one, one), we need to filter the entire [(Key many, many)] to find the records which match each one. As far as the code is concerned, doing the initial filter to the relevant (Key many) values is constant*.
We end up with 2 lists because we are building 2 lists. We should build data structures that allow for an efficient implementation of this function.

Greg

Ahh... I see what you mean. I just pushed a commit that uses a Map to hopefully improve efficiency. I think we end up with some kind of crazy formula like O(m + m * log n)... which is of course just O(m * log n), but whatever it is, it definitely seems better than O(m * n). Thanks for pointing it out.

I also implemented the "outer join" feature. I think the next step is to figure out a standard way to clean up the interface so we don't have 15 parameters floating around. This seems like a general question in Haskell: how do you encode named parameters + default values? We can do it with a different data type for each function, or even a single type class + multiple data types, but I'm wondering if someone out there can point to some prior art that does a good job.

Michael
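For readers following along, the idea behind that commit can be sketched like this (this is an illustration of the technique with stand-in types, not the actual commit):

```haskell
import qualified Data.Map as Map

-- Stand-ins for the two result lists: (Key one, one) pairs, and "many"
-- rows tagged with the foreign key they point at.
type OneRow  = (Int, String)   -- (key, record)
type ManyRow = (Int, String)   -- (foreign key, record)

-- O(m * n): for every "one" row, rescan the whole "many" list.
joinNaive :: [OneRow] -> [ManyRow] -> [(OneRow, [String])]
joinNaive ones manys =
    [ (o, [ v | (fk, v) <- manys, fk == k ]) | o@(k, _) <- ones ]

-- O((m + n) * log n): group the "many" rows by key once, then each
-- "one" row costs a single Map lookup.
joinWithMap :: [OneRow] -> [ManyRow] -> [(OneRow, [String])]
joinWithMap ones manys =
    [ (o, Map.findWithDefault [] k grouped) | o@(k, _) <- ones ]
  where
    -- flip (++) keeps the many-rows in their original order per key
    grouped = Map.fromListWith (flip (++)) [ (fk, [v]) | (fk, v) <- manys ]

main :: IO ()
main = print (joinWithMap [(1, "alice"), (2, "bob")]
                          [(1, "a1"), (2, "b1"), (1, "a2")])
-- prints [((1,"alice"),["a1","a2"]),((2,"bob"),["b1"])]
```

Both functions produce the same association; only the cost of finding each one-row's children changes.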

The mongoDB driver does this well. I think the other approach would be combinator- or monad-based, haskellDB being one example.
Here is a mongoDB example, with some modifications for readability:

    runDB (select ["league" =: "National"] "team")
    runDB (select ["home.state" =: "NY"] "team") {sort = ["name"]}

So select creates a record with default fields that determines the query; "sort" modifies the record.
http://hackage.haskell.org/packages/archive/mongoDB/0.9.5/doc/html/Database-...
http://hackage.haskell.org/packages/archive/mongoDB/0.9.5/doc/html/Database-...
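The pattern in that example boils down to something like the following self-contained sketch (simplified stand-ins, not the mongoDB driver's real types, and =: is elided in favor of plain pairs):

```haskell
-- A query record with sensible defaults; a smart constructor fills in the
-- required fields and record-update syntax overrides the rest.
data Query = Query
    { selection  :: [(String, String)]  -- field/value pairs to match
    , collection :: String
    , sort       :: [String]
    , limit      :: Int
    } deriving (Eq, Show)

-- Only selection and collection are required; everything else defaults.
select :: [(String, String)] -> String -> Query
select sel col =
    Query { selection = sel, collection = col, sort = [], limit = 0 }

main :: IO ()
main = do
    print (select [("league", "National")] "team")
    print ((select [("home.state", "NY")] "team") { sort = ["name"] })
```

The nice property is that adding a new optional field later doesn't break any existing call site, since callers never spell out the full record.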

OK, I like this approach. Here's a nice little benefit: we can provide a separate typeclass (i.e., a separate runDB function in your example) that will do the SQL-powered version. That way, in order to switch between application joins and SQL joins, you just need to switch the import of a single function.

Michael
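A sketch of that switching idea, with invented names throughout: both backends answer the same class method with the same signature, so picking one (by instance here, by import in the proposal) is the only change a caller ever makes:

```haskell
-- One shared interface, two implementations; all names are illustrative,
-- not Persistent's actual classes or modules.
class HasSelectOneMany backend where
    describeJoin :: backend -> String

data InApplication = InApplication  -- Database.Persist.Join-style
data ViaSqlJoin    = ViaSqlJoin     -- Database.Persist.Join.Sql-style

instance HasSelectOneMany InApplication where
    describeJoin _ = "two SELECTs, joined in Haskell"

instance HasSelectOneMany ViaSqlJoin where
    describeJoin _ = "one SELECT with an INNER JOIN"

main :: IO ()
main = mapM_ putStrLn
    [ describeJoin InApplication
    , describeJoin ViaSqlJoin
    ]
```

In the import-switching variant, the two modules would instead each export a same-named, same-typed function, and the rest of the program compiles unchanged against either.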
participants (2)

- Greg Weber
- Michael Snoyman